WO1999033982A2 - Human genes and gene expression products I - Google Patents

Human genes and gene expression products I

Info

Publication number
WO1999033982A2
WO1999033982A2 PCT/US1998/027610 US9827610W WO9933982A2 WO 1999033982 A2 WO1999033982 A2 WO 1999033982A2 US 9827610 W US9827610 W US 9827610W WO 9933982 A2 WO9933982 A2 WO 9933982A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
polynucleotide
protein
cell
gene
Prior art date
Application number
PCT/US1998/027610
Other languages
English (en)
Other versions
WO1999033982A3 (fr)
Inventor
Lewis T. Williams
Jaime Escobedo
Michael A. Innis
Pablo Dominguez Garcia
Julie Sudduth-Klinger
Christoph Reinhard
Klaus Giese
Filippo Randazzo
Giulia C. Kennedy
David Pot
Altaf Kassam
George Lamson
Radoje Drmanac
Radomir Crkvenjakov
Mark Dickson
Snezana Drmanac
Ivan Labat
Dena Leshkowitz
David Kita
Veronica Garcia
Lee William Jones
Birgit Stache-Crain
Original Assignee
Chiron Corporation
Hyseq Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chiron Corporation, Hyseq Inc. filed Critical Chiron Corporation
Priority to JP2000526638A priority Critical patent/JP2002500010A/ja
Priority to EP98965500A priority patent/EP1190058A2/fr
Priority to AU20955/99A priority patent/AU2095599A/en
Publication of WO1999033982A2 publication Critical patent/WO1999033982A2/fr
Publication of WO1999033982A3 publication Critical patent/WO1999033982A3/fr

Classifications

    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K14/00Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/106Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/118Prognosis of disease development

Definitions

  • the present invention relates to novel polynucleotides, particularly to novel polynucleotides of human origin that are expressed in a selected cell type, are differentially expressed in one cell type relative to another cell type (e.g., in cancerous cells, or in cells of a specific tissue origin) and/or share homology to polynucleotides encoding a gene product having an identified functional domain and/or activity.
  • Identification of novel polynucleotides, particularly those that encode an expressed gene product, is important in the advancement of drug discovery, diagnostic technologies, and the understanding of the progression and nature of complex diseases such as cancer. Identification of genes expressed in different cell types isolated from sources that differ in disease state or stage, developmental stage, exposure to various environmental factors, the tissue of origin, the species from which the tissue was isolated, and the like is key to identifying the genetic factors that are responsible for the phenotypes associated with these various differences.
  • This invention provides novel human polynucleotides, the polypeptides encoded by these polynucleotides, and the genes and proteins corresponding to these novel polynucleotides.
  • the present invention features a library of polynucleotides, the library comprising the sequence information of at least one of SEQ ID NOS: 1-844.
  • the invention features a library provided on a nucleic acid array, or in a computer-readable format.
  • the library comprises a differentially expressed polynucleotide comprising a sequence selected from the group consisting of SEQ ID NOS: 9, 39, 42, 52, 62, 74, 119, 172, 317, and 379.
  • the library comprises: 1) a polynucleotide that is differentially expressed in a human breast cancer cell, where the polynucleotide comprises a sequence selected from the group consisting of SEQ ID NOS: 4, 9, 39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123, 144, 130, 157, 162, 172, 178, 183, 202, 214, 219, 223, 258, 298, 317, 338, 379, 384, 386, and 388; 2) a polynucleotide differentially expressed in a human colon cancer cell, where the polynucleotide comprises a sequence selected from the group consisting of SEQ ID NOS: 1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, and 374; or 3) a polynucleotide differentially expressed in a human lung cancer cell, where the polynucle
  • the invention features an isolated polynucleotide comprising a nucleotide sequence having at least 90% sequence identity to an identifying sequence of SEQ ID NOS: 1-844 or a degenerate variant thereof.
  • the invention features recombinant host cells and vectors comprising the polynucleotides of the invention, as well as isolated polypeptides encoded by the polynucleotides of the invention and antibodies that specifically bind such polypeptides.
  • the invention features an isolated polynucleotide comprising a sequence encoding a polypeptide of a protein family selected from the group consisting of: 4 transmembrane segments integral membrane proteins, 7 transmembrane receptors, ATPases associated with various cellular activities (AAA), eukaryotic aspartyl proteases, GATA family of transcription factors, G-protein alpha subunit, phorbol esters/diacylglycerol binding proteins, protein kinase, protein phosphatase 2C, protein tyrosine phosphatase, trypsin, wnt family of developmental signaling proteins, and WW/rsp5/WWP domain containing proteins.
  • the invention features a polynucleotide comprising a sequence of one of SEQ ID NOS: 24, 41, 101, 157, 291, 305, 315, 341, 63, 116, 134, 136, 151, 384, 404, 308, 213, 367, 188, 251, 202, 315, 367, 397, 256, 382, 169, 23, 291, 324, 330, 341, 353, 188, 379, and 395.
  • the invention features a polynucleotide comprising a sequence encoding a polypeptide having a functional domain selected from the group consisting of: Ank repeat, basic region plus leucine zipper transcription factors, bromodomain, EF-hand, SH3 domain, WD domain/G-beta repeats, zinc finger (C2H2 type), zinc finger (CCHC class), and zinc-binding metalloprotease domain.
  • the invention features a polynucleotide comprising a sequence of one of SEQ ID NOS: 116, 251, 374, 97, 136, 242, 379, 306, 386, 18, 335, 61, 306, 386, 322, 306, and 395.
  • the invention features a method of detecting differentially expressed genes correlated with a cancerous state of a mammalian cell, where the method comprises the step of detecting at least one differentially expressed gene product in a test sample derived from a cell suspected of being cancerous, where the gene product is encoded by a gene corresponding to a sequence of at least one of SEQ ID NOS:4, 9, 39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123, 144, 130, 157, 162, 172, 178, 183, 202, 214, 219, 223, 258, 298, 317, 338, 379, 384, 386, 388, 1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, 374, 9, 34, 42, 62, 74, 106, 119, 135, 154, 160, 260, 308, 323, 349, 361, 369, 371,
  • Detection of the differentially expressed gene product is correlated with a cancerous state of the cell from which the test sample was derived.
  • the detecting is by hybridization of the test sample to a reference array, wherein the reference array comprises an identifying sequence of at least one of SEQ ID NOS: 1-844.
  • the cell is a breast tissue derived cell
  • the differentially expressed gene product is encoded by a gene corresponding to a sequence of at least one of SEQ ID NOS: 4, 9, 39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123, 144, 130, 157, 162, 172, 178, 183, 202, 214, 219, 223, 258, 298, 317, 338, 379, 384, 386, and 388.
  • the cell is a colon tissue derived cell, and the differentially expressed gene product is encoded by a gene corresponding to a sequence of at least one of SEQ ID NOS: 1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, and 374.
  • the cell is a lung tissue derived cell
  • differentially expressed gene product is encoded by a gene corresponding to a sequence of at least one of SEQ ID NOS: 9, 34, 42, 62, 74, 106, 119, 135, 154, 160, 260, 308, 323, 349, 361, 369, 371, 379, 395, 381, and 400.
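The three tissue-specific lists above can be compared directly. The following is a minimal sketch, not part of the disclosure, that copies the breast, colon, and lung SEQ ID numbers as printed in this text and reports which numbers occur in at least two of the lists. The function name `shared_seq_ids` and its `min_tissues` parameter are illustrative assumptions.

```python
from collections import Counter

# SEQ ID NOS copied from the breast, colon, and lung lists as printed above.
BREAST = {4, 9, 39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123, 144, 130, 157,
          162, 172, 178, 183, 202, 214, 219, 223, 258, 298, 317, 338, 379,
          384, 386, 388}
COLON = {1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, 374}
LUNG = {9, 34, 42, 62, 74, 106, 119, 135, 154, 160, 260, 308, 323, 349, 361,
        369, 371, 379, 395, 381, 400}


def shared_seq_ids(*tissue_sets, min_tissues=2):
    """Return SEQ ID numbers present in at least `min_tissues` of the sets."""
    counts = Counter(seq_id for s in tissue_sets for seq_id in s)
    return sorted(i for i, n in counts.items() if n >= min_tissues)


shared = shared_seq_ids(BREAST, COLON, LUNG)
print(shared)  # [9, 39, 42, 52, 62, 74, 119, 172, 317, 379]
```

For the lists as printed here, the result coincides with the ten SEQ ID NOS recited earlier for the library of differentially expressed polynucleotides, and no number appears in all three tissue lists.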
  • the invention relates to polynucleotides comprising the disclosed nucleotide sequences, to full length cDNA, mRNA and genes corresponding to these sequences, and to polypeptides and proteins encoded by these polynucleotides and genes.
  • polynucleotides that encode polypeptides and proteins encoded by the polynucleotides of the Sequence Listing are also included.
  • the various polynucleotides that can encode these polypeptides and proteins differ because of the degeneracy of the genetic code, in that most amino acids are encoded by more than one triplet codon. The identity of such codons is well-known in this art, and this information can be used for the construction of the polynucleotides within the scope of the invention.
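As an illustration of the degeneracy just described, the sketch below builds the standard genetic code table and counts how many distinct coding sequences can encode a short peptide. The peptide strings are hypothetical examples, and the helper names `codons_for` and `count_encodings` are assumptions introduced here, not terms from the disclosure.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid):
    """All triplet codons that encode one amino acid (one-letter code)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_encodings(peptide):
    """Number of distinct DNA sequences encoding the peptide (stop codon excluded)."""
    return prod(len(codons_for(aa)) for aa in peptide)


print(codons_for("L"))         # the six leucine codons
print(count_encodings("MLS"))  # 1 * 6 * 6 = 36
```

Methionine and tryptophan each have a single codon, so a peptide made only of those residues has exactly one coding sequence, while every other residue multiplies the count.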
  • Polynucleotides encoding polypeptides and proteins that are variants of the polypeptides and proteins encoded by the polynucleotides and related cDNA and genes are also within the scope of the invention.
  • the variants differ from wild type protein in having one or more amino acid substitutions that either enhance, add, or diminish a biological activity of the wild type protein. Once the amino acid change is selected, a polynucleotide encoding that variant is constructed according to the invention.
  • polynucleotide compositions encompassed by the invention, methods for obtaining cDNA or genomic DNA encoding a full-length gene product, expression of these polynucleotides and genes, identification of structural motifs of the polynucleotides and genes, identification of the function of a gene product encoded by a gene corresponding to a polynucleotide of the invention, use of the provided polynucleotides as probes and in mapping and in tissue profiling, use of the corresponding polypeptides and other gene products to raise antibodies, and use of the polynucleotides and their encoded gene products for therapeutic and diagnostic purposes.
  • polynucleotide compositions includes, but is not necessarily limited to, polynucleotides having a sequence set forth in any one of SEQ ID NOS: 1-844; polynucleotides obtained from the biological materials described herein or other biological sources (particularly human sources) by hybridization under stringent conditions (particularly conditions of high stringency); genes corresponding to the provided polynucleotides; variants of the provided polynucleotides and their corresponding genes, particularly those variants that retain a biological activity of the encoded gene product (e.g., a biological activity ascribed to a gene product corresponding to the provided polynucleotides as a result of the assignment of the gene product to a protein family(ies) and/or identification of a functional domain present in the gene product).
  • nucleic acid compositions contemplated by and within the scope of the present invention will be readily apparent to one of ordinary skill in the art when provided with the disclosure here.
  • the invention features polynucleotides that are expressed in cells of human tissue, specifically human colon, breast, and/or lung tissue.
  • Novel nucleic acid compositions of the invention of particular interest comprise a sequence set forth in any one of SEQ ID NOS: 1-844 or an identifying sequence thereof.
  • An "identifying sequence" is a contiguous sequence of residues at least about 10 nt to about 20 nt in length, usually at least about 50 nt to about 100 nt in length, that uniquely identifies a polynucleotide sequence, e.g.
  • the subject novel nucleic acid compositions include full length cDNAs or mRNAs that encompass an identifying sequence of contiguous nucleotides from any one of SEQ ID NOS: 1-844.
  • the polynucleotides of the invention also include polynucleotides having sequence similarity or sequence identity. Nucleic acids having sequence similarity are detected by hybridization under low stringency conditions, for example, at 50°C and 10XSSC (0.9 M saline/0.09 M sodium citrate) and remain bound when subjected to washing at 55°C in 1XSSC.
  • Sequence identity can be determined by hybridization under stringent conditions, for example, at 50°C or higher and 0.1XSSC (9 mM saline/0.9 mM sodium citrate). Hybridization methods and conditions are well known in the art, see, e.g., U.S. Patent No. 5,707,829. Nucleic acids that are substantially identical to the provided polynucleotide sequences, e.g. allelic variants, genetically altered versions of the gene, etc., bind to the provided polynucleotide sequences (SEQ ID NOS: 1-844) under stringent hybridization conditions. By using probes, particularly labeled probes of DNA sequences, one can isolate homologous or related genes.
  • the source of homologous genes can be any species, e.g. primate species, particularly human; rodents, such as rats and mice, canines, felines, bovines, ovines, equines, yeast, nematodes, etc.
  • hybridization is performed using at least 15 contiguous nucleotides of at least one of SEQ ID NOS: 1-844. That is, when at least 15 contiguous nucleotides of one of the disclosed SEQ ID NOs. is used as a probe, the probe will preferentially hybridize with a gene or mRNA (of the biological material) comprising the complementary sequence, allowing the identification and retrieval of the nucleic acids of the biological material that uniquely hybridize to the selected probe. Probes from more than one SEQ ID NO. will hybridize with the same gene or mRNA if the cDNA from which they were derived corresponds to one mRNA. Probes of more than 15 nucleotides can be used, but 15 nucleotides represents enough sequence for unique identification.
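The statement that 15 nucleotides suffice for unique identification can be put in rough numerical terms: there are 4^15 (about 1.07 billion) possible 15-mers. The sketch below is an illustrative simplification that treats uniqueness as exact string matching on one strand only; it does not model hybridization, mismatches, or the complementary strand. The toy sequences and the function names are assumptions.

```python
def kmers(seq, k=15):
    """Set of all contiguous k-nucleotide substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_probes(target, others, k=15):
    """(position, k-mer) pairs from target that occur in none of the other sequences."""
    background = set().union(*(kmers(s, k) for s in others))
    return [(i, target[i:i + k])
            for i in range(len(target) - k + 1)
            if target[i:i + k] not in background]


# Number of distinct 15-nucleotide sequences over A, C, G, T.
print(4 ** 15)  # 1073741824

# Toy example with k=4 so the result is easy to follow by eye.
print(unique_probes("ACGTACGTTT", ["ACGTACGA"], k=4))
```

In the toy example only the last two 4-mers of the target are absent from the comparison sequence, so only those would be candidate probes under this simplified criterion.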
  • the polynucleotides of the invention also include naturally occurring variants of the nucleotide sequences (e.g., degenerate variants, allelic variants, etc.). Variants of the polynucleotides of the invention are identified by hybridization of putative variants with nucleotide sequences disclosed herein, preferably by hybridization under stringent conditions. For example, by using appropriate wash conditions, variants of the polynucleotides of the invention can be identified where the allelic variant exhibits at most about 25-30% base pair mismatches relative to the selected polynucleotide probe.
  • allelic variants contain 15-25% base pair mismatches, and can contain as little as even 5-15%, or 2-5%, or 1-2% base pair mismatches, as well as a single base-pair mismatch.
  • the invention also encompasses homologs corresponding to the polynucleotides of SEQ ID NOS: 1-844 where the source of homologous genes can be any mammalian species, e.g., primate species, particularly human; rodents, such as rats, canines, felines, bovines, ovines, equines, yeast, nematodes, etc. Between mammalian species, e.g., human and mouse, homologs have substantial sequence similarity, e.g., at least 75% sequence identity, usually at least 90%, more usually at least 95% between nucleotide sequences. Sequence similarity is calculated based on a reference sequence, which may be a subset of a larger sequence, such as a conserved motif, coding region, flanking region, etc.
  • a reference sequence will usually be at least about 18 contiguous nt long, more usually at least about 30 nt long, and may extend to the complete sequence that is being compared.
  • Algorithms for sequence analysis are known in the art, such as BLAST, described in Altschul et al., J. Mol. Biol. (1990) 215:403-10.
  • variants of the invention have a sequence identity greater than at least about 65%, preferably at least about 75%, more preferably at least about 85%, and can be greater than at least about 90% or more as determined by the Smith-Waterman homology search algorithm as implemented in MPSRCH program (Oxford Molecular).
  • a preferred method of calculating percent identity is the Smith-Waterman algorithm, using the following.
  • Global DNA sequence identity must be greater than 65% as determined by the Smith-Waterman homology search algorithm as implemented in MPSRCH program (Oxford Molecular) using an affine gap search with the following search parameters: gap open penalty, 12; and gap extension penalty, 1.
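For readers unfamiliar with affine gap searches, the following is a minimal sketch of Smith-Waterman local alignment scoring with affine gaps (the Gotoh recurrences), using the gap open penalty of 12 and gap extension penalty of 1 stated above. It is not the MPSRCH implementation. The match and mismatch scores (+5 and -4) are assumptions, as is the convention that the first gapped position costs the open penalty and each further position costs the extension penalty. The function returns only the best local score, not a percent identity.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending with a gap that consumes b[j-1].
    # F: best score ending with a gap that consumes a[i-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Hypothetical 20-nt sequence and a copy with one base deleted.
ref = "ACGTTGCAATCCGGATTAGC"
variant = ref[:10] + ref[11:]
print(smith_waterman_affine(ref, ref))
print(smith_waterman_affine(ref, variant))
```

With a high open penalty and a low extension penalty, a single long gap is penalized far less than several short ones, which is the usual motivation for affine scoring.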
  • the subject nucleic acids can be cDNAs or genomic DNAs, as well as fragments thereof, particularly fragments that encode a biologically active gene product and/or are useful in the methods disclosed herein (e.g., in diagnosis, as a unique identifier of a differentially expressed gene of interest, etc.).
  • cDNA as used herein is intended to include all nucleic acids that share the arrangement of sequence elements found in native mature mRNA species, where sequence elements are exons and 3' and 5' non-coding regions. Normally mRNA species have contiguous exons, with the intervening introns, when present, being removed by nuclear RNA splicing, to create a continuous open reading frame encoding a polypeptide of the invention.
  • a genomic sequence of interest comprises the nucleic acid present between the initiation codon and the stop codon, as defined in the listed sequences, including all of the introns that are normally present in a native chromosome. It can further include the 3' and 5' untranslated regions found in the mature mRNA. It can further include specific transcriptional and translational regulatory sequences, such as promoters, enhancers, etc., including about 1 kb, but possibly more, of flanking genomic DNA at either the 5' or 3' end of the transcribed region.
  • the genomic DNA can be isolated as a fragment of 100 kbp or smaller; and substantially free of flanking chromosomal sequence.
  • the genomic DNA flanking the coding region, either 3' and 5', or internal regulatory sequences as sometimes found in introns contains sequences required for proper tissue, stage-specific, or disease-state specific expression.
  • the nucleic acid compositions of the subject invention can encode all or a part of the subject differentially expressed polypeptides. Double or single stranded fragments can be obtained from the DNA sequence by chemically synthesizing oligonucleotides in accordance with conventional methods, by restriction enzyme digestion, by PCR amplification, etc.
  • Isolated polynucleotides and polynucleotide fragments of the invention comprise at least about 10, about 15, about 20, about 35, about 50, about 100, about 150 to about 200, about 250 to about 300, or about 350 contiguous nucleotides selected from the polynucleotide sequences as shown in SEQ ID NOS: 1-844.
  • fragments will be of at least 15 nt, usually at least 18 nt or 25 nt, and up to at least about 50 contiguous nt in length or more.
  • the polynucleotide molecules comprise a contiguous sequence of at least twelve nucleotides selected from the group consisting of the polynucleotides shown in SEQ ID NOS: 1-844.
  • Probes specific to the polynucleotides of the invention can be generated using the polynucleotide sequences disclosed in SEQ ID NOS: 1-844.
  • the probes are preferably at least about 12, 15, 16, 18, 20, 22, 24, or 25 nucleotide fragment of a corresponding contiguous sequence of SEQ ID NOS: 1-844, and can be less than 2, 1, 0.5, 0.1, or 0.05 kb in length.
  • the probes can be synthesized chemically or can be generated from longer polynucleotides using restriction enzymes.
  • the probes can be labeled, for example, with a radioactive, biotinylated, or fluorescent tag.
  • probes are designed based upon an identifying sequence of a polynucleotide of one of SEQ ID NOS: 1-844. More preferably, probes are designed based on a contiguous sequence of one of the subject polynucleotides that remain unmasked following application of a masking program for masking low complexity (e.g., XBLAST) to the sequence, i.e., one would select an unmasked region, as indicated by the polynucleotides outside the poly-n stretches of the masked sequence produced by the masking program.
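Selecting unmasked regions from a masked sequence can be sketched as follows. This is an illustration only: it assumes the masking program has already replaced low-complexity stretches with runs of N (or X), and it simply reports the remaining stretches long enough to serve as probe candidates. The example sequence, the `min_len` default, and the function name are assumptions.

```python
import re


def unmasked_regions(masked, min_len=15, mask_chars="NnXx"):
    """Return (start, end, sequence) for unmasked stretches of at least min_len."""
    pattern = re.compile(f"[^{re.escape(mask_chars)}]+")
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(masked)
            if m.end() - m.start() >= min_len]


# Hypothetical masked sequence: two poly-N stretches split it into three pieces.
masked_seq = "ACGTNNNNNACGTACGTACGTNNAC"
print(unmasked_regions(masked_seq, min_len=5))
```

Only the 12-nucleotide middle stretch passes the length filter in this toy input; the 4- and 2-nucleotide flanks are too short to be used.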
  • polynucleotides of the subject invention are isolated and obtained in substantial purity, generally as other than an intact chromosome.
  • the polynucleotides, either as DNA or RNA, will be obtained substantially free of other naturally-occurring nucleic acid sequences, generally being at least about 50%, usually at least about 90% pure and are typically "recombinant", e.g., flanked by one or more nucleotides with which it is not normally associated on a naturally occurring chromosome.
  • the polynucleotides of the invention can be provided as a linear molecule or within a circular molecule. They can be provided within autonomously replicating molecules (vectors) or within molecules without replication sequences. They can be regulated by their own or by other regulatory sequences, as is known in the art.
  • the polynucleotides of the invention can be introduced into suitable host cells using a variety of techniques which are available in the art, such as transferrin polycation-mediated DNA transfer, transfection with naked or encapsulated nucleic acids, liposome-mediated DNA transfer, intracellular transportation of DNA-coated latex beads, protoplast fusion, viral infection, electroporation, gene gun, calcium phosphate-mediated transfection, and the like.
  • the subject nucleic acid compositions can be used to, for example, produce polypeptides, as probes for the detection of mRNA of the invention in biological samples (e.g., extracts of human cells), to generate additional copies of the polynucleotides, to generate ribozymes or antisense oligonucleotides, and as single stranded DNA probes or as triple-strand forming oligonucleotides.
  • the probes described herein can be used to, for example, determine the presence or absence of the polynucleotide sequences as shown in SEQ ID NOS: 1-844 or variants thereof in a sample. These and other uses are described in more detail below.
  • Use of Polynucleotides to Obtain Full-Length cDNA and Full-Length Human Gene and
  • Full-length cDNA molecules comprising the disclosed polynucleotides are obtained as follows.
  • a polynucleotide having a sequence of one of SEQ ID NOS: 1-844, or a portion thereof comprising at least 12, 15, 18, or 20 nucleotides, is used as a hybridization probe to detect hybridizing members of a cDNA library using probe design methods, cloning methods, and clone selection techniques such as those described in U.S. Patent No. 5,654,173.
  • Libraries of cDNA are made from selected tissues, such as normal or tumor tissue, or from tissues of a mammal treated with, for example, a pharmaceutical agent.
  • the tissue is the same as the tissue from which the polynucleotides of the invention were isolated, as both the polynucleotides described herein and the cDNA represent expressed genes.
  • the cDNA library is made from the biological material described herein in the Examples.
  • many cDNA libraries are available commercially. (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY). The choice of cell type for library construction can be made after the identity of the protein encoded by the gene corresponding to the polynucleotide of the invention is known.
  • the libraries are prepared from mRNA of human colon cells, more preferably, human colon cancer cells, even more preferably, from a highly metastatic colon cell, Km12L4-A.
  • the cDNA can be prepared by using primers based on sequence from SEQ ID NOS: 1-844.
  • the cDNA library can be made from only poly-adenylated mRNA.
  • poly-T primers can be used to prepare cDNA from the mRNA.
  • RNA protection experiments are performed as follows. Hybridization of a full-length cDNA to an mRNA will protect the RNA from RNase degradation. If the cDNA is not full length, then the portions of the mRNA that are not hybridized will be subject to RNase degradation. This is assayed, as is known in the art, by changes in electrophoretic mobility on polyacrylamide gels, or by detection of released monoribonucleotides.
  • Genomic DNA is isolated using the provided polynucleotides in a manner similar to the isolation of full-length cDNAs. Briefly, the provided polynucleotides, or portions thereof, are used as probes to libraries of genomic DNA.
  • the library is obtained from the cell type that was used to generate the polynucleotides of the invention, but this is not essential. Most preferably, the genomic DNA is obtained from the biological material described herein in the Examples.
  • Such libraries can be in vectors suitable for carrying large segments of a genome, such as P1 or YAC, as described in detail in Sambrook et al., 9.4-9.30.
  • genomic sequences can be isolated from human BAC libraries, which are commercially available from Research Genetics, Inc., Huntsville, Alabama, USA, for example.
  • chromosome walking is performed, as described in Sambrook et al., such that adjacent and overlapping fragments of genomic DNA are isolated. These are mapped and pieced together, as is known in the art, using restriction digestion enzymes and DNA ligase.
  • corresponding full-length genes can be isolated using both classical and PCR methods to construct and probe cDNA libraries.
  • Northern blots, preferably, are performed on a number of cell types to determine which cell lines express the gene of interest at the highest level.
  • Classical methods of constructing cDNA libraries are taught in Sambrook et al., supra. With these methods, cDNA can be produced from mRNA and inserted into viral or expression vectors. Typically, libraries of mRNA comprising poly(A) tails can be produced with poly(T) primers. Similarly, cDNA libraries can be produced using the instant sequences as primers.
  • PCR methods are used to amplify the members of a cDNA library that comprise the desired insert.
  • the desired insert will contain sequence from the full length cDNA that corresponds to the instant polynucleotides.
  • Such PCR methods include gene trapping and RACE methods.
  • Gene trapping entails inserting a member of a cDNA library into a vector. The vector then is denatured to produce single stranded molecules. Next, a substrate-bound probe, such as a biotinylated oligo, is used to trap cDNA inserts of interest. Biotinylated probes can be linked to an avidin-bound solid substrate.
  • PCR methods can be used to amplify the trapped cDNA. To trap sequences corresponding to the full length genes, the labeled probe sequence is based on the polynucleotide sequences of the invention.
  • Random primers or primers specific to the library vector can be used to amplify the trapped cDNA.
  • Such gene trapping techniques are described in Gruber et al., WO 95/04745 and Gruber et al., U.S. Pat. No. 5,500,356. Kits are commercially available to perform gene trapping experiments from, for example, Life Technologies, Gaithersburg, Maryland, USA.
  • RACE (rapid amplification of cDNA ends)
  • a common primer is designed to anneal to an arbitrary adaptor sequence ligated to cDNA ends (Apte and Siebert, Biotechniques (1993) 15:890-893; Edwards et al., Nuc. Acids Res. (1991) 19:5227-5232).
  • when a single gene-specific RACE primer is paired with the common primer, preferential amplification of sequences between the single gene-specific primer and the common primer occurs.
  • Commercial cDNA pools modified for use in RACE are available.
  • Another PCR-based method generates a full-length cDNA library with anchored ends without needing specific knowledge of the cDNA sequence.
  • the method uses lock-docking primers (I-VI), where one primer, poly TV (I-III), locks over the polyA tail of eukaryotic mRNA, producing first strand synthesis, and a second primer, polyGH (IV-VI), locks onto the polyC tail added by terminal deoxynucleotidyl transferase (TdT).
  • promoter regions contain the "TATA" box, a sequence such as TATTA or TATAA, which is sensitive to mutations.
  • the promoter region can be obtained by performing 5' RACE using a primer from the coding region of the gene.
  • the cDNA can be used as a probe for the genomic sequence, and the region 5' to the coding region is identified by "walking up." If the gene is highly expressed or differentially expressed, the promoter from the gene can be of use in a regulatory construct for a heterologous gene.
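Once a candidate 5' region has been obtained, the two TATA-box sequences named above (TATTA and TATAA) can be located with a simple motif scan. The sketch below is only a string search on one strand; it is not a promoter prediction method, and the example sequence and function name are hypothetical.

```python
import re

# Matches TATAA or TATTA; the lookahead lets overlapping occurrences be reported.
TATA_PATTERN = re.compile(r"(?=(TAT[AT]A))")


def find_tata_boxes(seq):
    """Return (position, motif) for each TATAA/TATTA occurrence in seq."""
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(seq.upper())]


# Hypothetical upstream fragment containing one copy of each motif.
upstream = "GGCTATAAAGGTATTACC"
print(find_tata_boxes(upstream))  # [(3, 'TATAA'), (11, 'TATTA')]
```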
  • DNA encoding variants can be prepared by site-directed mutagenesis, described in detail in Sambrook et al., 15.3-15.63.
  • the choice of codon or nucleotide to be replaced can be based on disclosure herein on optional changes in amino acids to achieve altered protein structure and/or function.
  • nucleic acid comprising nucleotides having the sequence of one or more polynucleotides of the invention can be synthesized.
  • the invention encompasses nucleic acid molecules ranging in length from 15 nucleotides (corresponding to at least 15 contiguous nucleotides of one of SEQ ID NOS: 1 -844) up to a maximum length suitable for one or more biological manipulations, including replication and expression, of the nucleic acid molecule.
  • the invention includes but is not limited to (a) nucleic acid having the size of a full gene, and comprising at least one of SEQ ID NOS: 1-844; (b) the nucleic acid of (a) also comprising at least one additional gene, operably linked to permit expression of a fusion protein; (c) an expression vector comprising (a) or (b); (d) a plasmid comprising (a) or (b) ; and (e) a recombinant viral particle comprising (a) or (b).
  • construction or preparation of (a) - (e) are well within the skill in the art.
  • The sequence of a nucleic acid comprising at least 15 contiguous nucleotides of at least any one of SEQ ID NOS: 1-844, preferably the entire sequence of at least any one of SEQ ID NOS: 1-844, is not limited and can be any sequence of A, T, G, and/or C (for DNA) and A, U, G, and/or C (for RNA) or modified bases thereof, including inosine and pseudouridine.
  • sequence will depend on the desired function and can be dictated by coding regions desired, the intron-like regions desired, and the regulatory regions desired.
  • nucleic acid obtained is referred to herein as a polynucleotide comprising the sequence of any one of SEQ ID NOS: 1-844.
  • The provided polynucleotide (e.g., a polynucleotide having a sequence of one of SEQ ID NOS: 1-844), the corresponding cDNA, or the full-length gene is used to express a partial or complete gene product.
  • Constructs of polynucleotides having sequences of SEQ ID NOS: 1-844 can be generated synthetically.
  • Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides is described by, e.g., Stemmer et al., Gene (Amsterdam) (1995) 164(1):49-53.
  • In that reference, assembly PCR (the synthesis of long DNA sequences from large numbers of oligodeoxyribonucleotides (oligos)) is described.
  • The method is derived from DNA shuffling (Stemmer, Nature (1994) 370:389-391), and does not rely on DNA ligase, but instead relies on DNA polymerase to build increasingly longer DNA fragments during the assembly process.
  • For example, a 1.1-kb fragment containing the TEM-1 beta-lactamase-encoding gene (bla) can be assembled in a single reaction from a total of 56 oligos, each 40 nucleotides (nt) in length.
  • the synthetic gene can be PCR amplified and cloned in a vector containing the tetracycline-resistance gene (Tc-R) as the sole selectable marker.
  • the gene product encoded by a polynucleotide of the invention is expressed in any expression system, including, for example, bacterial, yeast, insect, amphibian and mammalian systems. Suitable vectors and host cells are described in U.S. Patent No. 5,654,173.
  • Bacteria. Expression systems in bacteria include those described in Chang et al., Nature (1978) 275:615; Goeddel et al., Nature (1979) 281:544; Goeddel et al., Nucleic Acids Res. (1980) 8:4057; EP 0 036,776; U.S. Patent No. 4,551,433; DeBoer et al., Proc. Natl. Acad. Sci. (USA) (1983) 80:21-25; and Siebenlist et al., Cell (1980) 20:269.
  • Yeast. Expression systems in yeast include those described in Hinnen et al, Proc.
  • Insect Cells. Expression of heterologous genes in insects is accomplished as described in U.S. Patent No. 4,745,051; Friesen et al, "The Regulation of Baculovirus Gene Expression", in: The Molecular Biology Of Baculoviruses (1986) (W. Doerfler, ed.); EP 0 127,839; EP 0 155,476; and Vlak et al, J. Gen. Virol. (1988) 69:765-776; Miller et al, Ann. Rev. Microbiol.
  • Mammalian Cells. Mammalian expression is accomplished as described in Dijkema et al, EMBO J. (1985) 4:761, Gorman et al, Proc. Natl. Acad. Sci. (USA) (1982) 79:6777, Boshart et al, Cell (1985) 41:52 and U.S. Patent No. 4,399,216. Other features of mammalian expression are facilitated as described in Ham and Wallace, Meth. Enz. (1979) 55:44, Barnes and Sato, Anal. Biochem. (1980) 102:255, U.S. Patent Nos.
  • Polynucleotide molecules comprising a polynucleotide sequence provided herein are propagated by placing the molecule in a vector.
  • Viral and non-viral vectors are used, including plasmids. The choice of plasmid will depend on the type of cell in which propagation is desired and the purpose of propagation. Certain vectors are useful for amplifying and making large amounts of the desired DNA sequence. Other vectors are suitable for expression in cells in culture. Still other vectors are suitable for transfer and expression in cells in a whole animal or person.
  • the choice of appropriate vector is well within the skill of the art. Many such vectors are available commercially.
  • the partial or full-length polynucleotide is inserted into a vector typically by means of DNA ligase attachment to a cleaved restriction enzyme site in the vector.
  • the desired nucleotide sequence can be inserted by homologous recombination in vivo. Typically this is accomplished by attaching regions of homology to the vector on the flanks of the desired nucleotide sequence. Regions of homology are added by ligation of oligonucleotides, or by polymerase chain reaction using primers comprising both the region of homology and a portion of the desired nucleotide sequence, for example.
  • The polynucleotides set forth in SEQ ID NOS: 1-844 or their corresponding full-length polynucleotides are linked to regulatory sequences as appropriate to obtain the desired expression properties. These can include promoters (attached either at the 5' end of the sense strand or at the 3' end of the antisense strand), enhancers, terminators, operators, repressors, and inducers.
  • The promoters can be regulated or constitutive. In some situations it may be desirable to use conditionally active promoters, such as tissue-specific or developmental stage-specific promoters. These are linked to the desired nucleotide sequence using the techniques described above for linkage to vectors. Any techniques known in the art can be used.
  • the resulting replicated nucleic acid, RNA, expressed protein or polypeptide is within the scope of the invention as a product of the host cell or organism.
  • the product is recovered by any appropriate means known in the art.
  • Once the gene corresponding to a selected polynucleotide is identified, its expression can be regulated in the cell to which the gene is native.
  • an endogenous gene of a cell can be regulated by an exogenous regulatory sequence as disclosed in U.S. Patent No. 5,641,670.
  • sequences that show similarity with a chemokine sequence can exhibit chemokine activities.
  • sequences exhibiting similarity with more than one individual sequence can exhibit activities that are characteristic of either or both individual sequences.
  • the full length sequences and fragments of the polynucleotide sequences of the nearest neighbors can be used as probes and primers to identify and isolate the full length sequence corresponding to provided polynucleotides.
  • the nearest neighbors can indicate a tissue or cell type to be used to construct a library for the full-length sequences corresponding to the provided polynucleotides.
  • a selected polynucleotide is translated in all six frames to determine the best alignment with the individual sequences.
  • the sequences disclosed herein in the Sequence Listing are in a 5' to 3' orientation and translation in three frames can be sufficient (with a few specific exceptions as described in the Examples). These amino acid sequences are referred to, generally, as query sequences, which will be aligned with the individual sequences.
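  • The six-frame translation step described above can be sketched as follows. This is a minimal illustration, assuming an unambiguous DNA string and the standard genetic code; the function names (`translate`, `six_frame_translations`) and the "+1".."-3" frame labels are hypothetical and are not taken from the text. For sequences already known to be in a 5' to 3' orientation, only the three "+" frames would be needed.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon.

    Codons containing characters other than A, C, G, T become 'X'.
    Trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Return translations of the three forward and three reverse frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

  • Each of the resulting amino acid strings can then serve as a query sequence for alignment against the individual sequences.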
  • Databases with individual sequences are described in "Computer Methods for Macromolecular Sequence Analysis" Methods in Enzymology (1996) 266, Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Databases include Genbank, EMBL, and DNA Database of Japan (DDBJ).
  • Query and individual sequences can be aligned using the methods and computer programs described above, which include BLAST, available over the world wide web at http://www.ncbi.nlm.nih.gov/BLAST/.
  • Another alignment algorithm is Fasta, available in the Genetics Computing Group (GCG) package, Madison, Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for alignment are described in Doolittle, supra.
  • an alignment program that permits gaps in the sequence is utilized to align the sequences.
  • The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. (1997) 70:173-187.
  • the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences.
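  • To illustrate what a gap-permitting alignment computes, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is not the GAP or MPSRCH implementation; the scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration, and real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the dynamic-programming matrix in memory.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

  • With these assumed scores, two sequences that differ by a single inserted residue still align over their full length, at the cost of one gap penalty.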
  • An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer.
  • MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to identify distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors.
  • Amino acid sequences encoded by the provided polynucleotides can be used to search both protein and DNA databases.
  • Results of individual and query sequence alignments can be divided into three categories, high similarity, weak similarity, and no similarity. Individual alignment results ranging from high similarity to weak similarity provide a basis for determining polypeptide activity and/or structure. Parameters for categorizing individual results include: percentage of the alignment region length where the strongest alignment is found, percent sequence identity, and p value.
  • The percentage of the alignment region length is calculated by counting the number of residues of the individual sequence found in the region of strongest alignment, i.e., the contiguous region of the individual sequence that contains the greatest number of residues that are identical to the residues of the corresponding region of the aligned query sequence. This number is divided by the total residue length of the query sequence to calculate a percentage. For example, a query sequence of 20 amino acid residues might be aligned with a 20 amino acid region of an individual sequence. The individual sequence might be identical to amino acid residues 5, 9-15, and 17-19 of the query sequence. The region of strongest alignment is thus the region stretching from residue 9 to residue 19, an 11 amino acid stretch. The percentage of the alignment region length is: 11 (length of the region of strongest alignment) divided by 20 (query sequence length), or 55%.
  • Percent sequence identity is calculated by counting the number of amino acid matches between the query and individual sequence and dividing the total number of matches by the number of residues of the individual sequence found in the region of strongest alignment. Thus, the percent identity in the example above would be 10 matches divided by 11 amino acids, or approximately 90.9%. The p value is the probability that the alignment was produced by chance.
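  • The two percentages in the worked example can be reproduced with a short sketch. The function name and its arguments are hypothetical; the boundaries of the region of strongest alignment are taken as given inputs (residues 9 to 19 in the example) rather than derived, since the text does not specify an algorithm for locating them.

```python
def alignment_region_metrics(query_length, identical_positions,
                             region_start, region_end):
    """Return (percent alignment region length, percent sequence identity).

    Positions are 1-based and inclusive, as in the worked example.
    """
    region_length = region_end - region_start + 1
    # Count only the identical residues that fall inside the region.
    matches = sum(1 for p in identical_positions
                  if region_start <= p <= region_end)
    pct_region_length = 100.0 * region_length / query_length
    pct_identity = 100.0 * matches / region_length
    return pct_region_length, pct_identity


# Worked example: identities at query residues 5, 9-15, and 17-19.
identical = {5, *range(9, 16), *range(17, 20)}
pct_len, pct_id = alignment_region_metrics(20, identical, 9, 19)
print(f"{pct_len:.1f}% of query length, {pct_id:.1f}% identity")
```

  • Residue 5 is identical but lies outside the region of strongest alignment, so it does not count toward the 10 matches.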
  • The p value can be calculated according to Karlin et al, Proc. Natl. Acad. Sci. (1990) 87:2264 and Karlin et al, Proc. Natl. Acad. Sci. (1993) 90.
  • The p value of multiple alignments using the same query sequence can be calculated using a heuristic approach described in Altschul et al., Nat. Genet. (1994) 6:119. Alignment programs such as BLAST can calculate the p value.
  • Another factor to consider for determining identity or similarity is the location of the similarity or identity. Strong local alignment can indicate similarity even if the length of alignment is short. Sequence identity scattered throughout the length of the query sequence also can indicate a similarity between the query and profile sequences. The boundaries of the region where the sequences align can be determined according to Doolittle, supra; BLAST or FASTA programs; or by determining the area where sequence identity is highest.
  • High Similarity. In general, in alignment results considered to be of high similarity, the percent of the alignment region length is typically at least about 55% of the total length of the query sequence; more typically, at least about 58%; even more typically, at least about 60% of the total residue length of the query sequence.
  • Usually, percent length of the alignment region can be as much as about 62%; more usually, as much as about 64%; even more usually, as much as about 66%. Further, for high similarity, the region of alignment typically exhibits at least about 75% sequence identity; more typically, at least about 78%; even more typically, at least about 80% sequence identity. Usually, percent sequence identity can be as much as about 82%; more usually, as much as about 84%; even more usually, as much as about 86%.
  • The p value is used in conjunction with these methods. If high similarity is found, the query sequence is considered to have high similarity with a profile sequence when the p value is less than or equal to about 10⁻²; more usually, less than or equal to about 10⁻³; even more usually, less than or equal to about 10⁻⁴. More typically, the p value is no more than about 10⁻⁵; more typically, no more than or equal to about 10⁻¹⁰; even more typically, no more than or equal to about 10⁻¹⁵ for the query sequence to be considered high similarity.
  • Weak Similarity. In general, where alignment results are considered to be of weak similarity, there is no minimum percent length of the alignment region nor minimum length of alignment.
  • A better showing of weak similarity is considered when the region of alignment is, typically, at least about 15 amino acid residues in length; more typically, at least about 20; even more typically, at least about 25 amino acid residues in length.
  • Usually, length of the alignment region can be as much as about 30 amino acid residues; more usually, as much as about 40; even more usually, as much as about 60 amino acid residues.
  • Further, for weak similarity, the region of alignment typically exhibits at least about 35% sequence identity; more typically, at least about 40%; even more typically, at least about 45% sequence identity.
  • Usually, percent sequence identity can be as much as about 50%; more usually, as much as about 55%; even more usually, as much as about 60%.
  • The query sequence is considered to have weak similarity with a profile sequence when the p value is usually less than or equal to about 10⁻²; more usually, less than or equal to about 10⁻³; even more usually, less than or equal to about 10⁻⁴. More typically, the p value is no more than about 10⁻⁵; more usually, no more than or equal to about 10⁻¹⁰; even more usually, no more than or equal to about 10⁻¹⁵ for the query sequence to be considered weak similarity.
  • Similarity Determined by Sequence Identity Alone. Sequence identity alone can be used to determine similarity of a query sequence to an individual sequence and can indicate the activity of the sequence. Such an alignment, preferably, permits gaps to align sequences.
  • Typically, the query sequence is related to the profile sequence if the sequence identity over the entire query sequence is at least about 15%; more typically, at least about 20%; even more typically, at least about 25%; even more typically, at least about 50%. Sequence identity alone as a measure of similarity is most useful when the query sequence is usually at least 80 residues in length; more usually, 90 residues; even more usually, at least 95 amino acid residues in length. More typically, similarity can be concluded based on sequence identity alone when the query sequence is preferably 100 residues in length; more preferably, 120 residues in length; even more preferably, 150 amino acid residues in length.
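  • One way to read the high and weak similarity criteria together is as a simple decision rule. The sketch below is an interpretation, not a procedure stated in the text: it assumes the broadest ("at least about") thresholds, treats "about" as an exact cutoff, and assumes that the alignment-length, identity, and p value criteria must all be met at once. The function name is hypothetical.

```python
def classify_similarity(pct_region_length: float, pct_identity: float,
                        p_value: float) -> str:
    """Classify an alignment result as 'high', 'weak', or 'none'.

    Assumed thresholds (broadest values given for each category):
      high: region >= 55% of query length, identity >= 75%, p <= 1e-2
      weak: no minimum region length, identity >= 35%, p <= 1e-2
    """
    if p_value > 1e-2:
        return "none"
    if pct_region_length >= 55.0 and pct_identity >= 75.0:
        return "high"
    if pct_identity >= 35.0:
        return "weak"
    return "none"
```

  • Under these assumptions, the worked example above (55% of query length, about 90.9% identity) would be classified as high similarity provided its p value is at most 10⁻².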
  • Translations of the provided polynucleotides can be aligned with amino acid profiles that define either protein families or common motifs. Also, translations of the provided polynucleotides can be aligned to multiple sequence alignments (MSA) comprising the polypeptide sequences of members of protein families or motifs. Similarity or identity with profile sequences or MSAs can be used to determine the activity of the gene products (e.g., polypeptides) encoded by the provided polynucleotides or corresponding cDNA or genes.
  • sequences that show an identity or similarity with a chemokine profile or MSA can exhibit chemokine activities.
  • Profiles can be designed manually by (1) creating an MSA, which is an alignment of the amino acid sequence of members that belong to the family and (2) constructing a statistical representation of the alignment. Such methods are described, for example, in Birney et al, Nucl. Acid Res. (1996) 24(14):2730-2739. MSAs of some protein families and motifs are publicly available. For example, http://genome.wustl.edu/Pfam/ includes MSAs of 547 different families and motifs. These MSAs are described also in Sonnhammer et al, Proteins (1997) 25:405-420.
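  • As a minimal sketch of step (2), a statistical representation of an alignment can be as simple as a table of residue frequencies per column. This is an illustrative simplification and is not the profile format used by Pfam or by the programs cited; the function name and the use of "-" as the gap character are assumptions.

```python
from collections import Counter


def build_profile(msa):
    """Build a per-column residue frequency table from an MSA.

    msa is a list of equal-length aligned sequences, with '-' for gaps.
    Returns one dict per column mapping residue -> frequency among the
    non-gap residues in that column.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()}
                       if total else {})
    return profile
```

  • A query can then be scored column by column against such a table; dedicated profile-comparison programs use more elaborate statistical models.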
  • Similarity between a query sequence and a protein family or motif can be determined by (a) comparing the query sequence against the profile and/or (b) aligning the query sequence with the members of the family or motif.
  • a program such as Searchwise is used to compare the query sequence to the statistical representation of the multiple alignment, also known as a profile.
  • the program is described in Birney et al, supra.
  • Other techniques to compare the sequence and profile are described in Sonnhammer et al, supra and Doolittle, supra.
  • Alternatively, methods described by Feng et al, J. Mol. Evol. (1987) 25:351 and Higgins et al, CABIOS (1989) 5:151 can be used to align the query sequence with the members of a family or motif, also known as an MSA.
  • Computer programs such as PILEUP can be used. See Feng et al, infra. In general, the following factors are used to determine if a similarity between a query sequence and a profile or MSA exists: (1) number of conserved residues found in the query sequence, (2) percentage of conserved residues found in the query sequence, (3) number of frameshifts, and (4) spacing between conserved residues.
  • Some alignment programs that both translate and align sequences can make any number of frameshifts when translating the nucleotide sequence to produce the best alignment. The fewer frameshifts needed to produce an alignment, the stronger the similarity or identity between the query and profile or MSAs.
  • a weak similarity resulting from no frameshifts can be a better indication of activity or structure of a query sequence, than a strong similarity resulting from two frameshifts.
  • Preferably, three or fewer frameshifts are found in an alignment; more preferably two or fewer frameshifts; even more preferably, one or fewer frameshifts; even more preferably, no frameshifts are found in an alignment of query and profile or MSAs.
  • Conserved residues are those amino acids found at a particular position in all or some of the family or motif members. For example, most chemokines contain four conserved cysteines. Alternatively, a position is considered conserved if only a certain class of amino acids is found in a particular position in all or some of the family members. For example, the N-terminal position can contain a positively charged amino acid, such as lysine, arginine, or histidine.
  • a residue of a polypeptide is conserved when a class of amino acids or a single amino acid is found at a particular position in at least about 40% of all class members; more typically, at least about 50%; even more typically, at least about 60% of the members.
  • a residue is conserved when a class or single amino acid is found in at least about 70% of the members of a family or motif; more usually, at least about 80%; even more usually, at least about 90%; even more usually, at least about 95%.
  • A residue is considered conserved when three unrelated amino acids are found at a particular position in some or all of the members; more usually, two unrelated amino acids. These residues are conserved when the unrelated amino acids are found at particular positions in at least about 40% of all class members; more typically, at least about 50%; even more typically, at least about 60% of the members. Usually, a residue is conserved when a class or single amino acid is found in at least about 70% of the members of a family or motif; more usually, at least about 80%; even more usually, at least about 90%; even more usually, at least about 95%.
  • a query sequence has similarity to a profile or MSA when the query sequence comprises at least about 25% of the conserved residues of the profile or MSA; more usually, at least about 30%; even more usually; at least about 40%.
  • the query sequence has a stronger similarity to a profile sequence or MSA when the query sequence comprises at least about 45% of the conserved residues of the profile or MSA; more typically, at least about 50%; even more typically; at least about 55%.
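  • The conserved-residue criteria above can be sketched as follows. The sketch handles only the simplest case, in which a single amino acid (not a class of amino acids) is conserved at a column, and it assumes the query has already been aligned to the MSA so that positions correspond. The function names and the 40% default are illustrative assumptions drawn from the broadest threshold in the text.

```python
from collections import Counter


def conserved_positions(msa, min_fraction=0.40):
    """Map column index -> residue for columns where one amino acid
    occurs in at least min_fraction of the family members."""
    conserved = {}
    members = len(msa)
    for idx, column in enumerate(zip(*msa)):
        counts = Counter(r for r in column if r != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count / members >= min_fraction:
            conserved[idx] = residue
    return conserved


def conserved_fraction(aligned_query, conserved):
    """Fraction of the conserved residues that the aligned query carries."""
    if not conserved:
        return 0.0
    hits = sum(1 for idx, residue in conserved.items()
               if idx < len(aligned_query) and aligned_query[idx] == residue)
    return hits / len(conserved)
```

  • A query that carries at least about 25% of the conserved residues would meet the broadest similarity criterion stated above; at least about 45% would meet the broadest criterion for stronger similarity.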
  • Profiles. The identity and function of the gene that correlates to a polynucleotide described herein can be determined by screening the polynucleotides or their corresponding amino acid sequences against profiles of protein families. Such profiles focus on common structural motifs among proteins of each family. Publicly available profiles are described above in
  • Chemokines are a family of proteins that have been implicated in lymphocyte trafficking, inflammatory diseases, angiogenesis, hematopoiesis, and viral infection. See, for example, Rollins, Blood (1997) 90(3):909-928, and Wells et al, J. Leuk.
  • U.S. Patent No. 5,605,817 discloses DNA encoding a chemokine expressed in fetal spleen.
  • U.S. Patent No. 5,656,724 discloses chemokine-like proteins and methods of use.
  • U.S. Patent No. 5,602,008 discloses DNA encoding a chemokine expressed by liver.
  • Chemokine mutants are polypeptides having an amino acid sequence that possesses at least one amino acid substitution, addition, or deletion as compared to native chemokines.
  • the number or type of the amino acid changes is not critical, nor is the length or number of the amino acid deletions, or amino acid extensions that are incorporated in the chemokines as compared to the native chemokine amino acid sequences.
  • a polynucleotide encoding one of these variant polypeptides will retain at least about 80% amino acid identity with at least one known chemokine.
  • these polypeptides will retain at least about 85% amino acid sequence identity, more preferably, at least about 90%; even more preferably, at least about 95%.
  • the variants exhibit at least 80%; preferably about 90%; more preferably about 95% of at least one activity exhibited by a native chemokine, which includes immunological, biological, receptor binding, and signal transduction functions.
  • Chemokines can possess dimerization activity, which can be assayed according to Burrows et al, Biochem. (1994) 55:12741; and Zhang et al, Mol. Cell. Biol. (1995) 75:4851. Native chemokines can play a role in the inflammatory response of viruses. This activity can be assayed as described in Bleul et al, Nature (1996) 552:829; and Oberlin et al, Nature
  • TRADD: Tumor Necrosis Factor Receptor-1 Associated Death Domain containing protein
  • modifications of the active domain of TRADD that retain the functional characteristics of the protein, as well as apoptosis assays for testing the function of such death domain containing proteins.
  • U.S. Patent No. 5,658,883 discloses biologically active TGF-B1 peptides.
  • U.S. Patent No. 5,674,734 discloses RIP, which contains a C-terminal death domain and an N-terminal kinase domain.
  • Leukemia Inhibitory Factor (LIF).
  • An LIF profile is constructed from sequences of leukemia inhibitory factor, CT-1 (cardiotrophin-1), CNTF (ciliary neurotrophic factor), OSM (oncostatin M), and IL-6 (interleukin-6).
  • This profile encompasses a family of secreted cytokines that have pleiotropic effects on many cell types including hepatocytes, osteoclasts, neuronal cells and cardiac myocytes, and can be used to detect additional genes encoding such proteins.
  • These molecules are all structurally related and share a common co-receptor gp130 which mediates intracellular signal transduction by cytoplasmic tyrosine kinases such as src.
  • Novel proteins related to this family are also likely to be secreted, to activate gp130 and to function in the development of a variety of cell types. Thus new members of this family would be candidates to be developed as growth or survival factors for the cell types that they stimulate. For more details on this family of cytokines, see Pennica et al, Cytokine and Growth Factor Reviews (1996) 7:81-91.
  • U.S. Patent No. 5,420,247 discloses LIF receptor and fusion proteins.
  • U.S. Patent No. 5,443,825 discloses human LIF.
  • Angiopoietin.
  • Angiopoietin-1 is a secreted ligand of the TIE-2 tyrosine kinase; it functions as an angiogenic factor critical for normal vascular development.
  • Angiopoietin-2 is a natural antagonist of angiopoietin-1 and thus functions as an anti-angiogenic factor.
  • These two proteins are structurally similar and activate the same receptor (Folkman et al., Cell (1996) 87:1153, and Davis et al, Cell (1996) 87:1161).
  • the angiopoietin molecules are composed of two domains: a coiled-coil region and a region related to fibrinogen.
  • The fibrinogen domain is found in many molecules including ficolin and tenascin, and is well defined structurally with many members.
  • Receptor Protein-Tyrosine Kinases. Receptor Protein-Tyrosine Kinases or RPTKs are described in Lindberg, Annu. Rev. Cell Biol. (1994) 10:251-337.
  • Growth Factors: Epidermal Growth Factor (EGF) and Fibroblast Growth Factor (FGF).
  • U.S. Patent No. 4,444,760 discloses acidic brain fibroblast growth factor, which is active in the promotion of cell division and wound healing.
  • U.S. Patent No. 5,439,818 discloses DNA encoding human recombinant basic fibroblast growth factor, which is active in wound healing.
  • U.S. Patent No. 5,604,293 discloses recombinant human basic fibroblast growth factor, which is useful for wound healing.
  • U.S. Patent No. 5,410,832 discloses brain-derived and recombinant acidic fibroblast growth factor, which act as mitogens for mesoderm and neuroectoderm-derived cells in culture, and promote wound healing in soft tissue, cartilaginous tissue and musculo-skeletal tissue.
  • U.S. Patent No. 5,387,673 discloses biologically active fragments of FGF.
  • TNF Family.
  • A profile derived from the TNF family is created by aligning sequences of the following TNF family members: nerve growth factor (NGF), lymphotoxin, Fas ligand, tumor necrosis factor (TNFα), CD40 ligand, TRAIL, ox40 ligand, 4-1BB ligand, CD27 ligand, and CD30 ligand.
  • the profile is designed to identify sequences of proteins that constitute new members or homologues of this family of proteins.
  • U.S. Patent No. 5,606,023 discloses mutant TNF proteins;
  • U.S. Patent No. 5,597,899 and U.S. Patent No. 5,486,463 disclose TNF muteins; and
  • U.S. Patent No. 5,652,353 discloses DNA encoding TNFα muteins.
  • Proteins of the TNF family have been shown in vitro to multimerize, as described in Burrows et al, Biochem. (1994) 55:12741 and Zhang et al, Mol. Cell. Biol. (1995) 75:4851 and bind receptors as described in Browning et al, J. Immunol. (1994) 747:1230, Androlewicz et al, J. Biol. Chem. (1992) 267:2542, and Crowe et al, Science (1994) 264:707.
  • TNFs proteolytically cleave a target protein as described in Kriegler et al, Cell (1988) 53:45 and Mohler et al., Nature (1994) 370:218 and demonstrate cell proliferation and differentiation activity.
  • T-cell or thymocyte proliferation is assayed as described in Armitage et al, Eur. J. Immunol (1992) 22:447; Current Protocols in Immunology, ed. J.E. Coligan et al, 3.1-3.19; Takai et al, J. Immunol. (1986) 137:3494-3500, Bertagnoli et al, J. Immunol.
  • In vivo activities of TNFs also include lymphocyte survival and apoptosis, assayed as described in Darzynkiewicz et al, Cytometry (1992) 13:795; Gorczyca et al, Leukemia (1993)
  • TNF proteins include a transmembrane domain. The protein is cleaved into a shorter soluble version, as described in Kriegler et al., Cell (1988) 53:45, Perez et al., Cell (1990)
  • The transmembrane domain is between amino acid 46 and 77 and the cytoplasmic domain is between position 1 and 45 on the human form of TNFα.
  • The 3-dimensional motifs of TNF include a sandwich of two pleated β sheets.
  • Each sheet is composed of anti-parallel β strands. β strands facing each other on opposite sides of the sandwich are connected by short polypeptide loops, as described in Van Ostade et al, Protein Engineering (1994) 7(1):5, and Sprang et al, Tumor Necrosis Factors; supra.
  • TNF receptors are disclosed in U.S. Patent No. 5,395,760.
  • A profile derived from the TNF receptor family is created by aligning sequences of the TNF receptor family, including Apo1/Fas, TNFR I and II, death receptor 3 (DR3), CD40, ox40, CD27, and CD30.
  • the profile is designed to identify from the polynucleotides of the invention sequences of proteins that constitute new members or homologues of this family of proteins.
  • Tumor necrosis factor receptors exist in two forms in humans: p55 TNFR and p75 TNFR, both of which provide intracellular signals upon binding with a ligand.
  • the extracellular domains of these receptor proteins are cysteine rich.
  • the receptors can remain membrane bound, although some forms of the receptors are cleaved forming soluble receptors.
  • The regulation, diagnostic, prognostic, and therapeutic value of soluble TNF receptors is discussed in Aderka, Cytokine and Growth Factor Reviews (1996) 7(3):231.
  • PDGF Family. U.S. Patent No. 5,326,695 discloses platelet derived growth factor agonists; bioactive portions of PDGF-B are used as agonists.
  • U.S. Patent No. 4,845,075 discloses biologically active B-chain homodimers, and also includes variants and derivatives of the PDGF-B chain.
  • U.S. Patent No. 5,128,321 discloses PDGF analogs and methods of use. Proteins having the same bioactivity as PDGF are disclosed, including A and B chain proteins.
  • U.S. Patent No. 5,650,501 discloses a serine/threonine kinase associated with mitotic and meiotic cell division; the protein has a kinase domain in its N-terminus and 3 PEST regions in the C-terminus.
  • U.S. Patent No. 5,605,825 discloses human PAK65, a serine protein kinase. The foregoing discussion provides a few examples of the protein profiles that can be compared with the polynucleotides of the invention. One skilled in the art can use these and other protein profiles to identify the genes that correlate with the provided polynucleotides. C.
  • Both secreted and membrane-bound polypeptides of the present invention are of particular interest. For example, levels of secreted polypeptides can be assayed in body fluids that are convenient, such as blood, urine, prostatic fluid and semen. Membrane-bound polypeptides are useful for constructing vaccine antigens or inducing an immune response. Such antigens would comprise all or part of the extracellular region of the membrane-bound polypeptides. Because both secreted and membrane-bound polypeptides comprise a fragment of contiguous hydrophobic amino acids, hydrophobicity predicting algorithms can be used to identify such polypeptides.
  • a signal sequence is usually encoded by both secreted and membrane-bound polypeptide genes to direct a polypeptide to the surface of the cell.
  • the signal sequence usually comprises a stretch of hydrophobic residues.
  • Such signal sequences can fold into helical structures.
  • Membrane-bound polypeptides typically comprise at least one transmembrane region that possesses a stretch of hydrophobic amino acids that can transverse the membrane. Some transmembrane regions also exhibit a helical structure.
  • Hydrophobic fragments within a polypeptide can be identified by using computer algorithms. Such algorithms include Hopp & Woods, Proc. Natl. Acad. Sci. USA (1981) 78:3824-3828; Kyte & Doolittle, J. Mol. Biol. (1982) 157:105-132; and RAOAR algorithm, Degli Esposti et al, Eur. J. Biochem. (1990) 190:207-219.
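  • As an illustration of how such algorithms work, the following is a minimal sketch of a sliding-window hydropathy profile using the Kyte & Doolittle hydropathy scale. The window size of 9 and the function name are assumptions chosen for illustration; the text does not specify parameters.

```python
# Kyte & Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9):
    """Return the mean hydropathy of every window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the sequence is shorter than the window.
    """
    protein = protein.upper()
    if window > len(protein):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(protein) - window + 1)]
```

  • Windows with high positive averages mark hydrophobic stretches, which are candidates for signal sequences or transmembrane regions.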
  • Another method of identifying secreted and membrane-bound polypeptides is to translate the polynucleotides of the invention in all six frames and determine if at least 8 contiguous hydrophobic amino acids are present. Those translated polypeptides with at least 8; more typically, 10; even more typically, 12 contiguous hydrophobic amino acids are considered to be either a putative secreted or membrane bound polypeptide.
  • Hydrophobic amino acids include alanine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, threonine, tryptophan, tyrosine, and valine.
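  • The six-frame screen described above can be sketched as follows. This is a minimal illustration, not the method actually used for the disclosed sequences: it assumes the standard genetic code, uses the hydrophobic residue list given above, and the function names (`six_frame_translations`, `hydrophobic_frames`) are hypothetical.

```python
# Hydrophobic residues exactly as listed in the text (one-letter codes):
# Ala, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Trp, Tyr, Val.
HYDROPHOBIC = set("AGHILKMFPTWYV")

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; codons not in the table become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame number (+1..+3, -1..-3) to its translated protein string."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def longest_hydrophobic_run(protein):
    """Length of the longest stretch of contiguous hydrophobic residues."""
    best = current = 0
    for residue in protein:
        current = current + 1 if residue in HYDROPHOBIC else 0
        best = max(best, current)
    return best


def hydrophobic_frames(dna, min_run=8):
    """Return {frame: longest run} for frames meeting the min_run cutoff."""
    hits = {}
    for frame, protein in six_frame_translations(dna).items():
        run = longest_hydrophobic_run(protein)
        if run >= min_run:
            hits[frame] = run
    return hits
```

  • Raising `min_run` to 10 or 12 applies the stricter cutoffs mentioned above.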
  • Ribozymes, antisense constructs, and dominant negative mutants can be used to determine function of the expression product of a gene corresponding to a polynucleotide provided herein. These methods and compositions are particularly useful where the provided novel polynucleotide exhibits no significant or substantial homology to a sequence encoding a gene of known function.
  • Antisense molecules and ribozymes can be constructed from synthetic polynucleotides. Typically, the phosphoramidite method of oligonucleotide synthesis is used. See Beaucage et al, Tet. Lett. (1981) 22:1859 and U.S. Patent No. 4,668,777.
  • RNA oligonucleotides can be synthesized, for example, using RNA phosphoramidites.
  • This method can be performed on an automated synthesizer, such as Applied Biosystems, Models 392 and 394, Foster City, California, USA. See Applied Biosystems User Bulletin 53 and Ogilvie et al, Pure & Applied Chem. (1987) 59:325.
  • Phosphorothioate oligonucleotides can also be synthesized for antisense construction.
  • A sulfurizing reagent such as tetraethylthiuram disulfide (TETD) in acetonitrile can be used to convert the internucleotide cyanoethyl phosphite to the phosphorothioate triester within 15 minutes at room temperature.
  • TETD replaces the iodine reagent, while all other reagents used for standard phosphoramidite chemistry remain the same.
  • Such a synthesis method can be automated using Models 392 and 394 by Applied Biosystems, for example.
  • Oligonucleotides of up to 200 nucleotides can be synthesized, more typically, 100 nucleotides, more typically 50 nucleotides; even more typically 30 to 40 nucleotides. These synthetic fragments can be annealed and ligated together to construct larger fragments. See, for example, Sambrook et al, supra. A. Ribozymes
  • Trans-cleaving catalytic RNAs are RNA molecules possessing endoribonuclease activity. Ribozymes are specifically designed for a particular target, and the target message must contain a specific nucleotide sequence. They are engineered to cleave any RNA species site-specifically in the background of cellular RNA. The cleavage event renders the mRNA unstable and prevents protein expression. Importantly, ribozymes can be used to inhibit expression of a gene of unknown function for the purpose of determining its function in an in vitro or in vivo context, by detecting the phenotypic effect. One commonly used ribozyme motif is the hammerhead, for which the substrate sequence requirements are minimal.
  • Ribozyme cleavage of HIV-1 RNA is described in U.S. Patent No. 5,144,019; methods of cleaving RNA using ribozymes are described in U.S. Patent No. 5,116,742; and methods for increasing the specificity of ribozymes are described in U.S. Patent No. 5,225,337 and Koizumi et al., Nucleic Acids Res. (1989) 17:7059. Preparation and use of ribozyme fragments in a hammerhead structure are also described by Koizumi et al., Nucleic Acids Res. (1989) 17:7059. Preparation and use of ribozyme fragments in a hairpin structure are described by Chowrira and Burke, Nucleic Acids Res. (1992) 20:2835. Ribozymes can also be made by rolling transcription as described in Daubendiek and Kool, Nat. Biotechnol. (1997) 15(3):213.
  • The hybridizing region of the ribozyme can be modified or can be prepared as a branched structure as described in Horn and Urdea, Nucleic Acids Res. (1989) 17:6959.
  • The basic structure of the ribozymes can also be chemically altered in ways familiar to those skilled in the art, and chemically synthesized ribozymes can be administered as synthetic oligonucleotide derivatives modified by monomeric units.
  • Liposome-mediated delivery of ribozymes improves cellular uptake, as described in Birikh et al., Eur. J. Biochem. (1997) 245:1.
  • Ribozymes are designed to specifically bind and cut the corresponding mRNA species. Ribozymes thus provide a means to inhibit the expression of any of the proteins encoded by the disclosed polynucleotides or their full-length genes. The full-length gene need not be known in order to design and use specific inhibitory ribozymes. In the case of a polynucleotide or full-length cDNA of unknown function, ribozymes corresponding to that nucleotide sequence can be tested in vitro for efficacy in cleaving the target transcript. Those ribozymes that effect cleavage in vitro are further tested in vivo.
  • The ribozyme can also be used to generate an animal model for a disease, as described in Birikh et al., supra.
  • An effective ribozyme is used to determine the function of the gene of interest by blocking its transcription and detecting a change in the cell.
  • An effective ribozyme is designed and delivered in a gene therapy for blocking transcription and expression of the gene.
  • Applications of ribozymes begin with knowledge of a portion of the coding sequence of the gene to be inhibited.
  • A partial polynucleotide sequence provides adequate sequence for constructing an effective ribozyme.
  • A target cleavage site is selected in the target sequence, and a ribozyme is constructed based on the 5' and 3' nucleotide sequences that flank the cleavage site.
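  • The site-selection step can be illustrated with a short sketch. It assumes, as is commonly reported for hammerhead ribozymes but not stated in this text, that cleavage occurs immediately 3' of a GUC triplet. It only locates candidate sites and derives the complementary binding arms; the catalytic core is deliberately omitted, and all names are hypothetical.

```python
def reverse_complement_rna(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def candidate_cleavage_sites(mrna, triplet="GUC", arm_length=8):
    """List candidate cleavage sites with their flanks and binding arms.

    Assumption: cleavage falls immediately 3' of `triplet`. Sites whose
    flanks are shorter than `arm_length` are skipped.
    """
    mrna = mrna.upper().replace("T", "U")  # accept cDNA-style input
    sites = []
    start = mrna.find(triplet)
    while start != -1:
        cut = start + len(triplet)
        flank_5 = mrna[max(0, cut - arm_length):cut]   # includes the triplet
        flank_3 = mrna[cut:cut + arm_length]
        if len(flank_5) == arm_length and len(flank_3) == arm_length:
            sites.append({
                "cut_position": cut,
                "flank_5": flank_5,
                "flank_3": flank_3,
                # Arms that would base-pair with each flank of the target.
                "arm_pairing_with_5_flank": reverse_complement_rna(flank_5),
                "arm_pairing_with_3_flank": reverse_complement_rna(flank_3),
            })
        start = mrna.find(triplet, start + 1)
    return sites
```

  • Candidate arms found this way would still need the in vitro cleavage testing described in the surrounding text.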
  • Retroviral vectors are engineered to express monomeric and multimeric hammerhead ribozymes targeting the mRNA of the target coding sequence. These monomeric and multimeric ribozymes are tested in vitro for an ability to cleave the target mRNA.
  • A cell line is stably transduced with the retroviral vectors expressing the ribozymes, and the transduction is confirmed by Northern blot analysis and reverse-transcription polymerase chain reaction (RT-PCR).
  • The cells are screened for inactivation of the target mRNA by such indicators as reduction of expression of disease markers or reduction of the gene product of the target mRNA.
  • Antisense nucleic acids are designed to specifically bind to RNA, resulting in the formation of RNA-DNA or RNA-RNA hybrids, with an arrest of DNA replication, reverse transcription or messenger RNA translation.
  • Antisense polynucleotides based on a selected polynucleotide sequence can interfere with expression of the corresponding gene.
  • Antisense polynucleotides are typically generated within the cell by expression from antisense constructs that contain the antisense strand as the transcribed strand.
  • Antisense polynucleotides based on the disclosed polynucleotides will bind and/or interfere with the translation of mRNA comprising a sequence complementary to the antisense polynucleotide.
  • The expression products of control cells and cells treated with the antisense construct are compared to detect the protein product of the gene corresponding to the polynucleotide upon which the antisense construct is based.
  • The protein is isolated and identified using routine biochemical methods.
  • Antisense therapy for a variety of cancers is in clinical phase and has been discussed extensively in the literature. Reed reviewed antisense therapy directed at the Bcl-2 gene in tumors; gene transfer-mediated overexpression of Bcl-2 in tumor cell lines conferred resistance to many types of cancer drugs (Reed, J.C., N.C.I. (1997) 89:988). The potential for clinical development of antisense inhibitors of ras is discussed by Cowsert, L.M., Anti-Cancer Drug Design (1997) 12:359. Additional important antisense targets include leukemia (Geurtz, A.M., Anti-Cancer Drug Design (1997) 12:341); human C-ref kinase
  • Polynucleotides of the invention can be used as additional potential therapeutics.
  • The choice of polynucleotide can be narrowed by first testing them for binding to "hot spot" regions of the genome of cancerous cells. If a polynucleotide is identified as binding to a "hot spot", testing the polynucleotide as an antisense compound in the corresponding cancer cells is clearly warranted.
  • 57(16):3356 disclose that loss of heterozygosity at 16q24.1-q24.2 is significantly associated with metastatic and aggressive behavior of prostate cancer.
  • C. Dominant Negative Mutations
  • As an alternative method for identifying function of the gene corresponding to a polynucleotide disclosed herein, dominant negative mutations are readily generated for corresponding proteins that are active as homomultimers. A mutant polypeptide will interact with wild-type polypeptides (made from the other allele) and form a non-functional multimer. Thus, a mutation is in a substrate-binding domain, a catalytic domain, or a cellular localization domain. Preferably, the mutant polypeptide will be overproduced. Point mutations are made that have such an effect.
  • The polypeptides of the invention include those encoded by the disclosed polynucleotides. These polypeptides can also be encoded by nucleic acids that, by virtue of the degeneracy of the genetic code, are not identical in sequence to the disclosed polynucleotides. Thus, the invention includes within its scope a polypeptide encoded by a polynucleotide having the sequence of any one of SEQ ID NOS: 1-844 or a variant thereof.
  • The term "polypeptide" refers to the full-length polypeptide encoded by the recited polynucleotide, the polypeptide encoded by the gene represented by the recited polynucleotide, as well as portions or fragments thereof.
  • Polypeptides also include variants of the naturally occurring proteins, where such variants are homologous or substantially similar to the naturally occurring protein, and can be of an origin of the same or different species as the naturally occurring protein (e.g., human, murine, or some other species that naturally expresses the recited polypeptide, usually a mammalian species).
  • Variant polypeptides have a sequence that has at least about 80%, usually at least about 90%, and more usually at least about 98% sequence identity with a differentially expressed polypeptide of the invention, as measured by BLAST using the parameters described above.
  • The variant polypeptides can be naturally or non-naturally glycosylated, i.e., the polypeptide has a glycosylation pattern that differs from the glycosylation pattern found in the corresponding naturally occurring protein.
  • The invention also encompasses homologs of the disclosed polypeptides (or fragments thereof) where the homologs are isolated from other species, i.e., other animal or plant species, where such homologs, usually mammalian species, e.g.
  • By homolog is meant a polypeptide having at least about 35%, usually at least about 40% and more usually at least about 60% amino acid sequence identity with a particular differentially expressed protein as identified above, where sequence identity is determined using the BLAST algorithm, with the parameters described supra.
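  • The identity thresholds above can be made concrete with a small sketch. The text specifies BLAST for measuring identity; the code below is only a simplified stand-in that counts identical columns in two sequences that have already been aligned (gaps written as "-"). The function names and the gap convention are assumptions for illustration.

```python
VARIANT_MIN_IDENTITY = 80.0   # "at least about 80%" for variants
HOMOLOG_MIN_IDENTITY = 35.0   # "at least about 35%" for homologs


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns holding the same residue.

    Columns where both sequences have a gap are ignored; a residue
    opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_thresholds(identity):
    """Report which of the lowest stated thresholds an identity value meets."""
    return {
        "variant": identity >= VARIANT_MIN_IDENTITY,
        "homolog": identity >= HOMOLOG_MIN_IDENTITY,
    }
```

  • A real comparison would take the identity value from the BLAST report rather than from this column count.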
  • The polypeptides of the subject invention are provided in a non-naturally occurring environment, e.g., are separated from their naturally occurring environment.
  • The subject protein is present in a composition that is enriched for the protein as compared to a control.
  • Purified polypeptide is provided, where by purified is meant that the protein is present in a composition that is substantially free of non-differentially expressed polypeptides, where by substantially free is meant that less than 90%, usually less than 60% and more usually less than 50% of the composition is made up of non-differentially expressed polypeptides.
  • Variants of polypeptides include mutants, fragments, and fusions. Mutants can include amino acid substitutions, additions or deletions.
  • The amino acid substitutions can be conservative amino acid substitutions or substitutions to eliminate non-essential amino acids, such as to alter a glycosylation site, a phosphorylation site or an acetylation site, or to minimize misfolding by substitution or deletion of one or more cysteine residues that are not necessary for function.
  • Conservative amino acid substitutions are those that preserve the general charge, hydrophobicity/hydrophilicity, and/or steric bulk of the amino acid substituted. For example, substitutions between the following groups are conservative: Gly/Ala, Val/Ile/Leu, Asp/Glu, Lys/Arg, Asn/Gln, Ser/Cys, Thr, and Phe/Trp/Tyr.
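  • As an illustration, the groups listed above can be encoded directly to flag which differences between two equal-length sequences are conservative. The grouping follows the list literally, so Thr is treated as its own group; if Thr was meant to sit with Ser/Cys, the `set("SC")` and `set("T")` entries would be merged. The function names are hypothetical.

```python
# Groups transcribed literally from the text, in one-letter codes.
CONSERVATIVE_GROUPS = [
    set("GA"),   # Gly/Ala
    set("VIL"),  # Val/Ile/Leu
    set("DE"),   # Asp/Glu
    set("KR"),   # Lys/Arg
    set("NQ"),   # Asn/Gln
    set("SC"),   # Ser/Cys
    set("T"),    # Thr (listed separately in the text)
    set("FWY"),  # Phe/Trp/Tyr
]


def is_conservative(original, replacement):
    """True if two different residues fall within the same listed group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


def classify_substitutions(wild_type, mutant):
    """Return (position, wild-type residue, mutant residue, conservative?)
    for each differing position, using 1-based positions."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, w, m, is_conservative(w, m))
        for pos, (w, m) in enumerate(zip(wild_type, mutant), start=1)
        if w != m
    ]
```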
  • Variants can be designed so as to retain biological activity of a particular region of the protein (e.g., a functional domain and/or, where the polypeptide is a member of a protein family, a region associated with a consensus sequence).
  • Osawa et al., Biochem. Mol. Biol. Int. (1994) 34:1003 discusses the actin binding region of a protein from several different species. The actin binding regions of these species are considered homologous based on the fact that they have amino acids that fall within "homologous residue groups." Homologous residues are judged according to the following groups (using single letter amino acid designations): STAG; ILVMF; HRK; DEQN; and FYW.
  • Amino acid residues were classified into one of three groups depending on their polarity: polar (Arg, Lys, His, Gln, Asn, Asp, and Glu); weak polar (Ala, Pro, Gly, Thr, and Ser); and nonpolar (Cys, Val, Met, Ile, Leu, Phe, Tyr, and Trp). Amino acid replacements during protein evolution were very conservative: 88% and 76% of them in the interior or exterior, respectively, were within the same group of the three. Inter-group replacements are such that weak polar residues are replaced more often by nonpolar residues in the interior and more often by polar residues on the exterior. Additional guidance for production of polypeptide variants is provided in Querol et al., Prot. Eng.
  • Variants also include fragments of the polypeptides disclosed herein, particularly biologically active fragments and/or fragments corresponding to functional domains. Fragments of interest will typically be at least about 10 aa to at least about 15 aa in length, usually at least about 50 aa in length, and can be as long as 300 aa in length or longer, but will usually not exceed about 1000 aa in length, where the fragment will have a stretch of amino acids that is identical to a polypeptide encoded by a polynucleotide having a sequence of any SEQ ID NOS: 1-844, or a homolog thereof.
  • The protein variants described herein are encoded by polynucleotides that are within the scope of the invention.
  • The genetic code can be used to select the appropriate codons to construct the corresponding variants.
  • A library of polynucleotides is a collection of sequence information, which information is provided in either biochemical form (e.g., as a collection of polynucleotide molecules), or in electronic form (e.g., as a collection of polynucleotide sequences stored in a computer-readable form, as in a computer system and/or as part of a computer program).
  • The sequence information of the polynucleotides can be used in a variety of ways, e.g., as a resource for gene discovery, as a representation of sequences expressed in a selected cell type (e.g., cell type markers), and/or as markers of a given disease or disease state.
  • A disease marker is a representation of a gene product that is present in all cells affected by disease either at an increased or decreased level relative to a normal cell (e.g., a cell of the same or similar type that is not substantially affected by disease).
  • A polynucleotide sequence in a library can be a polynucleotide that represents an mRNA, polypeptide, or other gene product encoded by the polynucleotide, that is either overexpressed or underexpressed in a breast ductal cell affected by cancer relative to a normal (i.e., substantially disease-free) breast cell.
  • The nucleotide sequence information of the library can be embodied in any suitable form, e.g., electronic or biochemical forms.
  • A library of sequence information embodied in electronic form includes an accessible computer data file (or, in biochemical form, a collection of nucleic acid molecules) that contains the representative nucleotide sequences of genes that are differentially expressed (e.g., overexpressed or underexpressed) as between, for example, i) a cancerous cell and a normal cell; ii) a cancerous cell and a dysplastic cell; iii) a cancerous cell and a cell affected by a disease or condition other than cancer; iv) a metastatic cancerous cell and a normal cell and/or non-metastatic cancerous cell; v) a malignant cancerous cell and a non-malignant cancerous cell (or a normal cell) and/or vi) a dysplastic cell relative to a normal cell.
  • Biochemical embodiments of the library include a collection of nucleic acids that have the sequences of the genes in the library, where the nucleic acids can correspond to the entire gene in the library or to a fragment thereof, as described in greater detail below.
  • The polynucleotide libraries of the subject invention include sequence information of a plurality of polynucleotide sequences, where at least one of the polynucleotides has a sequence of any of SEQ ID NOS: 1-844.
  • By plurality is meant at least 2, usually at least 3, and can include up to all of SEQ ID NOS: 1-844.
  • The length and number of polynucleotides in the library will vary with the nature of the library, e.g., if the library is an oligonucleotide array, a cDNA array, a computer database of the sequence information, etc.
  • The nucleic acid sequence information can be present in a variety of media.
  • Media refers to a manufacture, other than an isolated nucleic acid molecule, that contains the sequence information of the present invention. Such a manufacture provides the genome sequence or a subset thereof in a form that can be examined by means not directly applicable to the sequence as it exists in a nucleic acid.
  • The nucleotide sequence of the present invention, e.g., the nucleic acid sequences of any of the polynucleotides of SEQ ID NOS: 1-844, can be recorded on computer readable media, e.g., any medium that can be read and accessed directly by a computer.
  • Such media include, but are not limited to: magnetic storage media, such as a floppy disc, a hard disc storage medium, and a magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
  • Electronic versions of the libraries of the invention can be provided in conjunction or connection with other computer-readable information and/or other types of computer-readable files (e.g., searchable files, executable files, etc., including, but not limited to, for example, search program software, etc.).
  • The information can be accessed for a variety of purposes.
  • Computer software to access sequence information is publicly available.
  • For example, the BLAST (Altschul et al., supra) and BLAZE (Brutlag et al., Comp. Chem. (1993) 17:203) search algorithms on a Sybase system can be used to identify open reading frames (ORFs) within the genome that contain homology to ORFs from other organisms.
  • A computer-based system refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention.
  • The minimum hardware of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means.
  • Data storage means can comprise any manufacture comprising a recording of the present sequence information as described above, or a memory access means that can access such a manufacture.
  • Search means refers to one or more programs implemented on the computer-based system, to compare a target sequence or target structural motif with the stored sequence information. Search means are used to identify fragments or regions of the genome that match a particular target sequence or target motif.
  • A variety of algorithms are publicly known and commercially available, e.g., MacPattern (EMBL), BLASTN and BLASTX (NCBI).
  • A "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids, preferably from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues.
  • A "target structural motif" refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration that is formed upon the folding of the target motif, or on consensus sequences of regulatory or active sites.
  • Protein target motifs include, but are not limited to, enzyme active sites and signal sequences.
  • Nucleic acid target motifs include, but are not limited to, hairpin structures, promoter sequences and other expression elements such as binding sites for transcription factors.
  • A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention.
  • One format for an output means ranks fragments of the genome possessing varying degrees of homology to a target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences and identifies the degree of sequence similarity contained in the identified fragment.
  • Comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the genome.
  • A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.
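  • A toy version of such a search means, with output ranked by similarity as described above, might look like the following. It is a sketch only: it uses a simple ungapped sliding-window comparison rather than BLAST or any of the named programs, and the function names are hypothetical.

```python
def best_ungapped_match(target, subject):
    """Return (identity fraction, offset) of the best window in subject.

    Slides the target along the subject without gaps; the first window
    with the highest identity wins. Offset is -1 if no window fits.
    """
    n = len(target)
    best_identity, best_offset = 0.0, -1
    for offset in range(len(subject) - n + 1):
        window = subject[offset:offset + n]
        identity = sum(a == b for a, b in zip(target, window)) / n
        if identity > best_identity:
            best_identity, best_offset = identity, offset
    return best_identity, best_offset


def rank_library(target, library, min_length=6):
    """Rank stored sequences by their best match to the target.

    `library` maps a sequence name to its sequence. `min_length` mirrors
    the six-nucleotide minimum given for a target sequence in the text.
    """
    if len(target) < min_length:
        raise ValueError("target sequence is shorter than the minimum length")
    target = target.upper()
    results = []
    for name, sequence in library.items():
        identity, offset = best_ungapped_match(target, sequence.upper())
        results.append((name, identity, offset))
    # Highest identity first; sorted() is stable, so ties keep input order.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

  • The ranked tuples correspond to the output format described above, in which fragments are listed by their degree of similarity to the target.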
  • The "library" of the invention also encompasses biochemical libraries of the polynucleotides of SEQ ID NOS: 1-844, e.g., collections of nucleic acids representing the provided polynucleotides.
  • The biochemical libraries can take a variety of forms, e.g., a solution of cDNAs, a pattern of probe nucleic acids stably associated with a surface of a solid support (i.e., an array) and the like.
  • Of particular interest are nucleic acid arrays in which one or more of SEQ ID NOS: 1-844 is represented on the array.
  • By array is meant an article of manufacture that has at least a substrate with at least two distinct nucleic acid targets on one of its surfaces, where the number of distinct nucleic acids can be considerably higher, typically being at least 10 nt, usually at least 20 nt and often at least 25 nt.
  • The arrays of the subject invention find use in a variety of applications, including gene expression analysis, drug screening, mutation analysis and the like, as disclosed in the above-listed exemplary patent documents.
  • Analogous libraries of polypeptides are also provided, where the polypeptides of the library will represent at least a portion of the polypeptides encoded by SEQ ID NOS: 1-844.
  • Polynucleotide probes are used for a variety of purposes, such as chromosome mapping of the polynucleotide and detection of transcription levels. Additional disclosure about preferred regions of the disclosed polynucleotide sequences is found in the Examples.
  • A probe that hybridizes specifically to a polynucleotide disclosed herein should provide a detection signal at least 5-, 10-, or 20-fold higher than the background hybridization provided with other unrelated sequences.
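  • The fold-over-background criterion is simple arithmetic, sketched below. Taking the background as the mean signal over a set of unrelated sequences is an assumption made here for illustration; the text does not say how background is averaged. Function names are hypothetical.

```python
def fold_over_background(signal, background_signals):
    """Ratio of the probe signal to the mean background signal."""
    if not background_signals:
        raise ValueError("at least one background measurement is required")
    mean_background = sum(background_signals) / len(background_signals)
    if mean_background <= 0:
        raise ValueError("mean background must be positive")
    return signal / mean_background


def highest_threshold_met(fold, thresholds=(5, 10, 20)):
    """Return the largest of the 5-, 10-, 20-fold cutoffs met, or None."""
    met = [t for t in thresholds if fold >= t]
    return max(met) if met else None
```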
  • Nucleotide probes are used to detect expression of a gene corresponding to the provided polynucleotide.
  • The references describe an example of a sandwich nucleotide hybridization assay. For example, in Northern blots, mRNA is separated electrophoretically and contacted with a probe. A probe is detected as hybridizing to an mRNA species of a particular size. The amount of hybridization is quantitated to determine relative amounts of expression, for example under a particular condition. Probes are also used to detect products of amplification by polymerase chain reaction. The products of the reaction are hybridized to the probe and hybrids are detected. Probes are used for in situ hybridization to cells to detect expression.
  • Probes can also be used in vivo for diagnostic detection of hybridizing sequences. Probes are typically labeled with a radioactive isotope. Other types of detectable labels can be used such as chromophores, fluors, and enzymes. Other examples of nucleotide hybridization assays are described in WO92/02526 and U.S. Patent No. 5,124,246.
  • PCR Polymerase Chain Reaction
  • Two primer polynucleotides hybridize with the target nucleic acids and are used to prime the reaction.
  • The primers can be composed of sequence within or 3' and 5' to the polynucleotides of the Sequence Listing. Alternatively, if the primers are 3' and 5' to these polynucleotides, they need not hybridize to them or the complements.
  • A thermostable polymerase creates copies of target nucleic acids from the primers using the original target nucleic acids as a template. After a large amount of target nucleic acids is generated by the polymerase, it is detected by methods such as Southern blots. When using the Southern blot method, the labeled probe will hybridize to a polynucleotide of the Sequence Listing or complement.
  • mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et al, "Molecular Cloning: A Laboratory Manual” (New York, Cold Spring Harbor Laboratory, 1989).
  • mRNA or cDNA generated from mRNA using a polymerase enzyme can be purified and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid support, such as nitrocellulose. The solid support is exposed to a labeled probe and then washed to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected. Typically, the probe is labeled with radioactivity. Mapping.
  • Polynucleotides of the present invention are used to identify a chromosome on which the corresponding gene resides. Such mapping can be useful in identifying the function of the polynucleotide-related gene by its proximity to other genes with known function. Function can also be assigned to the polynucleotide-related gene when particular syndromes or diseases map to the same chromosome. For example, use of polynucleotide probes in identification and quantification of nucleic acid sequence aberrations is described in U.S. Patent No. 5,783,387.
  • FISH fluorescence in situ hybridization
  • Nucleotide probes comprising at least 12 contiguous nucleotides selected from the nucleotide sequence shown in the Sequence Listing are used to identify the corresponding chromosome.
  • The nucleotide probes are labeled, for example, with a radioactive, fluorescent, biotinylated, or chemiluminescent label, and detected by well known methods appropriate for the particular label selected. Protocols for hybridizing nucleotide probes to preparations of metaphase chromosomes are also well known in the art.
  • A nucleotide probe will hybridize specifically to nucleotide sequences in the chromosome preparations that are complementary to the nucleotide sequence of the probe.
  • Polynucleotides are mapped to particular chromosomes using, for example, radiation hybrids or chromosome-specific hybrid panels. See Leach et al., Advances in Genetics (1995) 33:63-99; Walter et al., Nature Genetics (1994) 7:22; Walter and Goodfellow, Trends in Genetics (1992) 9:352. Panels for radiation hybrid mapping are available from Research Genetics, Inc., Huntsville, Alabama, USA. Databases for markers using various panels are available via the world wide web at http://shgc-www.stanford.edu; and http://www-genome.wi.mit.edu/cgi-bin/contig/rhmapper.pl.
  • The statistical program RHMAP can be used to construct a map based on the data from radiation hybridization with a measure of the relative likelihood of one order versus another.
  • RHMAP is available via the world wide web at http://www.sph.umich.edu/group/statgen/software.
  • Polynucleotides based on the polynucleotides of the invention can be used to probe these regions. For example, if through profile searching a provided polynucleotide is identified as corresponding to a gene encoding a kinase, its ability to bind to a cancer-related chromosomal region will suggest its role as a kinase in one or more stages of tumor cell development/growth. Although some experimentation would be required to elucidate the role, the polynucleotide constitutes a new material for isolating a specific protein that has potential for developing a cancer diagnostic or therapeutic. Tissue Typing or Profiling.
  • Levels of mRNA corresponding to the provided polynucleotides can vary in different cell types and can be tissue-specific. This variation of mRNA levels in different cell types can be exploited with nucleic acid probe assays to determine tissue types. For example, PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes substantially identical or complementary to polynucleotides listed in the Sequence Listing can determine the presence or absence of the corresponding cDNA or mRNA. For example, a metastatic lesion is identified by its developmental organ or tissue source by identifying the expression of a particular marker of that organ or tissue.
  • If a polynucleotide is expressed only in a specific tissue type, and a metastatic lesion is found to express that polynucleotide, then the developmental source of the lesion has been identified. Expression of a particular polynucleotide is assayed by detection of either the corresponding mRNA or the protein product. Immunological methods, such as antibody staining, are used to detect a particular protein product. Hybridization methods can be used to detect particular mRNA species, including but not limited to in situ hybridization and Northern blotting.
  • A polynucleotide of the invention will be useful in forensics, genetic analysis, mapping, and diagnostic applications if the corresponding region of a gene is polymorphic in the human population.
  • Particular polymorphic forms of the provided polynucleotides can be used to either identify a sample as deriving from a suspect or rule out the possibility that the sample derives from the suspect. Any means for detecting a polymorphism in a gene are used, including but not limited to electrophoresis of protein polymorphic variants, differential sensitivity to restriction enzyme cleavage, and hybridization to allele-specific probes.
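  • Differential sensitivity to restriction enzyme cleavage can be illustrated with a short in silico digest. The sketch assumes a linear DNA and uses EcoRI (recognition site GAATTC, cutting after the G) purely as an example enzyme; the text does not name one. The sequences in the usage note and the function names are hypothetical.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting linear DNA at every site.

    `cut_offset` is the number of bases into the recognition site at
    which the top strand is cut (1 for EcoRI: G^AATTC).
    """
    dna = dna.upper()
    cuts = []
    position = dna.find(site)
    while position != -1:
        cuts.append(position + cut_offset)
        position = dna.find(site, position + 1)
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def same_restriction_pattern(sample, reference, site="GAATTC", cut_offset=1):
    """True if both sequences give the same set of fragment lengths."""
    return (sorted(digest(sample, site, cut_offset))
            == sorted(digest(reference, site, cut_offset)))
```

  • For instance, `digest("AAAAGAATTCAAAAAA")` yields two fragments while the single-base variant `"AAAAGAGTTCAAAAAA"` stays uncut, so the two alleles are distinguishable by fragment pattern.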
  • Expression products of a polynucleotide of the invention are prepared and used for raising antibodies for experimental, diagnostic, and therapeutic purposes. For polynucleotides to which a corresponding gene has not been assigned, this provides an additional method of identifying the corresponding gene.
  • The polynucleotide or related cDNA is expressed as described above, and antibodies are prepared. These antibodies are specific to an epitope on the polypeptide encoded by the polynucleotide, and can precipitate or bind to the corresponding native protein in a cell or tissue preparation or in a cell-free extract of an in vitro expression system.
  • Immunogens for raising antibodies are prepared by mixing the polypeptides encoded by the polynucleotides of the present invention with adjuvants. Alternatively, polypeptides are made as fusion proteins to larger immunogenic proteins. Polypeptides are also covalently linked to other larger immunogenic proteins, such as keyhole limpet hemocyanin. Immunogens are typically administered intradermally, subcutaneously, or intramuscularly. Immunogens are administered to experimental animals such as rabbits, sheep, and mice, to generate antibodies. Optionally, the animal spleen cells are isolated and fused with myeloma cells to form hybridomas which secrete monoclonal antibodies. Such methods are well known in the art.
  • the selected polynucleotide is administered directly, such as by intramuscular injection, and expressed in vivo.
  • the expressed protein generates a variety of protein-specific immune responses, including production of antibodies, comparable to administration of the protein.
  • polyclonal and monoclonal antibodies specific for polypeptides encoded by a selected polynucleotide are made using standard methods known in the art.
  • the antibodies specifically bind to epitopes present in the polypeptides encoded by polynucleotides disclosed in the Sequence Listing.
  • epitopes which involve non-contiguous amino acids may require more, for example at least 15, 25, or 50 amino acids.
  • a short sequence of a polynucleotide may then be unsuitable for use as an epitope to raise antibodies for identifying the corresponding novel protein, because of the potential for cross-reactivity with a known protein.
  • the antibodies can be useful for other purposes, particularly if they identify common structural features of a known protein and a novel polypeptide encoded by a polynucleotide of the invention.
  • Antibodies that specifically bind to human polypeptides encoded by the provided polynucleotides should provide a detection signal at least 5-, 10-, or 20-fold higher than a detection signal provided with other proteins when used in Western blots or other immunochemical assays.
  • Antibodies that specifically bind polypeptides of the invention do not bind to other proteins in immunochemical assays at detectable levels and can immunoprecipitate the specific polypeptide from solution.
  • human antibodies are purified by methods well known in the art.
  • the antibodies are affinity purified by passing antiserum over a column to which the corresponding selected polypeptide or fusion protein is bound. The bound antibodies can then be eluted from the column, for example using a buffer with a high salt concentration.
  • genetically engineered antibody derivatives are made, such as single chain antibodies, according to methods well known in the art.
  • Polynucleotide arrays provide a high throughput technique that can assay a large number of polynucleotide sequences in a sample. This technology can be used as a diagnostic and as a tool to test for differential expression to determine function of an encoded protein.
  • Arrays can be created by spotting polynucleotide probes onto a substrate (e.g., glass, nitrocellulose, etc.) in a two-dimensional matrix or array having bound probes. The probes can be bound to the substrate by either covalent bonds or by non-specific interactions, such as hydrophobic interactions.
  • Samples of polynucleotides can be detectably labeled (e.g., using radioactive or fluorescent labels) and then hybridized to the probes. Double stranded polynucleotides, comprising the labeled sample polynucleotides bound to probe polynucleotides, can be detected once the unbound portion of the sample is washed away. Techniques for constructing arrays and methods of using these arrays are described in EP No. 0 799 897; PCT No. WO 97/29212; PCT No. WO 97/27317; EP No. 0 785 280; PCT No. WO 97/02357; U.S. Pat. No. 5,593,839; U.S. Pat. No.
  • arrays can be used to examine differential expression of genes and can be used to determine gene function.
  • arrays of the instant polynucleotide sequences can be used to determine if any of the provided polynucleotides are differentially expressed between a test cell and control cell (e.g., cancer cells and normal cells).
  • high expression of a particular message in a cancer cell can indicate a cancer specific protein.
  • Exemplary uses of arrays are further described in, for example, Pappalardo et al., Sem. Radiation Oncol. (1998) 8:217; and Ramsay, Nature Biotechnol. (1998) 16:40.
  • The polynucleotides of the invention can also be used to detect differences in expression levels between two cells, e.g., as a method to identify abnormal or diseased tissue in a human.
  • tissue can be selected according to the putative biological function.
  • the expression of a gene corresponding to a specific polynucleotide is compared between a first tissue that is suspected of being diseased and a second, normal tissue of the human.
  • the tissue suspected of being abnormal or diseased can be derived from a different tissue type of the human, but preferably it is derived from the same tissue type; for example an intestinal polyp or other abnormal growth should be compared with normal intestinal tissue.
  • the normal tissue can be the same tissue as that of the test sample, or any normal tissue of the patient, especially those that express the polynucleotide-related gene of interest (e.g., brain, thymus, testis, heart, prostate, placenta, spleen, small intestine, skeletal muscle, pancreas, and the mucosal lining of the colon).
  • A difference between the polynucleotide-related gene, mRNA, or protein in the two tissues which are compared, for example in molecular weight, amino acid or nucleotide sequence, or relative abundance, indicates a change in the gene, or a gene which regulates it, in the tissue of the human that was suspected of being diseased. Examples of detection of differential expression and its use in diagnosis of cancer are described in U.S. Patent Nos. 5,688,641 and 5,677,125.
  • the polynucleotide-related genes in the two tissues are compared by any means known in the art.
  • the two genes can be sequenced, and the sequence of the gene in the tissue suspected of being diseased compared with the gene sequence in the normal tissue.
  • the genes corresponding to a provided polynucleotide, or portions thereof, in the two tissues are amplified, for example using nucleotide primers based on the nucleotide sequence shown in the Sequence Listing, using the polymerase chain reaction.
  • the amplified genes or portions of genes are hybridized to detectably labeled nucleotide probes selected from a nucleotide sequence shown in the Sequence Listing.
  • a difference in the nucleotide sequence of the isolated gene in the tissue suspected of being diseased compared with the normal nucleotide sequence suggests a role of the gene product encoded by the subject polynucleotide in the disease, and provides guidance for preparing a therapeutic agent.
  • mRNA corresponding to a provided polynucleotide in the two tissues is compared.
  • PolyA+ RNA is isolated from the two tissues as is known in the art.
  • one of skill in the art can readily determine differences in the size or amount of mRNA transcripts between the two tissues using Northern blots and detectably labeled nucleotide probes selected from the nucleotide sequence shown in the Sequence Listing.
  • the comparison can also be accomplished by analyzing polypeptides between the matched samples.
  • the sizes of the proteins in the two tissues are compared, for example, using antibodies of the present invention to detect polypeptides in Western blots of protein extracts from the two tissues.
  • Other changes, such as expression levels and subcellular localization, can also be detected immunologically, using antibodies to the corresponding protein.
  • a higher or lower level of expression of a given polypeptide in a tissue suspected of being diseased, compared with the same protein expression level in a normal tissue is indicative that the expressed protein has a role in the disease, and provides guidance for preparing a therapeutic agent.
  • Comparison of polynucleotide sequences or of gene expression products, e.g., mRNA and protein, between a human tissue that is suspected of being diseased and a normal tissue of a human is used to follow disease progression or remission in the human.
  • Such comparisons are made as described above.
  • increased or decreased expression of a gene corresponding to an inventive polynucleotide in the tissue suspected of being neoplastic can indicate the presence of neoplastic cells in the tissue.
  • the degree of increased expression of a given gene in the neoplastic tissue relative to expression of the same gene in normal tissue, or differences in the amount of increased expression of a given gene in the neoplastic tissue over time, is used to assess the progression of the neoplasia in that tissue or to monitor the response of the neoplastic tissue to a therapeutic protocol over time.
  • the expression pattern of any two cell types can be compared, such as low and high metastatic tumor cell lines, malignant or non-malignant cells, or cells from tissue which have and have not been exposed to a therapeutic agent.
  • A genetic predisposition to disease in a human is detected by comparing expression levels of an mRNA or protein corresponding to a polynucleotide of the invention in a fetal tissue with levels associated with normal fetal tissue.
  • Fetal tissues that are used for this purpose include, but are not limited to, amniotic fluid, chorionic villi, blood, and the blastomere of an in vitro-fertilized embryo.
  • the comparable normal polynucleotide-related gene is obtained from any tissue.
  • The mRNA or protein is obtained from a normal tissue of a human in which the polynucleotide-related gene is expressed. Differences such as alterations in the nucleotide sequence or size of the same product of the fetal polynucleotide-related gene or mRNA, or alterations in the molecular weight, amino acid sequence, or relative abundance of fetal protein, can indicate a germline mutation in the polynucleotide-related gene of the fetus, which indicates a genetic predisposition to disease. Particular diagnostic and prognostic uses of the disclosed polynucleotides are described in more detail below.
E. Diagnostic, Prognostic, and Other Uses Based On Differential Expression
  • Diagnostic methods of the invention generally involve detection of a level or amount of a gene product, particularly a differentially expressed gene product, in a test sample obtained from a patient suspected of having or being susceptible to a disease (e.g., breast cancer, lung cancer, colon cancer and/or metastatic forms thereof), and comparing the detected levels to those levels found in normal cells (e.g., cells substantially unaffected by cancer) and/or other control cells (e.g., to differentiate a cancerous cell from a cell affected by dysplasia).
  • The severity of the disease can be assessed by comparing the detected levels of a differentially expressed gene product with those levels detected in samples representing the levels of differentially expressed gene product associated with varying degrees of severity of disease.
  • The term “differentially expressed gene” is intended to encompass a polynucleotide that can, for example, include an open reading frame encoding a gene product (e.g., a polypeptide), and/or introns of such genes and adjacent 5' and 3' non-coding nucleotide sequences involved in the regulation of expression, up to about 20 kb beyond the coding region, but possibly further in either direction.
  • the gene can be introduced into an appropriate vector for extrachromosomal maintenance or for integration into a host genome.
  • A difference in expression level associated with a decrease in expression level of at least about 25%, usually at least about 50% to 75%, more usually at least about 90% or more is indicative of a differentially expressed gene of interest, i.e., a gene that is underexpressed or down-regulated in the test sample relative to a control sample.
  • A difference in expression level associated with an increase in expression of at least about 25%, usually at least about 50% to 75%, more usually at least about 90%, and which can be at least about 1½-fold, usually at least about 2-fold to about 10-fold, and can be about 100-fold to about 1,000-fold relative to a control sample, is indicative of a differentially expressed gene of interest, i.e., an overexpressed or up-regulated gene.
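The thresholds above can be illustrated with a minimal sketch. It assumes expression levels have already been measured on a common scale for the test and control samples; the function name `classify_expression` and the default cutoff of 25% (the lowest threshold recited above) are illustrative choices, not part of the disclosure.

```python
def classify_expression(test_level: float, control_level: float,
                        min_change: float = 0.25) -> str:
    """Label a gene by its expression in a test sample relative to a control.

    min_change is the minimum fractional change treated as differential
    expression (0.25 corresponds to "at least about 25%").
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    # Fractional change relative to the control: -0.5 means a 50% decrease.
    change = (test_level - control_level) / control_level
    if change <= -min_change:
        return "underexpressed"
    if change >= min_change:
        return "overexpressed"
    return "not differentially expressed"


def fold_change(test_level: float, control_level: float) -> float:
    """Return the test/control ratio, e.g. 2.0 for a 2-fold increase."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return test_level / control_level
```

A stricter cutoff (for example `min_change=0.9`) can be passed where only the larger differences recited above are of interest.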
  • “Differentially expressed polynucleotide” as used herein means a nucleic acid molecule (RNA or DNA) having a sequence that represents a differentially expressed gene, e.g., the differentially expressed polynucleotide comprises a sequence (e.g., an open reading frame encoding a gene product) that uniquely identifies a differentially expressed gene so that detection of the differentially expressed polynucleotide in a sample is correlated with the presence of a differentially expressed gene in a sample.
  • “Differentially expressed polynucleotides” is also meant to encompass fragments of the disclosed polynucleotides, e.g., fragments retaining biological activity, as well as nucleic acids homologous, substantially similar, or substantially identical (e.g., having about 90% sequence identity) to the disclosed polynucleotides.
  • Methods of the subject invention useful in diagnosis or prognosis typically involve comparison of the abundance of a selected differentially expressed gene product in a sample of interest with that of a control to determine any relative differences in the expression of the gene product, where the difference can be measured qualitatively and/or quantitatively. Quantitation can be accomplished, for example, by comparing the level of expression product detected in the sample with the amounts of product present in a standard curve.
  • a comparison can be made visually; by using a technique such as densitometry, with or without computerized assistance; by preparing a representative library of cDNA clones of mRNA isolated from a test sample, sequencing the clones in the library to determine that number of cDNA clones corresponding to the same gene product, and analyzing the number of clones corresponding to that same gene product relative to the number of clones of the same gene product in a control sample; or by using an array to detect relative levels of hybridization to a selected sequence or set of sequences, and comparing the hybridization pattern to that of a control. The differences in expression are then correlated with the presence or absence of an abnormal expression pattern.
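Quantitation against a standard curve, as mentioned above, can be sketched as follows. This is a minimal illustration that assumes the standards are (known amount, measured signal) pairs, that signal increases with amount, and that linear interpolation between neighboring standards is acceptable; the function name and data layout are hypothetical.

```python
from bisect import bisect_left


def quantify_from_standard_curve(signal, standards):
    """Estimate the amount of product that gives `signal`.

    standards: iterable of (known_amount, measured_signal) pairs.
    Interpolates linearly between the two standards that bracket the signal.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by measured signal
    signals = [s for _, s in points]
    if not signals or not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal is outside the range of the standard curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]  # exact match with a standard
    (amount_lo, signal_lo), (amount_hi, signal_hi) = points[i - 1], points[i]
    fraction = (signal - signal_lo) / (signal_hi - signal_lo)
    return amount_lo + fraction * (amount_hi - amount_lo)
```

Signals outside the range of the standards are rejected rather than extrapolated, since the curve says nothing about that region.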
  • Diagnostic assays of the invention involve detection of a gene product of a polynucleotide sequence (e.g., mRNA or polypeptide) that corresponds to a sequence of SEQ ID NOS: 1-844.
  • the patient from whom the sample is obtained can be apparently healthy, susceptible to disease (e.g., as determined by family history or exposure to certain environmental factors), or can already be identified as having a condition in which altered expression of a gene product of the invention is implicated.
  • the diagnosis can be determined based on detected gene product expression levels of a gene product encoded by at least one, preferably at least two or more, at least 3 or more, or at least 4 or more of the polynucleotides having a sequence set forth in SEQ ID NOS: 1-844, and can involve detection of expression of genes corresponding to all of SEQ ID NOS: 1-844 and/or additional sequences that can serve as additional diagnostic markers and/or reference sequences.
  • Where the diagnostic method is designed to detect the presence of cancer in a patient, or the patient's susceptibility to cancer, the assay preferably involves detection of a gene product encoded by a gene corresponding to a polynucleotide that is differentially expressed in cancer.
  • a higher level of expression of a polynucleotide corresponding to SEQ ID NO:52 relative to a level associated with a normal sample can indicate the presence of cancer in the patient from whom the sample is derived.
  • detection of a lower level of a polynucleotide corresponding to SEQ ID NO:39 relative to a normal level is indicative of the presence of cancer in the patient.
  • differentially expressed polynucleotides are described in the Examples below. Given the provided polynucleotides and information regarding their relative expression levels provided herein, assays using such polynucleotides and detection of their expression levels in diagnosis and prognosis will be readily apparent to the ordinarily skilled artisan.
  • Detectable labels include fluorochromes (e.g., fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-
  • The detectable label can involve a two-stage system (e.g., biotin-avidin, hapten-anti-hapten antibody, etc.).
  • Reagents specific for the polynucleotides and polypeptides of the invention such as antibodies and nucleotide probes, can be supplied in a kit for detecting the presence of an expression product in a biological sample.
  • the kit can also contain buffers or labeling components, as well as instructions for using the reagents to detect and quantify expression products in the biological sample. Exemplary embodiments of the diagnostic methods of the invention are described below in more detail.
  • the test sample is assayed for the level of a differentially expressed polypeptide.
  • Diagnosis can be accomplished using any of a number of methods to determine the absence or presence or altered amounts of the differentially expressed polypeptide in the test sample.
  • detection can utilize staining of cells or histological sections with labeled antibodies, performed in accordance with conventional methods. Cells can be permeabilized to stain cytoplasmic molecules.
  • antibodies that specifically bind a differentially expressed polypeptide of the invention are added to a sample, and incubated for a period of time sufficient to allow binding to the epitope, usually at least about 10 minutes.
  • the antibody can be detectably labeled for direct detection (e.g., using radioisotopes, enzymes, fluorescers, chemiluminescers, and the like), or can be used in conjunction with a second stage antibody or reagent to detect binding (e.g., biotin with horseradish peroxidase-conjugated avidin, a secondary antibody conjugated to a fluorescent compound, e.g. fluorescein, rhodamine, Texas red, etc.).
  • The absence or presence of antibody binding can be determined by various methods, including flow cytometry of dissociated cells, microscopy, radiography, scintillation counting, etc. Any suitable alternative method of qualitative or quantitative detection of levels or amounts of differentially expressed polypeptide can be used, for example ELISA, western blot, immunoprecipitation, radioimmunoassay, etc.
  • the detected level of differentially expressed polypeptide in the test sample is compared to a level of the differentially expressed gene product in a reference or control sample, e.g., in a normal cell (negative control) or in a cell having a known disease state (positive control).
  • a higher level of expression of a polypeptide encoded by SEQ ID NO:52 relative to a level associated with a normal sample can indicate the presence of cancer in the patient from whom the sample is derived.
  • detection of a lower level of the polypeptide encoded by SEQ ID NO:39 relative to a normal level is indicative of the presence of cancer in the patient.
mRNA detection
  • The diagnostic methods of the invention can also or alternatively involve detection of mRNA encoded by a gene corresponding to a differentially expressed polynucleotide of the invention. Any suitable qualitative or quantitative methods known in the art for detecting specific mRNAs can be used. mRNA can be detected by, for example, in situ hybridization in tissue sections, by reverse transcriptase-PCR, or in Northern blots containing poly A+ mRNA. One of skill in the art can readily use these methods to determine differences in the size or amount of mRNA transcripts between two samples.
  • the level of mRNA of the invention in a tissue sample suspected of being cancerous or dysplastic is compared with the expression of the mRNA in a reference sample, e.g., a positive or negative control sample (e.g., normal tissue, cancerous tissue, etc.).
  • a higher level of mRNA corresponding to SEQ ID NO:52 relative to a level associated with a normal sample can indicate the presence of cancer in the patient from whom the sample is derived.
  • detection of a lower level of mRNA corresponding to SEQ ID NO:39 relative to a normal level is indicative of the presence of cancer in the patient.
  • mRNA expression levels in a sample can be determined by generation of a library of expressed sequence tags (ESTs) from the sample, where the EST library is representative of sequences present in the sample (Adams, et al., (1991) Science
  • Enumeration of the relative representation of ESTs within the library can be used to approximate the relative representation of the gene transcript within the starting sample.
  • results of EST analysis of a test sample can then be compared to EST analysis of a reference sample to determine the relative expression levels of a selected polynucleotide, particularly a polynucleotide corresponding to one or more of the differentially expressed genes described herein.
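The EST enumeration described above can be sketched in a few lines. This is an illustration only: it assumes each EST has already been assigned to a gene identifier (for example by sequence comparison), and the function names and identifiers are hypothetical.

```python
from collections import Counter


def relative_representation(est_gene_ids):
    """Return the fraction of ESTs in a library assigned to each gene."""
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def compare_libraries(test_ests, reference_ests):
    """Pair each gene's fraction in the test library with its fraction
    in the reference library; genes absent from a library get 0.0."""
    test = relative_representation(test_ests)
    ref = relative_representation(reference_ests)
    return {gene: (test.get(gene, 0.0), ref.get(gene, 0.0))
            for gene in sorted(set(test) | set(ref))}
```

Because the fractions depend on library size, small libraries give only a coarse approximation of transcript abundance.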
  • Gene expression in a test sample can be analyzed using serial analysis of gene expression (SAGE) methodology (Velculescu et al., Science (1995) 270:484).
  • SAGE involves the isolation of short unique sequence tags from a specific location within each transcript (e.g., a sequence of any one of SEQ ID NOS:1-6).
  • the sequence tags are concatenated, cloned, and sequenced.
  • The frequency of particular transcripts within the starting sample is reflected by the number of times the associated sequence tag is encountered within the sequence population.
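The tag-counting idea behind SAGE can be sketched as below. The sketch works on transcript sequences in silico rather than on concatenated clones, and it assumes the tag is the 10 bases immediately 3' of the 3'-most CATG site (the NlaIII anchoring-enzyme site commonly used in SAGE); both the constants and the function names are assumptions made for illustration.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: NlaIII recognition site used as anchor
TAG_LENGTH = 10       # assumption: 10-base tags


def extract_tag(transcript, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None if the
    transcript has no anchor site or too few bases after it."""
    pos = transcript.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = transcript[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def tag_frequencies(transcripts):
    """Count how many times each tag is encountered in the population."""
    tags = (extract_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)
```

The resulting counts play the role of the tag frequencies described above: a tag seen more often corresponds to a more abundant transcript.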
  • Gene expression in a test sample can also be analyzed using differential display (DD) methodology.
  • In DD, fragments defined by specific sequence delimiters (e.g., restriction enzyme sites) are used as identifiers of expressed genes.
  • The relative representation of an expressed gene within a sample can then be estimated based on the relative representation of the fragment associated with that gene within the pool of all possible fragments.
  • Methods and compositions for carrying out DD are well known in the art, see, e.g., U.S. 5,776,683; and U.S. 5,807,680.
  • Gene expression in a test sample can also be determined using hybridization analysis, which is based on the specificity of nucleotide interactions.
  • Oligonucleotides or cDNA can be used to selectively identify or capture DNA or RNA of specific sequence composition, and the amount of RNA or cDNA hybridized to a known capture sequence determined qualitatively or quantitatively, to provide information about the relative representation of a particular message within the pool of cellular messages in a sample.
  • Hybridization analysis can be designed to allow for concurrent screening of the relative expression of hundreds to thousands of genes by using, for example, array-based technologies having high density formats, including filters, microscope slides, or microchips, or solution-based technologies that use spectroscopic analysis (e.g., mass spectrometry).
  • the diagnostic methods of the invention can focus on the expression of a single differentially expressed gene.
  • The diagnostic method can involve detecting a differentially expressed gene, or a polymorphism of such a gene (e.g., a polymorphism in a coding region or control region), that is associated with disease.
  • Disease-associated polymorphisms can include deletion or truncation of the gene, mutations that alter expression level and/or affect activity of the encoded protein, etc.
  • Changes in the promoter or enhancer sequence that affect expression levels of a differentially expressed gene can be compared to expression levels of the normal allele by various methods known in the art.
  • Methods for determining promoter or enhancer strength include quantitation of the expressed natural protein; insertion of the variant control element into a vector with a reporter gene such as β-galactosidase, luciferase, chloramphenicol acetyltransferase, etc. that provides for convenient quantitation; and the like.
  • a number of methods are available for analyzing nucleic acids for the presence of a specific sequence, e.g. a disease associated polymorphism. Where large amounts of DNA are available, genomic DNA is used directly. Alternatively, the region of interest is cloned into a suitable vector and grown in sufficient quantity for analysis. Cells that express a differentially expressed gene can be used as a source of mRNA, which can be assayed directly or reverse transcribed into cDNA for analysis.
  • the nucleic acid can be amplified by conventional techniques, such as the polymerase chain reaction (PCR), to provide sufficient amounts for analysis, and a detectable label can be included in the amplification reaction (e.g., using a detectably labeled primer or detectably labeled oligonucleotides) to facilitate detection.
  • The use of the polymerase chain reaction is described in Saiki, et al., Science (1985) 239:487, and a review of techniques can be found in Sambrook, et al., Molecular Cloning: A Laboratory Manual, (1989) pp. 14.2.
  • the sample nucleic acid e.g. amplified or cloned fragment, is analyzed by one of a number of methods known in the art.
  • the nucleic acid can be sequenced by dideoxy or other methods, and the sequence of bases compared to a selected sequence, e.g., to a wild-type sequence.
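Once a sample sequence has been determined, the comparison to a selected (e.g., wild-type) sequence can be as simple as listing the positions where the two differ. The sketch below assumes the two sequences are already aligned and of equal length, so it reports substitutions only; detecting insertions or deletions would require an alignment step that is not shown. The function name is hypothetical.

```python
def find_substitutions(sample, wild_type):
    """List (position, wild_type_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned and of
    equal length; comparison is case-insensitive.
    """
    if len(sample) != len(wild_type):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(wild_type.upper(), sample.upper())
    return [(i, wt, s) for i, (wt, s) in enumerate(pairs, start=1) if wt != s]
```

For example, comparing `"ACGTTA"` to the wild-type `"ACGATA"` reports a single A-to-T substitution at position 4.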
  • Hybridization with the polymorphic or variant sequence can also be used to determine its presence in a sample (e.g., by Southern blot, dot blot, etc.).
  • the hybridization pattern of a polymorphic or variant sequence and a control sequence to an array of oligonucleotide probes immobilized on a solid support can also be used as a means of identifying polymorphic or variant sequences associated with disease.
  • Single strand conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and heteroduplex analysis in gel matrices are used to detect conformational changes created by DNA sequence variation as alterations in electrophoretic mobility.
  • Where a polymorphism creates or destroys a recognition site for a restriction endonuclease, the sample is digested with that endonuclease, and the products are size fractionated to determine whether the fragment was digested. Fractionation is performed by gel or capillary electrophoresis, particularly on acrylamide or agarose gels.
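The logic of the restriction digest test can be sketched in silico: a sequence that carries the recognition site yields two or more fragments, while a variant that has lost the site stays intact. The example uses the EcoRI site GAATTC, which is cut after the first G; the sequences and the function name are made up for illustration, and only one strand of a linear molecule is modeled.

```python
def digest_fragment_sizes(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`, with the cut `cut_offset` bases into the site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical amplified fragments: the variant has one base changed in the site.
wild_type = "AAAAGAATTCTTTT"
variant = "AAAAGAGTTCTTTT"
wild_type_sizes = digest_fragment_sizes(wild_type, "GAATTC", 1)
variant_sizes = digest_fragment_sizes(variant, "GAATTC", 1)
```

Comparing the two size lists corresponds to reading the bands after electrophoresis: differing patterns indicate that the polymorphism altered the site.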
  • Screening for mutations in a differentially expressed gene can be based on the functional or antigenic characteristics of the protein.
  • Protein truncation assays are useful in detecting deletions that can affect the biological activity of the protein.
  • Various immunoassays designed to detect polymorphisms in proteins can be used in screening. Where many diverse genetic mutations lead to a particular disease phenotype, functional protein assays have proven to be effective screening tools.
  • the activity of the encoded protein can be determined by comparison with the wild-type protein.
  • the diagnostic and/or prognostic methods of the invention involve detection of expression of a selected set of genes in a test sample to produce a test expression pattern (TEP).
  • the selected set of genes includes at least one of the genes of the invention, which genes correspond to the polynucleotide sequences of SEQ ID NOS: 1-844.
  • Of particular interest is a selected set of genes that includes a gene differentially expressed in the disease for which the test sample is to be screened.
  • “Reference sequences” or “reference polynucleotides” as used herein in the context of differential gene expression analysis and diagnosis/prognosis refers to a selected set of polynucleotides, which selected set includes at least one or more of the differentially expressed polynucleotides described herein.
  • a plurality of reference sequences preferably comprising positive and negative control sequences, can be included as reference sequences. Additional suitable reference sequences are found in Genbank, Unigene, and other nucleotide sequence databases (including, e.g., expressed sequence tag (EST), partial, and full-length sequences).
  • “Reference array” means an array having reference sequences for use in hybridization with a sample, where the reference sequences include all, at least one of, or any subset of the differentially expressed polynucleotides described herein. Usually such an array will include at least 3 different reference sequences, and can include any one or all of the provided differentially expressed sequences.
  • Arrays of interest can further comprise sequences, including polymorphisms, of other genetic sequences, particularly other sequences of interest for screening for a disease or disorder (e.g., cancer, dysplasia, or other related or unrelated diseases, disorders, or conditions).
  • the oligonucleotide sequence on the array will usually be at least about 12 nt in length, and can be of about the length of the provided sequences, or can extend into the flanking regions to generate fragments of 100 nt to 200 nt in length or more.
  • a “reference expression pattern” or “REP” as used herein refers to the relative levels of expression of a selected set of genes, particularly of differentially expressed genes, that is associated with a selected cell type, e.g., a normal cell, a cancerous cell, a cell exposed to an environmental stimulus, and the like.
  • a “test expression pattern” or “TEP” refers to relative levels of expression of a selected set of genes, particularly of differentially expressed genes, in a test sample (e.g., a cell of unknown or suspected disease state, from which mRNA is isolated).
  • Diagnosis generally includes determination of a subject's susceptibility to a disease or disorder, determination as to whether a subject is presently affected by a disease or disorder, as well as to the prognosis of a subject affected by a disease or disorder (e.g., identification of pre-metastatic or metastatic cancerous states, stages of cancer, or responsiveness of cancer to therapy).
  • the present invention particularly encompasses diagnosis of subjects in the context of breast cancer (e.g. , carcinoma in situ (e.g.
  • “Sample” or “biological sample” as used throughout herein is generally meant to refer to samples of biological fluids or tissues, particularly samples obtained from tissues, especially from cells of the type associated with the disease for which the diagnostic application is designed (e.g., ductal adenocarcinoma), and the like.
  • “Samples” is also meant to encompass derivatives and fractions of such samples (e.g., cell lysates).
  • the cells of the tissue can be dissociated or tissue sections can be analyzed.
  • REPs can be generated in a variety of ways according to methods well known in the art. For example, REPs can be generated by hybridizing a control sample to an array having a selected set of polynucleotides (particularly a selected set of differentially expressed polynucleotides), acquiring the hybridization data from the array, and storing the data in a format that allows for ready comparison of the REP with a TEP.
  • all expressed sequences in a control sample can be isolated and sequenced, e.g., by isolating mRNA from a control sample, converting the mRNA into cDNA, and sequencing the cDNA.
  • the resulting sequence information roughly or precisely reflects the identity and relative number of expressed sequences in the sample.
  • the sequence information can then be stored in a format (e.g., a computer-readable format) that allows for ready comparison of the REP with a TEP.
  • the REP can be normalized prior to or after data storage, and/or can be processed to selectively remove sequences of expressed genes that are of less interest or that might complicate analysis (e.g., some or all of the sequences associated with housekeeping genes can be eliminated from REP data).
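  • The normalization and filtering step described above can be sketched as follows. This is a minimal illustration only, assuming expression data are held as a mapping from gene name to raw signal; the function name `build_expression_pattern`, the gene labels, and the counts are hypothetical and do not come from the disclosure.

```python
def build_expression_pattern(raw_counts, exclude=()):
    """Return relative expression levels (fractions that sum to 1.0).

    raw_counts: dict mapping gene name -> raw signal (e.g., cDNA read count).
    exclude:    genes to drop before normalizing (e.g., housekeeping genes).
    """
    kept = {gene: count for gene, count in raw_counts.items() if gene not in exclude}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no expression signal left after filtering")
    return {gene: count / total for gene, count in kept.items()}


# Hypothetical control sample; "ACTB" stands in for a housekeeping gene.
raw = {"geneA": 30, "geneB": 10, "ACTB": 60}
rep = build_expression_pattern(raw, exclude={"ACTB"})
```

  Storing the pattern as relative levels rather than raw counts is one way to allow a REP and a TEP generated at different times to be compared directly.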
  • TEPs can be generated in a manner similar to REPs, e.g., by hybridizing a test sample to an array having a selected set of polynucleotides, particularly a selected set of differentially expressed polynucleotides, acquiring the hybridization data from the array, and storing the data in a format that allows for ready comparison of the TEP with a REP.
  • the REP and TEP to be used in a comparison can be generated simultaneously, or the TEP can be compared to previously generated and stored REPs.
  • comparison of a TEP with a REP involves hybridizing a test sample with a reference array, where the reference array has one or more reference sequences for use in hybridization with a sample.
  • the reference sequences include all, at least one of, or any subset of the differentially expressed polynucleotides described herein.
  • Hybridization data for the test sample is acquired, the data normalized, and the produced TEP compared with a REP generated using an array having the same or similar selected set of differentially expressed polynucleotides.
  • Probes that correspond to sequences differentially expressed between the two samples will show decreased or increased hybridization efficiency for one of the samples relative to the other.
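  • One way to flag probes showing increased or decreased hybridization in one sample relative to the other is a log-ratio comparison of normalized signals. The sketch below is illustrative only: the two-fold cutoff (`min_fold`), the function name, and the probe identifiers are assumptions, not values taken from the disclosure.

```python
import math


def differential_probes(test, reference, min_fold=2.0):
    """Label probes whose test/reference signal ratio differs by at least min_fold.

    test, reference: dicts mapping probe id -> normalized hybridization signal.
    Returns a dict mapping probe id -> "increased" or "decreased".
    """
    cutoff = math.log2(min_fold)
    flagged = {}
    for probe in test.keys() & reference.keys():
        t, r = test[probe], reference[probe]
        if t <= 0 or r <= 0:
            continue  # a ratio is undefined without positive signal in both samples
        log_ratio = math.log2(t / r)
        if abs(log_ratio) >= cutoff:
            flagged[probe] = "increased" if log_ratio > 0 else "decreased"
    return flagged


# Hypothetical normalized signals for three probes.
test_signal = {"p1": 400, "p2": 100, "p3": 50}
reference_signal = {"p1": 100, "p2": 110, "p3": 200}
changed = differential_probes(test_signal, reference_signal)
```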
  • Reference arrays can be produced according to any suitable methods known in the art. For example, methods of producing large arrays of oligonucleotides are described in U.S. 5,134,854, and U.S. 5,445,934 using light-directed synthesis techniques.
  • a heterogeneous array of monomers is converted, through simultaneous coupling at a number of reaction sites, into a heterogeneous array of polymers.
  • microarrays are generated by deposition of pre-synthesized oligonucleotides onto a solid substrate, for example as described in PCT published application no. WO 95/35505.
  • the polynucleotides of the reference and test samples can be generated using a detectable fluorescent label, and hybridization of the polynucleotides in the samples detected by scanning the microarrays for the presence of the detectable label.
  • Methods and devices for detecting fluorescently marked targets on devices are known in the art.
  • detection devices include a microscope and light source for directing light at a substrate.
  • a photon counter detects fluorescence from the substrate, while an x-y translation stage varies the location of the substrate.
  • a confocal detection device that can be used in the subject methods is described in U.S. Patent no. 5,631,734.
  • a scanning laser microscope is described in Shalon et al., Genome Res. (1996) 6:639.
  • a scan using the appropriate excitation line, is performed for each fluorophore used.
  • the digital images generated from the scan are then combined for subsequent analysis.
  • the ratio of the fluorescent signal from one sample (e.g., a test sample) to the fluorescent signal from another sample (e.g., a reference sample) is then determined.
  • data analysis can include the steps of determining fluorescent intensity as a function of substrate position from the data collected, removing outliers, i.e. data deviating from a predetermined statistical distribution, and calculating the relative binding affinity of the targets from the remaining data.
  • the resulting data can be displayed as an image with the intensity in each region varying according to the binding affinity between targets and probes.
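  • The analysis steps above (collecting intensities, removing outliers, and calculating a relative signal) can be sketched as follows. The disclosure does not specify the outlier rule; the median-absolute-deviation test used here, the `max_mads` threshold, and the replicate intensities are assumptions chosen for illustration.

```python
from statistics import mean, median


def remove_outliers(values, max_mads=3.0):
    """Drop values more than max_mads median absolute deviations from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return list(values)  # spread estimate is zero, so no threshold can be set
    return [v for v in values if abs(v - center) <= max_mads * mad]


def relative_signal(test_spots, reference_spots):
    """Ratio of mean test intensity to mean reference intensity after outlier removal."""
    return mean(remove_outliers(test_spots)) / mean(remove_outliers(reference_spots))


# Hypothetical replicate spot intensities for one probe; 500 is an artifact.
ratio = relative_signal([100, 102, 98, 101, 500], [50, 49, 51, 50])
```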
  • the test sample is classified as having a gene expression profile corresponding to that associated with a disease or non-disease state by comparing the TEP generated from the test sample to one or more REPs generated from reference samples (e.g., from samples associated with cancer or specific stages of cancer, dysplasia, samples affected by a disease other than cancer, normal samples, etc.).
  • the criteria for a match or a substantial match between a TEP and a REP include expression of the same or substantially the same set of reference genes, as well as expression of these reference genes at substantially the same levels (e.g., no significant difference between the samples for a signal associated with a selected reference sequence after normalization of the samples, or at least no greater than about 25% to about 40% difference in signal strength for a given reference sequence).
  • a pattern match between a TEP and a REP includes a match in expression, preferably a match in qualitative or quantitative expression level, of at least one of, all or any subset of the differentially expressed genes of the invention.
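  • A computer-based pattern match under the criteria above might look like the following sketch. It assumes both patterns are already normalized, measures the difference relative to the REP level, and uses the lower bound of the stated range (25%) as the default tolerance; the function name and the example values are hypothetical.

```python
def patterns_match(tep, rep, max_rel_diff=0.25):
    """Return True if the TEP expresses the same genes as the REP at similar levels.

    tep, rep:     dicts mapping gene name -> normalized expression level.
    max_rel_diff: largest tolerated difference, as a fraction of the REP level.
    """
    if tep.keys() != rep.keys():
        return False  # different sets of expressed reference genes
    for gene, ref_level in rep.items():
        if abs(tep[gene] - ref_level) > max_rel_diff * ref_level:
            return False
    return True


# Hypothetical reference and test patterns.
rep_pattern = {"g1": 1.0, "g2": 2.0}
close_tep = {"g1": 1.2, "g2": 1.8}
far_tep = {"g1": 1.5, "g2": 2.0}
```

  Passing `max_rel_diff=0.40` would apply the upper end of the tolerance range instead.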
  • Pattern matching can be performed manually, or can be performed using a computer program.
  • Methods for preparation of substrate matrices (e.g., arrays), design of oligonucleotides for use with such matrices, labeling of probes, hybridization conditions, scanning of hybridized matrices, and analysis of patterns generated, including comparison analysis, are described in, for example, U.S. 5,800,992.
  • Cancerous cells can have the ability to compress, invade, and destroy normal tissue. Cancerous cells may also metastasize to other parts of the body via the bloodstream or the lymph system and colonize in these other areas. Different cancers are classified by the cell from which the cancerous cell is derived and from its cellular morphology and/or state of differentiation.
  • Cancer generally is clonally formed, i.e. gain of function of oncogenes and loss of function of tumor suppressor genes within a single cell transform the cell to be cancerous, and that single cell grows and divides to form a cancerous lesion.
  • the genes known to be involved in cancer initiation and progression are involved in numerous cellular functions, including developmental differentiation, cell cycle regulation, cell signaling, immunological response, DNA replication, and DNA repair.
  • Determining expression of certain polynucleotides and comparing a patient's profile with known expression in normal tissue and variants of the disease allows a determination of the best possible treatment for a patient, both in terms of specificity of treatment and in terms of comfort level of the patient.
  • Surrogate tumor markers, such as polynucleotide expression, can also be used to better classify, and thus diagnose and treat, different forms and disease states of cancer.
  • Two classifications widely used in oncology that can benefit from identification of the expression levels of the polynucleotides of the invention are staging of the cancerous disorder, and grading the nature of the cancerous tissue.
  • Staging is a process used by physicians to describe how advanced the cancerous state is in a patient. Staging assists the physician in determining a prognosis, planning treatment and evaluating the results of such treatment.
  • Different staging systems are used for different types of cancer, but each generally involves the following determinations: the type of tumor, indicated by T; whether the cancer has metastasized to nearby lymph nodes, indicated by N; and whether the cancer has metastasized to more distant parts of the body, indicated by M.
  • This system of staging is called the TNM system.
  • If a cancer is only detectable in the area of the primary lesion without having spread to any lymph nodes, it is called Stage I. If it has spread only to the closest lymph nodes, it is called Stage II.
  • In Stage III, the cancer has generally spread to the lymph nodes in near proximity to the site of the primary lesion. Cancers that have spread to a distant part of the body, such as the liver, bone, brain or another site, are called Stage IV, the most advanced stage.
  • the determination of staging is done using pathological techniques and is based more on the presence or absence of malignant tissue rather than the characteristics of the tumor type. Presence or absence of malignant tissue is based primarily on the gross morphology of the cells in the areas biopsied.
  • the polynucleotides of the invention can facilitate fine-tuning of the staging process by identifying markers for the aggressivity of a cancer, e.g., the metastatic potential, as well as the presence in different areas of the body.
  • For example, detection in a borderline Stage II cancer of a polynucleotide signifying high metastatic potential can be used to reclassify the tumor as Stage III, justifying more aggressive therapy.
  • the presence of a polynucleotide signifying a lower metastatic potential allows more conservative staging of a tumor.
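  • The staging logic described above, together with the marker-based adjustment, can be expressed as a short decision rule. This is a simplified sketch of the text only, not a clinical staging algorithm; the function names, the boolean inputs, and the notion of a "borderline" flag are assumptions introduced for illustration.

```python
def base_stage(closest_nodes, nearby_nodes, distant_spread):
    """Assign Stage 1-4 from the extent of spread, following the description above."""
    if distant_spread:
        return 4  # spread to a distant part of the body
    if nearby_nodes:
        return 3  # spread to lymph nodes in near proximity to the primary lesion
    if closest_nodes:
        return 2  # spread only to the closest lymph nodes
    return 1      # detectable only in the area of the primary lesion


def adjusted_stage(stage, borderline, high_metastatic_marker):
    """Move a borderline Stage 2 tumor to Stage 3 when a high-metastatic marker is found."""
    if stage == 2 and borderline and high_metastatic_marker:
        return 3
    return stage


# Hypothetical case: closest nodes involved, borderline findings, marker present.
stage = adjusted_stage(base_stage(True, False, False), borderline=True,
                       high_metastatic_marker=True)
```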
  • Grade is a term used to describe how closely a tumor resembles normal tissue of its same type. Based on the microscopic appearance of a tumor, pathologists will identify the grade of a tumor based on parameters such as cell morphology, cellular organization, and other markers of differentiation. As a general rule, the grade of a tumor corresponds to its rate of growth or aggressiveness. That is, undifferentiated or high-grade tumors grow more quickly than well differentiated or low-grade tumors. Information about tumor grade is useful in planning treatment and predicting prognosis.
  • GX: Grade cannot be assessed; G1: Well differentiated; G2: Moderately well differentiated; G3: Poorly differentiated; G4: Undifferentiated.
  • The Gleason system, which is specific for prostate cancer, uses grade numbers to describe the degree of differentiation. Lower Gleason scores indicate well-differentiated cells. Intermediate scores denote tumors with moderately differentiated cells. Higher scores describe poorly differentiated cells. Grade is also important in some types of brain tumors and soft tissue sarcomas.
  • the polynucleotides of the invention can be especially valuable in determining the grade of the tumor, as they not only can aid in determining the differentiation status of the cells of a tumor, they can also identify factors other than differentiation that are valuable in determining the aggressivity of a tumor, such as metastatic potential.
  • Familial Cancer Genes: A number of cancer syndromes are linked to Mendelian inheritance of a predisposition to develop particular cancers.
  • the following table contains a list of cancer types that can be inherited, and for which the gene or genes responsible have been identified. Most of the cancer types listed can occur as part of several different genetic conditions, each caused by alterations in a different gene.
  • Tuberous sclerosis 2 (gene: TSC2)
  • Endocrine cancer: Multiple endocrine neoplasia 1 (gene: MEN1)
  • Endometrial cancer: Hereditary non-polyposis colon cancer (HNPCC)
  • Hereditary non-polyposis colon cancer (HNPCC) 1 (gene: hMSH2)
  • Hereditary non-polyposis colon cancer (HNPCC) 2 (gene: hMLH1)
  • Hereditary non-polyposis colon cancer (HNPCC) 3 (gene: hPMS1)
  • Hereditary non-polyposis colon cancer (HNPCC) 4 (gene: hPMS2)
  • the polynucleotides of the invention can be especially useful to monitor patients having any of the above syndromes to detect potentially malignant events at a molecular level before they are detectable at a gross morphological level.
  • a number of genes are involved in multiple forms of cancer.
  • a polynucleotide of the invention identified as important for metastatic colon cancer can also have clinical implications for a patient diagnosed with stomach cancer or endometrial cancer.
  • Lung cancer is one of the most common cancers in the United States, accounting for about 15 percent of all cancer cases, or 170,000 new cases each year. At this time, over half of the lung cancer cases in the United States are in men, but the number found in women is increasing and will soon equal that in men. Today more women die of lung cancer than of breast cancer. Lung cancer is especially difficult to diagnose and treat because of the large size of the lungs, which allows cancer to develop for years undetected. In fact, lung cancer can spread outside the lungs without causing any symptoms. Adding to the confusion, the most common symptom of lung cancer, a persistent cough, can often be mistaken for a cold or bronchitis.
  • Small cell carcinoma (also called oat cell carcinoma)
  • Nonsmall cell lung cancer (NSCLC)
  • Epidermoid carcinoma (also called squamous cell carcinoma)
  • Adenocarcinoma starts growing near the outside surface of the lung and can vary in both size and growth rate.
  • adenocarcinomas are described as alveolar cell cancer. Large cell carcinoma starts near the surface of the lung, grows rapidly, and the growth is usually fairly large when diagnosed. Other less common forms of lung cancer are carcinoid, cylindroma, mucoepidermoid, and malignant mesothelioma.
  • CT scans, MRIs, X-rays, sputum cytology, and biopsies are used to diagnose nonsmall cell lung cancer.
  • the form and cellular origin of the lung cancer is diagnosed primarily through biopsy from either a surgical biopsy or a needle aspiration of lung tissue, and usually the biopsy is prompted from an abnormality identified on an X-ray.
  • sputum cytology can reveal lung cancers in patients with normal X-rays or can determine the type of lung cancer, but because it cannot pinpoint the tumor's location, a positive sputum cytology test is usually followed by further tests. Since these tests are based in large part on gross morphology of the tissue, the diagnosis of a particular kind of tumor is largely subjective, and the diagnosis can vary significantly between clinicians.
  • the polynucleotides of the invention can be used to distinguish types of lung cancer as well as identifying traits specific to a certain patient's cancer. For example, if the patient's biopsy expresses a polynucleotide that is associated with a low metastatic potential, it may justify leaving a larger portion of the patient's lung in surgery to remove the lesion. Alternatively, a smaller lesion with expression of a polynucleotide that is associated with high metastatic potential may justify a more radical removal of lung tissue and/or the surrounding lymph nodes, even if no metastasis can be identified through pathological examination.
  • polynucleotides of the invention can be used in the diagnosis, prognosis and management of lung cancer.
  • the differential expression of a polynucleotide in hyperplasia can be used as a diagnostic marker for metastatic lung cancer.
  • the polynucleotides of the invention that would be especially useful for this purpose are those that exhibit differential expression between high metastatic versus low metastatic lung cancer, i.e., SEQ ID NOS: 9, 34, 42, 62, 74, 106, 119, 135, 154, 160, 260, 308, 323, 349, 361, 369, 371, 381, 395, and 400. Detection of malignant lung cancer with a higher metastatic potential can be determined using expression levels of any of these sequences alone or in combination with the levels of expression of other known genes.
  • NCI National Cancer Institute
  • Ductal carcinoma in situ is the most common type of noninvasive breast cancer. In DCIS, the malignant cells have not metastasized through the walls of the ducts into the fatty tissue of the breast. Comedocarcinoma is a type of DCIS that is more likely than other types of DCIS to come back in the same area after lumpectomy. It is more closely linked to eventual development of invasive ductal carcinoma than other forms of DCIS.
  • Infiltrating (or invasive) ductal carcinoma (IDC): this type of cancer has metastasized through the wall of the duct and invaded the fatty tissue of the breast. At this point, it has the potential to use the lymphatic system and bloodstream for metastasis to more distant parts of the body. Infiltrating ductal carcinoma accounts for about 80% of breast cancers.
  • LCIS Lobular carcinoma in situ
  • Infiltrating (or invasive) lobular carcinoma (ILC): ILC is similar to IDC, in that it has the potential to metastasize elsewhere in the body. About 10% to 15% of invasive breast cancers are invasive lobular carcinomas. ILC can be more difficult to detect by mammogram than IDC.
  • Inflammatory breast cancer: This rare type of invasive breast cancer accounts for about 1% of all breast cancers and is extremely aggressive. Multiple skin symptoms associated with this cancer are caused by cancer cells blocking lymph vessels or channels in the skin over the breast.
  • Medullary carcinoma: This special type of infiltrating breast cancer has a relatively well defined, distinct boundary between tumor tissue and normal tissue. It accounts for about 5% of breast cancers. The prognosis for this kind of breast cancer is better than for other types of invasive breast cancer. Mucinous carcinoma: This rare type of invasive breast cancer originates from mucus-producing cells. The prognosis for mucinous carcinoma is better than for the more common types of invasive breast cancer.
  • Paget's disease of the nipple: This type of breast cancer starts in the ducts and spreads to the skin of the nipple and the areola. It is a rare type of breast cancer, occurring in only 1% of all cases. Paget's disease can be associated with in situ carcinoma, or with infiltrating breast carcinoma. If no lump can be felt in the breast tissue, and the biopsy shows DCIS but no invasive cancer, the prognosis is excellent.
  • Phyllodes tumor: This very rare type of breast tumor forms from the stroma of the breast, in contrast to carcinomas, which develop in the ducts or lobules. Phyllodes (also spelled phylloides) tumors are usually benign, but are malignant on rare occasions.
  • Malignant phyllodes tumors are very rare, and fewer than 10 women per year in the US die of this disease. Benign phyllodes tumors are successfully treated by removing the mass and a narrow margin of normal breast tissue.
  • Tubular carcinoma: Accounting for about 2% of all breast cancers, tubular carcinomas are a special type of infiltrating breast carcinoma. They have a better prognosis than usual infiltrating ductal or lobular carcinomas. High-quality mammography combined with clinical breast exam remains the only screening method clearly tied to reduction in breast cancer mortality. Lower dose x-rays, digitized computer rather than film images, and the use of computer programs to assist diagnosis are almost ready for widespread dissemination. Other technologies also are being developed, including magnetic resonance imaging and ultrasound. In addition, a very low radiation exposure technique, positron emission tomography, has the potential for detecting early breast cancer.
  • breast cancer can thus be generally diagnosed by detection of expression of a gene or genes associated with breast tumors. Where enough information is available about the differential gene expression between various types of breast tumor tissues, the specific type of breast tumor can also be diagnosed.
  • ER estrogen receptor
  • Malignant breast cancer is often divided into two groups, ER-positive and ER-negative, based on the estrogen receptor status of the tissue.
  • the ER status represents different survival length and response to hormone therapy, and is thought to represent either: 1) an indicator of different stages of the disease, or 2) an indicator that allows differentiation between two similar but distinct diseases.
  • a number of other genes are known to vary expression between either different stages of cancer or different types of similar breast cancer.
  • polynucleotides of the invention can be used in the diagnosis and management of breast cancer.
  • the differential expression of a polynucleotide in human breast tumor tissue can be used as a diagnostic marker for human breast cancer.
  • the polynucleotides of the invention that would be especially useful for this purpose are those that exhibit differential expression between breast cancer tissue with a high metastatic potential and a low metastatic potential, i.e. SEQ ID NOS: 9, 42, 52, 62, 65, 66, 68, 114, 123, 144, 172, 178, 214, 219, 223, 258, 317, and 379. Detection of breast cancer can be determined using expression levels of any of these sequences alone or in combination.
  • The aggressive nature and/or the metastatic potential of a breast cancer can also be determined by comparing levels of one or more polynucleotides of the invention with levels of another sequence known to vary in cancerous tissue, e.g., ER expression.
  • development of breast cancer can be detected by examining the ratio of SEQ ID NO: to the levels of steroid hormones (e.g., testosterone or estrogen) or to other hormones (e.g., growth hormone, insulin).
  • Diagnosis of breast cancer can also involve comparing the expression of a polynucleotide of the invention with the expression of other sequences in non-malignant breast tissue samples in comparison to one or more forms of the diseased tissue.
  • a comparison of expression of one or more polynucleotides of the invention between the samples provides information on relative levels of these polynucleotides as well as the ratio of these polynucleotides to the expression of other sequences in the tissue of interest compared to normal.
  • This risk of breast cancer is elevated significantly by the presence of an inherited risk for breast cancer, such as a mutation in BRCA-1 or BRCA-2.
  • New diagnostic tools are being developed to address the needs of higher risk patients to complement mammography and physical examinations for early detection of breast cancer, particularly among younger women.
  • the presence of antigen or expression markers in nipple aspirate fluid (NAF) samples collected from one or both breasts can be useful for risk assessment or early cancer detection.
  • the polynucleotides of the invention can be used in multivariate analysis with expression studies with genes such as p53 and EGFR as risk predictors and as surrogate endpoint biomarkers for breast cancer. As well as being used for diagnosis and risk assessment, the expression of certain genes can also be correlated to prognosis of a disease state.
  • The expression of particular genes has been used as a prognostic indicator for breast cancer, including increased expression of c-erbB-2, pS2, ER, progesterone receptor, epidermal growth factor receptor (EGFR), neu, myc, bcl-2, int2, cytosolic tyrosine kinase, cyclin E, prad-1, hst, uPA, PAI-1, PAI-2, cathepsin D, as well as the presence of a number of cancer-specific antigens, e.g., CEA, CA M26, CA M29 and CA 15.3. Davis, Br. J. Biomed. Sci. (1996) 55:157.
  • the expression of the polynucleotides of the invention can be of prognostic value for determining the metastatic potential of a malignant breast cancer, as these molecules are differentially expressed between tumor tissues of high and low metastatic potential.
  • the levels of these polynucleotides in patients with malignant breast cancer can be compared to normal tissue, malignant tissue with a known high metastatic potential, and malignant tissue with a known lower metastatic potential to provide a prognosis for a particular patient.
  • Such a prognosis is predictive of the extent and nature of the cancer.
  • The prognosis so determined is useful in managing a patient with breast cancer, both for initial treatment of the disease and for longer-term monitoring of the same patient. If samples are taken from the same individual over a period of time, differences in polynucleotide expression that are specific to that patient can be identified and closely watched.
  • Colorectal cancer is one of the most common neoplasms in humans and perhaps the most frequent form of hereditary neoplasia. Prevention and early detection are key factors in controlling and curing colorectal cancer. Indeed, colorectal cancer is the second most preventable cancer, after lung cancer. Colorectal cancer begins as polyps, which are small, benign growths of cells that form on the inner lining of the colon. Over a period of several years, some of these polyps accumulate additional mutations and become cancerous. About 20 percent of all cases of colon cancer are thought to be related to heredity.
  • Familial adenomatous polyposis (FAP): This condition results in a person having hundreds or even thousands of polyps in the colon and rectum that usually first appear during the teenage years. Cancer nearly always develops in one or more of these polyps between the ages of 30 and 50.
  • Gardner's syndrome: Like FAP, Gardner's syndrome results in polyps and colorectal cancers that develop at a young age. It can also cause benign tumors of the skin, soft connective tissue and bones.
  • HNPCC Hereditary nonpolyposis colon cancer
  • Familial colorectal cancer in Ashkenazi Jews: Recent research has found an inherited tendency to develop colorectal cancer among some Jews of Eastern European descent. Like people with FAP, Gardner's syndrome, and HNPCC, their increased risk is due to an inherited mutation present in about 6% of American Jews.
  • Colorectal cancer can thus be generally diagnosed by detection of expression of a gene or genes associated with colorectal tumors.
  • the expression of polynucleotides of the invention can be used in the diagnosis, prognosis and management of colorectal cancer.
  • the differential expression of a polynucleotide in hyperplasia can be used as a diagnostic marker for colon cancer.
  • the polynucleotides of the invention that would be especially useful for this purpose are those that exhibit differential expression between malignant metastatic colon cancer and normal patient tissue, i.e., SEQ ID NOS: 52, 119, 172, and 288. Detection of malignant colon cancer can be determined using expression levels of any of these sequences alone or in combination with the levels of expression of other known genes.
  • The aggressive nature and/or the metastatic potential of a colon cancer can also be determined by comparing levels of one or more polynucleotides of the invention with total levels of another sequence known to vary in cancerous tissue, e.g., p53 expression.
  • development of colon cancer can be detected by examining the ratio of any of the polynucleotides of the invention to the levels of oncogenes (e.g. ras) or tumor suppressor genes (e.g. FAP or p53).
  • Polypeptides encoded by the instant polynucleotides and corresponding full length genes can be used to screen peptide libraries to identify binding partners, such as receptors, from among the encoded polypeptides.
  • a library of peptides can be synthesized following the methods disclosed in U.S. Pat.
  • a suitable peptide synthesis support e.g., a resin
  • the concentration of each amino acid in the reaction mixture is balanced or adjusted in inverse proportion to its coupling reaction rate so that the product is an equimolar mixture of amino acids coupled to the starting resin.
  • the bound amino acids are then deprotected, and reacted with another balanced amino acid mixture to form an equimolar mixture of all possible dipeptides.
  • a mixture of peptides of the desired length (e.g., hexamers) is formed.
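  • The balancing rule above can be illustrated numerically. Assuming that the amount of each amino acid coupled is proportional to the product of its coupling rate and its concentration (a simplifying assumption, not a statement from the disclosure), setting each concentration in inverse proportion to its rate makes every product equal, which yields an equimolar mixture. The rate values and the function name below are hypothetical.

```python
def balanced_concentrations(coupling_rates, total=1.0):
    """Return concentrations inversely proportional to coupling rate.

    coupling_rates: dict mapping amino acid -> relative coupling rate (> 0).
    total:          total concentration to distribute across the mixture.
    """
    inverse = {aa: 1.0 / rate for aa, rate in coupling_rates.items()}
    scale = total / sum(inverse.values())
    return {aa: inv * scale for aa, inv in inverse.items()}


# Hypothetical relative coupling rates for three amino acids.
rates = {"Gly": 2.0, "Val": 1.0, "Ile": 0.5}
mix = balanced_concentrations(rates)

# Rate x concentration is the same for every amino acid, so each couples equally.
coupled = {aa: rates[aa] * mix[aa] for aa in rates}
```

  In this example the slowest-coupling residue receives the highest concentration, so that it is not under-represented in the resulting peptide mixture.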
  • the mixture of peptides is screened for binding to the selected polypeptide. The peptides are then tested for their ability to inhibit or enhance activity. Peptides exhibiting the desired activity are then isolated and sequenced.
  • the method described in WO 91/17823 is similar. However, instead of reacting the synthesis resin with a mixture of activated amino acids, the resin is divided into twenty equal portions (or into a number of portions corresponding to the number of different amino acids to be added in that step), and each amino acid is coupled individually to its portion of resin. The resin portions are then combined, mixed, and again divided into a number of equal portions for reaction with the second amino acid. In this manner, each reaction can be easily driven to completion. Additionally, one can maintain separate "subpools" by treating portions in parallel, rather than combining all resins at each step. This simplifies the process of determining which peptides are responsible for any observed receptor binding or signal transduction activity.
  • the subpools containing, e.g., 1-2,000 candidates each are exposed to one or more polypeptides of the invention.
  • Each subpool that produces a positive result is then resynthesized as a group of smaller subpools (sub-subpools) containing, e.g., 20-100 candidates, and reassayed.
  • Positive sub-subpools can be resynthesized as individual compounds, and assayed finally to determine the peptides that exhibit a high binding constant.
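  • The iterative subpool strategy above can be sketched as a simple narrowing loop. The sketch assumes that a pool assays positive whenever it contains at least one binder and that pool sizes of 2,000, 100, and 1 are used in successive rounds; the function names, the integer peptide identifiers, and the `is_binder` callback standing in for the binding assay are all hypothetical.

```python
def chunk(items, size):
    """Split a list into consecutive pools of at most `size` members."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def deconvolute(candidates, is_binder, pool_sizes=(2000, 100, 1)):
    """Narrow a candidate library to binders by assaying successively smaller pools."""
    active = list(candidates)
    for size in pool_sizes:
        survivors = []
        for pool in chunk(active, size):
            # Stand-in for the assay: a pool is positive if any member binds.
            if any(is_binder(peptide) for peptide in pool):
                survivors.extend(pool)
        active = survivors
    return active


# Hypothetical library of 10,000 peptides (by id) with two true binders.
true_binders = {42, 7777}
hits = deconvolute(range(10000), lambda peptide: peptide in true_binders)

# Size of a complete hexamer library built from 20 amino acids.
hexamer_library_size = 20 ** 6
```

  The size of the full hexamer library shows why pooled screening is used rather than assaying each peptide individually.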
  • These peptides can be tested for their ability to inhibit or enhance the native activity.
  • the methods described in WO 91/17823 and U.S. Patent No. 5,194,392 (herein incorporated by reference) enable the preparation of such pools and subpools by automated techniques in parallel, such that all synthesis and resynthesis can be performed in a matter of days.
  • Peptide agonists or antagonists are screened using any available method, such as signal transduction, antibody binding, receptor binding, mitogenic assays, chemotaxis assays, etc.
  • the methods described herein are presently preferred.
  • the assay conditions ideally should resemble the conditions under which the native activity is exhibited in vivo, that is, under physiologic pH, temperature, and ionic strength. Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the native activity at concentrations that do not cause toxic side effects in the subject.
  • Agonists or antagonists that compete for binding to the native polypeptide can require concentrations equal to or greater than the native concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added in concentrations on the order of the native concentration.
  • novel polypeptide binding partner such as a receptor, encoded by a gene or a cDNA corresponding to a polynucleotide of the invention, and at least one peptide agonist or antagonist of the novel binding partner.
  • agonists and antagonists can be used to modulate, enhance, or inhibit receptor function in cells to which the receptor is native, or in cells that possess the receptor as a result of genetic engineering.
  • novel receptor shares biologically important characteristics with a known receptor, information about agonist/antagonist binding can facilitate development of improved agonists/antagonists of the known receptor.
  • Pharmaceutical Compositions and Therapeutic Uses. Pharmaceutical compositions can comprise polypeptides, antibodies, or polynucleotides of the claimed invention.
  • the pharmaceutical compositions will comprise a therapeutically effective amount of either polypeptides, antibodies, or polynucleotides of the claimed invention.
  • therapeutically effective amount refers to an amount of a therapeutic agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable therapeutic or preventative effect.
  • the effect can be detected by, for example, chemical markers or antigen levels.
  • Therapeutic effects also include reduction in physical symptoms, such as decreased body temperature.
  • the precise effective amount for a subject will depend upon the subject's size and health, the nature and extent of the condition, and the therapeutics or combination of therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount in advance. However, the effective amount for a given situation is determined by routine experimentation and is within the judgment of the clinician.
  • an effective dose will generally be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
  • a pharmaceutical composition can also contain a pharmaceutically acceptable carrier.
  • pharmaceutically acceptable carrier refers to a carrier for administration of a therapeutic agent, such as antibodies or a polypeptide, genes, and other therapeutic agents.
  • Suitable carriers can be large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in the art.
  • Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like.
  • compositions of the invention can include liquids such as water, saline, glycerol and ethanol. Auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, can also be present in such vehicles.
  • the therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection can also be prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier.
  • Delivery Methods. Once formulated, the compositions of the invention can be (1) administered directly to the subject (e.g., as polynucleotide or polypeptides); (2) delivered ex vivo, to cells derived from the subject (e.g., as in ex vivo gene therapy); or (3) delivered in vitro for expression of recombinant proteins (e.g., polynucleotides).
  • Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously, intraperitoneally, intravenously or intramuscularly, or delivered to the interstitial space of a tissue.
  • the compositions can also be administered into a tumor or lesion.
  • Other modes of administration include oral and pulmonary administration, suppositories, and transdermal applications, needles, and gene guns or hyposprays.
  • Dosage treatment can be a single dose schedule or a multiple dose schedule.
  • Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art and described in e.g., International Publication No. WO 93/14778.
  • Examples of cells useful in ex vivo applications include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic cells, or tumor cells.
  • Delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by, for example, dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.
  • the disorder can be amenable to treatment by administration of a therapeutic agent based on the provided polynucleotide or corresponding polypeptide.
  • Neoplasias that are treated with the antisense composition include, but are not limited to, cervical cancers, melanomas, colorectal adenocarcinomas, Wilms' tumor, retinoblastoma, sarcomas, myosarcomas, lung carcinomas, leukemias, such as chronic myelogenous leukemia, promyelocytic leukemia, monocytic leukemia, and myeloid leukemia, and lymphomas, such as histiocytic lymphoma.
  • Proliferative disorders that are treated with the therapeutic composition include disorders such as anhydric hereditary ectodermal dysplasia, congenital alveolar dysplasia, epithelial dysplasia of the cervix, fibrous dysplasia of bone, and mammary dysplasia.
  • Hyperplasias, for example, endometrial, adrenal, breast, prostate, or thyroid hyperplasias or pseudoepitheliomatous hyperplasia of the skin, are treated with antisense therapeutic compositions based upon a polynucleotide of the invention.
  • downregulation or inhibition of expression of a gene corresponding to a polynucleotide of the invention can have therapeutic application. For example, decreasing gene expression can help to suppress tumors in which enhanced expression of the gene is implicated.
  • the dose of the antisense composition and the means of administration are determined based on the specific qualities of the therapeutic composition, the condition, age, and weight of the patient, the progression of the disease, and other relevant factors.
  • Administration of the therapeutic antisense agents of the invention includes local or systemic administration, including injection, oral administration, particle gun or catheterized administration, and topical administration.
  • the therapeutic antisense composition contains an expression construct comprising a promoter and a polynucleotide segment of at least 12, 22, 25, 30, or 35 contiguous nucleotides of the antisense strand of a polynucleotide disclosed herein. Within the expression construct, the polynucleotide segment is located downstream from the promoter, and transcription of the polynucleotide segment initiates at the promoter.
  • a small metastatic lesion is located and the therapeutic composition is injected several times in several different locations within the body of the tumor.
  • arteries which serve a tumor are identified, and the therapeutic composition injected into such an artery, in order to deliver the composition directly into the tumor.
  • a tumor that has a necrotic center is aspirated and the composition injected directly into the now empty center of the tumor.
  • the antisense composition is directly administered to the surface of the tumor, for example, by topical application of the composition.
  • X-ray imaging is used to assist in certain of the above delivery methods.
  • Receptor-mediated targeted delivery of therapeutic compositions containing an antisense polynucleotide, subgenomic polynucleotides, or antibodies to specific tissues is also used.
  • Receptor-mediated DNA delivery techniques are described in, for example, Findeis et al, Trends Biotechnol. (1993) 77:202; Chiou et al, Gene Therapeutics: Methods And Applications Of Direct Gene Transfer (J.A. Wolff, ed.) (1994); Wu et al, J. Biol. Chem. (1988) 265:621; Wu et al, J. Biol. Chem. (1994) 269:542; Zenke et al, Proc. Natl. Acad. Sci.
  • compositions containing antisense subgenomic polynucleotides are administered in a range of about 100 ng to about 200 mg of DNA for local administration in a gene therapy protocol. Concentration ranges of about 500 ng to about 50 mg, about 1 μg to about 2 mg, about 5 μg to about 500 μg, and about 20 μg to about 100 μg of DNA can also be used during a gene therapy protocol.
  • Factors such as method of action and efficacy of transformation and expression are considerations which will affect the dosage required for ultimate efficacy of the antisense subgenomic polynucleotides. Where greater expression is desired over a larger area of tissue, larger amounts of antisense subgenomic polynucleotides or the same amounts readministered in a successive protocol of administrations, or several administrations to different adjacent or close tissue portions of, for example, a tumor site, may be required to effect a positive therapeutic outcome. In all cases, routine experimentation in clinical trials will determine specific ranges for optimal therapeutic effect.
  • a more complete description of gene therapy vectors, especially retroviral vectors is contained in U.S. Serial No. 08/869,309, which is expressly incorporated herein, and in section G below.
  • Therapeutic agents also include antibodies to proteins and polypeptides encoded by the polynucleotides of the invention and related genes, as described in U.S. Patent No. 5,654,173.
  • the therapeutic polynucleotides and polypeptides of the present invention can be utilized in gene delivery vehicles.
  • the gene delivery vehicle can be of viral or non-viral origin (see generally, Jolly, Cancer Gene Therapy (1994) 7:51; Kimura, Human Gene Therapy (1994) 5:845; Connelly, Human Gene Therapy (1995) 7:185; and Kaplitt, Nature Genetics (1994) 6:148).
  • Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of the invention can be administered either locally or systemically. These constructs can utilize viral or non-viral vector approaches. Expression of such coding sequences can be induced using endogenous mammalian or heterologous promoters. Expression of the coding sequence can be either constitutive or regulated.
  • the present invention can employ recombinant retroviruses which are constructed to carry or express a selected nucleic acid molecule of interest.
  • Retrovirus vectors that can be employed include those described in EP 0 415 731; WO 90/07936; WO 94/03622; WO 93/25698; WO 93/25234; U.S. Patent No. 5,219,740; WO 93/11230; WO 93/10218; Vile and Hart, Cancer Res. (1993) 55:3860; Vile et al, Cancer Res. (1993) 55:962; Ram et al., Cancer Res. (1993) 55:83; Takamiya et al, J. Neurosci. Res.
  • Preferred recombinant retroviruses include those described in WO 91/02805.
  • Packaging cell lines suitable for use with the above-described retroviral vector constructs can be readily prepared (see, e.g., WO 95/30763 and WO 92/05266), and used to create producer cell lines (also termed vector cell lines) for the production of recombinant vector particles.
  • packaging cell lines are made from human (such as HT1080 cells) or mink parent cell lines, thereby allowing production of recombinant retroviruses that can survive inactivation in human serum.
  • the present invention also employs alphavirus-based vectors that can function as gene delivery vehicles.
  • alphavirus-based vectors can be constructed from a wide variety of alphaviruses, including, for example, Sindbis virus vectors, Semliki forest virus (ATCC VR-67; ATCC VR-1247), Ross River virus (ATCC VR-373; ATCC VR-1246) and Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR 1249; ATCC VR-532).
  • Representative examples of such vector systems include those described in U.S. Patent Nos.
  • Gene delivery vehicles of the present invention can also employ parvovirus such as adeno-associated virus (AAV) vectors.
  • Representative examples include the AAV vectors disclosed by Srivastava in WO 93/09239, Samulski et al., J. Virol. (1989) 65:3822; Mendelson et al, Virol. (1988) 766:154; and Flotte et al, PNAS (1993) 90:10613.
  • Representative examples of adenoviral vectors include those described by Berkner,
  • adenoviral gene therapy vectors employable in this invention also include those described in WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655.
  • Administration of DNA linked to killed adenovirus as described in Curiel, Hum. Gene Ther. (1992) 5:147 can be employed.
  • Other gene delivery vehicles and methods can be employed, including polycationic condensed DNA linked or unlinked to killed adenovirus alone, for example Curiel, Hum. Gene Ther. (1992) 5:147; ligand linked DNA, for example see Wu, J. Biol.
  • Naked DNA can also be employed.
  • Exemplary naked DNA introduction methods are described in WO 90/11092 and U.S. Patent No. 5,580,859. Uptake efficiency can be improved using biodegradable latex beads.
  • DNA coated latex beads are efficiently transported into cells after endocytosis initiation by the beads. The method can be improved further by treatment of the beads to increase hydrophobicity and thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm.
  • Liposomes that can act as gene delivery vehicles are described in U.S. Patent No. 5,422,120; WO 95/13796; WO 94/23697; WO 91/14445; and EP 0524968.
  • non-viral delivery suitable for use includes mechanical delivery systems such as the approach described in Woffendin et al, Proc. Natl. Acad. Sci. USA (1994)
  • the coding sequence and the product of expression of such can be delivered through deposition of photopolymerized hydrogel materials.
  • Other conventional methods for gene delivery that can be used for delivery of the coding sequence include, for example, use of hand-held gene transfer particle gun, as described in U.S. Patent No. 5,149,655; use of ionizing radiation for activating transferred gene, as described in U.S. Patent No. 5,206,152 and WO 92/11033.
  • Example 1 Source of Biological Materials and Overview of Novel Polynucleotides
  • the KML4-A is a highly metastatic subline derived from KM12C (Yeatman et al. Nucl. Acids.
  • KM12C and KM12C-derived cell lines are well-recognized in the art as a model cell line for the study of colon cancer (see, e.g., Moriakawa et al, supra; Radinsky et al. Clin. Cancer Res. (1995) 7:19; Yeatman et al, (1995) supra; Yeatman et al. Clin. Exp. Metastasis (1996) 74:246).
  • masking does not influence the final search results, except to eliminate sequences of relatively little interest due to their low complexity, and to eliminate multiple "hits" based on similarity to repetitive regions common to multiple sequences, e.g., Alu repeats.
  • Masking resulted in the elimination of 43 sequences. The remaining sequences were then used in a BLASTN vs. Genbank search with search parameters of greater than 70% overlap, 99% identity, and a p value of less than 1 x 10^-40, which search resulted in the discarding of 1,432 sequences.
  • sequences were classified as unknown (no hits), weak similarity, and high similarity (parameters as above). Two searches were performed on these sequences.
  • a BLAST vs. EST database search resulted in discard of 1771 sequences (sequences with greater than 99% overlap, greater than 99% similarity and a p value of less than 1 x 10^-40; sequences with a p value of less than 1 x 10^-65 when compared to a database sequence of human origin were also excluded).
  • a BLASTN vs. Patent GeneSeq database search resulted in discard of 15 sequences (greater than 99% identity; p value less than 1 x 10^-40; greater than 99% overlap).
  • sequences were subjected to screening using other rules and redundancies in the dataset. Sequences with a p value of less than 1 x 10 _ ⁇ n in relation to a database sequence of human origin were specifically excluded. The final result provided the 404 sequences listed in the accompanying Sequence Listing. The Sequence Listing is arranged beginning with sequences with no similarity to any sequence in a database searched, and ending with sequences with the greatest similarity. Each identified polynucleotide represents sequence from at least a partial mRNA transcript. Polynucleotides that were determined to be novel were assigned a sequence identification number.
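The discard rules described above amount to a threshold filter over database hits. The following is a minimal sketch, not the procedure actually used: the `Hit` record, the function names, and the reading of "99% identity" as "at least 99%" are assumptions made for illustration. The default thresholds are the Genbank search parameters quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One database hit for a query sequence (hypothetical record layout)."""
    overlap: float   # fraction of the query covered by the alignment, 0..1
    identity: float  # fraction of identical positions in the alignment, 0..1
    p_value: float   # p value reported by the search program


def is_known(hit: Hit, min_overlap: float = 0.70,
             min_identity: float = 0.99, max_p: float = 1e-40) -> bool:
    """True when a hit meets all three thresholds, so the query is treated as known."""
    return (hit.overlap > min_overlap
            and hit.identity >= min_identity
            and hit.p_value < max_p)


def filter_novel(hits_by_query: Dict[str, List[Hit]], **thresholds) -> List[str]:
    """Return the query names that have no hit meeting the discard thresholds."""
    return [name for name, hits in hits_by_query.items()
            if not any(is_known(h, **thresholds) for h in hits)]
```

The stricter EST and Patent GeneSeq rules can be expressed by passing different thresholds, for example `filter_novel(hits, min_overlap=0.99, min_identity=0.99)`.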
  • the DNA sequences corresponding to the novel polynucleotides are provided in the Sequence Listing. The majority of the sequences are presented in the Sequence Listing in the 5' to 3' direction. A small number, 25, are listed in the Sequence Listing in the 5' to 3' direction but the sequence as written is actually 3' to 5'. These sequences are readily identified with the designation "AR" in the Sequence Name in Table 1 (inserted before the claims). The sequences correctly listed in the 5' to 3' direction in the Sequence Listing are designated "AF."
  • the Sequence Listing filed herewith therefore contains 25 sequences listed in the reverse order, namely SEQ ID NOS:47, 97, 137, 171,
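Because the "AR" entries are written 3' to 5', reading them in the conventional 5' to 3' direction only requires reversing the written string. The sketch below assumes that the "AF" or "AR" designation has already been extracted from the Sequence Name; the function name and its arguments are hypothetical.

```python
def as_five_to_three(sequence: str, designation: str) -> str:
    """Return the sequence written 5' to 3'.

    designation is "AF" (already 5' to 3') or "AR" (written 3' to 5').
    """
    if designation == "AF":
        return sequence
    if designation == "AR":
        # Reversal only: the strand is unchanged, just the writing direction.
        return sequence[::-1]
    raise ValueError(f"unknown designation: {designation!r}")
```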
  • polynucleotides represent partial mRNA transcripts
  • two or more polynucleotides of the invention may represent different regions of the same mRNA transcript and the same gene.
  • Where two or more SEQ ID NOS: are identified as belonging to the same clone, either sequence can be used to obtain the full-length mRNA or gene.
  • inserts of the clones corresponding to these polynucleotides were re-sequenced. These "validation" sequences are provided in SEQ ID NOS:405-800. These validation sequences were often longer than the original polynucleotide sequences.
  • Validation sequences can be correlated with the original sequences they validate by identifying those sequences of SEQ ID NOS: 1-404 and the validation sequences of SEQ ID NOS:405-800 that share the same clone name in Table 1.
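The correlation by clone name can be sketched as a grouping step. This is an illustrative sketch only: it assumes Table 1 has been read into (SEQ ID NO, clone name) pairs, and the function name and the clone names in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pair_by_clone(rows: Iterable[Tuple[int, str]]) -> Dict[str, Dict[str, List[int]]]:
    """Group SEQ ID NOs by clone name, separating original from validation sequences.

    SEQ ID NOS: 1-404 are treated as original sequences and 405-800 as
    validation sequences, as stated in the text.
    """
    grouped = defaultdict(lambda: {"original": [], "validation": []})
    for seq_id, clone_name in rows:
        kind = "original" if seq_id <= 404 else "validation"
        grouped[clone_name][kind].append(seq_id)
    return dict(grouped)
```

For example, `pair_by_clone([(1, "cloneA"), (405, "cloneA")])` groups both entries under `"cloneA"`, with 1 as the original sequence and 405 as its validation sequence.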
  • SEQ ID NOS: 1-404 as well as the validation sequences SEQ ID NOS:405-800, were translated in all three reading frames to determine the best alignment with the individual sequences.
  • These amino acid sequences and nucleotide sequences are referred, generally, as query sequences, which are aligned with the individual sequences.
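Translation in three reading frames can be illustrated as follows. This is a minimal sketch assuming the standard genetic code and an unambiguous DNA alphabet (A, C, G, T); the function names are hypothetical, and codons containing any other character are written as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1, or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(dna: str) -> list:
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]
```

Each resulting amino acid sequence can then be used as a query sequence in a protein database search.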
  • Query and individual sequences were aligned using the BLAST programs, available over the world wide web at http://www.ncbi.nlm.nih.gov/BLAST/. Again the sequences were masked to various extents to prevent searching of repetitive sequences or poly-A sequences, using the XBLAST program for masking low complexity as described above in Example 1.
  • Table 2 (inserted before the claims) shows the results of the alignments.
  • Table 2 refers to each sequence by its SEQ ID NO:, the accession numbers and descriptions of nearest neighbors from the Genbank and Non-Redundant Protein searches, and the p values of the search results.
  • Table 1 identifies each SEQ ID NO: by SEQ name, clone ID, and cluster. As discussed above, a single cluster includes polynucleotides representing the same gene or gene family, and generally represents sequences encoding the same gene product.
  • For each of SEQ ID NOS: 1-800, the best alignment to a protein or DNA sequence is included in Table 2.
  • the activity of the polypeptide encoded by SEQ ID NOS: 1-800 is the same or similar to the nearest neighbor reported in Table 2.
  • the accession number of the nearest neighbor is reported, providing a reference to the activities exhibited by the nearest neighbor.
  • the search program and database used for the alignment also are indicated as well as a calculation of the p value.
  • Full length sequences or fragments of the polynucleotide sequences of the nearest neighbors can be used as probes and primers to identify and isolate the full length sequence of SEQ ID NOS: 1-800.
  • the nearest neighbors can indicate a tissue or cell type to be used to construct a library for the full-length sequences of SEQ ID NOS: 1-800.
  • SEQ ID NOS: 1-800 and the translations thereof may be human homologs of known genes of other species or novel allelic variants of known human genes. In such cases, these new human sequences are suitable as diagnostics or therapeutics.
  • diagnostics the human sequences SEQ ID NOS: 1-800 exhibit greater specificity in detecting and differentiating human cell lines and types than homologs of other species.
  • the human polypeptides encoded by SEQ ID NOS: 1-800 are likely to be less immunogenic when administered to humans than homologs from other species. Further, on administration to humans, the polypeptides encoded by SEQ ID NOS: 1-800 can show greater specificity or can be better regulated by other human proteins than are homologs from other species.
  • polynucleotides of the invention were found to encode polypeptides having characteristics of polypeptides belonging to known protein families (and thus represent new members of these protein families) and/or comprising a known functional domain (Table 3).
  • the invention encompasses fragments, fusions, and variants of such polynucleotides that retain biological activity associated with the protein family and/or functional domain identified herein.
  • Start and stop indicate the positions within the individual sequences that align with the query sequence having the indicated SEQ ID NO.
  • the direction (Dir) indicates the orientation of the query sequence with respect to the individual sequence, where forward (for) indicates that the alignment is in the same direction (left to right) as the sequence provided in the Sequence Listing and reverse (rev) indicates that the alignment is with a sequence complementary to the sequence provided in the Sequence Listing.
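The orientation flag can be applied programmatically. The sketch below assumes that "complementary" here means the reverse complement of the listed sequence, which is the usual convention for a reverse-strand alignment, and that the flag values are the strings "for" and "rev"; the function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def oriented_sequence(listed: str, direction: str) -> str:
    """Return the strand that takes part in the alignment.

    "for": the sequence as provided in the Sequence Listing.
    "rev": its reverse complement.
    """
    if direction == "for":
        return listed
    if direction == "rev":
        return listed.translate(_COMPLEMENT)[::-1]
    raise ValueError(f"unknown direction: {direction!r}")
```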
  • SEQ ID NOS: 24, 41, 101, 157, 341, and 395 correspond to a sequence encoding a polypeptide that is a member of the 4 transmembrane segments integral membrane protein family (transmembrane 4 family).
  • the transmembrane 4 family of proteins includes a number of evolutionarily-related eukaryotic cell surface antigens (Levy et al, J. Biol.
  • the proteins belonging to this family include: 1) Mammalian antigen CD9 (MIC3), which is involved in platelet activation and aggregation; 2) Mammalian leukocyte antigen CD37, expressed on B lymphocytes; 3) Mammalian leukocyte antigen CD53 (OX-44), which is implicated in growth regulation in hematopoietic cells; 4) Mammalian lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1); 5) Mammalian antigen CD81 (cell surface protein TAPA-1), which is implicated in regulation of lymphoma cell growth; 6) Mammalian antigen CD82 (protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for the TCR/CD3 pathway; 7) Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3));
  • the members of the 4 transmembrane family share several characteristics. First, they all are apparently type III membrane proteins, which are integral membrane proteins containing an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis and which functions both as a translocation signal and as a membrane anchor. The family members also contain three additional transmembrane regions and at least seven conserved cysteine residues, and are of approximately the same size (218 to 284 residues). These proteins are collectively known as the "transmembrane 4 superfamily" (TM4) because they span the plasma membrane four times.
  • TMa is the transmembrane anchor
  • TM2 to TM4 represents transmembrane regions 2 to 4
  • 'C' are conserved cysteines
  • the consensus pattern spans a conserved region including two cysteines located in a short cytoplasmic loop between two transmembrane domains: Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2).
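A consensus pattern written in this notation can be converted to a regular expression and used to scan a translated sequence. The following is a minimal sketch covering only the elements used in the patterns of this document: a single residue, a bracketed set of alternatives, x for any residue, and a repeat count or range in parentheses. The function names are hypothetical, and a regular-expression scan is only one way to apply such a pattern.

```python
import re
from typing import List

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")

TM4_PATTERN = ("G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-"
               "x(2)-[EG]-x(2)-[CWN]-[LIVM](2)")


def pattern_to_regex(pattern: str) -> str:
    """Convert a dash-separated consensus pattern to a regular expression."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        regex = "." if residue == "x" else residue
        if low and high:
            regex += f"{{{low},{high}}}"   # e.g. x(5,7) becomes .{5,7}
        elif low:
            regex += f"{{{low}}}"          # e.g. x(3) becomes .{3}
        parts.append(regex)
    return "".join(parts)


def find_motif(protein: str, pattern: str) -> List[int]:
    """Return 0-based start positions of non-overlapping pattern matches."""
    regex = re.compile(pattern_to_regex(pattern))
    return [m.start() for m in regex.finditer(protein.upper())]
```

Calling `find_motif(translated_sequence, TM4_PATTERN)` returns an empty list when the translated sequence contains no match to the consensus pattern.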
  • SEQ ID NOS: 24, 41, 101, 157, 291, 305, 315, and 341 correspond to a sequence encoding a polypeptide that is a member of the seven transmembrane receptor family.
  • G-protein coupled receptors (Strosberg, Eur. J. Biochem. (1991) 796:1; Kerlavage, Curr. Opin. Struct. Biol. (1991) 7:394; and Probst et al., DNA Cell Biol. (1992) 77:1; and Savarese et al., Biochem. J. (1992)
  • R7G guanine nucleotide-binding
  • G guanine nucleotide-binding
  • the tertiary structure of these receptors is thought to be highly similar. They have seven hydrophobic regions, each of which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all of these receptors, lack a signal peptide.
  • Proteins containing ankyrin repeats include ankyrin, myotrophin, I-kappaB proteins, cell cycle protein cdc10, the Notch receptor (Matsuno et al, Development (1997) 124(21):4265); G9a (or BAT8) of the class III region of the major histocompatibility complex (Biochem J. 290:811-818, 1993), FABP, GABP, 53BP2, Lin12, glp-1, SWI4, and SWI6.
  • the functions of the ankyrin repeats are compatible with a role in protein-protein interactions (Bork,
  • the 90 kD N-terminal domain of ankyrin contains a series of 24 33-amino-acid ank repeats. (Lux et al, Nature (1990) 544:36-42, Lambert et al, PNAS USA (1990) 57:1730.)
  • the 24 ank repeats form four folded subdomains of 6 repeats each. These four repeat subdomains mediate interactions with at least 7 different families of membrane proteins.
  • Ankyrin contains two separate binding sites for anion exchanger dimers. One site utilizes repeat subdomain two (repeats 7-12) and the other requires both repeat subdomains 3 and 4
  • the repeat motifs are involved in ankyrin interaction with tubulin, spectrin, and other membrane proteins. (Lux et al, Nature (1990) 544:36.)
  • the Rel/NF-kappaB/Dorsal family of transcription factors have activity that is controlled by sequestration in the cytoplasm in association with inhibitory proteins referred to as I-kappaB.
  • I-kappaB proteins contain 5 to 8 copies of 33 amino acid ankyrin repeats and certain NF-kappaB/rel proteins are also regulated by cis-acting ankyrin repeat containing domains including p105 NF-kappaB which contains a series of ankyrin repeats (Diehl and Hannink, J. Virol. (1993) 67(12)1 6 ).
  • the I-kappaBs and Cactus (also containing ankyrin repeats) inhibit activators through differential interactions with the Rel-homology domain.
  • the gene family includes proto-oncogenes, thus broadly implicating I-kappaB in the control of both normal gene expression and the aberrant gene expression that makes cells cancerous.
  • both the ankyrin repeats and the carboxy-terminal domain are required for inhibiting DNA-binding activity and direct association of pp40/I-kappaBβ with rel/NF-kappaB protein.
  • the ankyrin repeats and the carboxy-terminal domain of pp40/I-kappaBβ form a structure that associates with the rel homology domain to inhibit DNA binding activity (Inoue et al, PNAS USA (1992) 59:4333).
  • the 4 ankyrin repeats in the amino terminus of the transcription factor subunit GABPβ are required for its interaction with the GABPα subunit to form a functional high affinity DNA-binding protein. These repeats can be crosslinked to DNA when GABP is bound to its target sequence. (Thompson et al, Science (1991) 253:162; LaMarco et al, Science (1991) 255:789).
  • Myotrophin, a 12.5 kDa protein having a key role in the initiation of cardiac hypertrophy, comprises ankyrin repeats.
  • the ankyrin repeats are characteristic of a hairpinlike protruding tip followed by a helix-turn-helix motif.
  • the V-shaped helix-turn-helix of the repeats stack sequentially in bundles and are stabilized by compact hydrophobic cores, whereas the protruding tips are less ordered.
  • ATPases Associated with Various Cellular Activities (AAA).
  • SEQ ID NOS: 63, 116, 134, 136, 151, 384, and 404 correspond to polynucleotides encoding novel members of the "ATPases Associated with diverse cellular Activities" (AAA) protein family.
  • the AAA protein family is composed of a large number of ATPases that share a conserved region of about 220 amino acids that contains an ATP-binding site (Froehlich et al, J. Cell Biol. (1991) 774:443; Erdmann et al., Cell (1991) 64:499; Peters et al, EMBO J.
  • Proteins containing two AAA domains include: 1) Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18, which are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as between different Golgi cisternae; 2) Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the transfer of membranes from the endoplasmic reticulum to the Golgi apparatus.
  • This ATPase forms a ring-shaped homooligomer composed of six subunits.
  • the yeast homolog, CDC48 plays a role in spindle pole proliferation; 3) Yeast protein PAS1 essential for peroxisome assembly and the related protein PAS1 from Pichia pastoris; 4) Yeast protein AFG2; 5) Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction pathway connecting light to cell division.
  • Proteins containing a single AAA domain include: 1) Escherichia coli and other bacteria ftsH (or hflB) protein.
  • FtsH is an ATP-dependent zinc metallopeptidase that degrades the heat-shock sigma-32 factor, and is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and the protease domains; 2) Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment.
  • YME1 is also a zinc-dependent protease; 3) Yeast protein AFG3 (or YTA10).
  • This protein also contains an AAA domain followed by a zinc-dependent protease domain; 4) Subunits from regulatory complex of the 26S proteasome (Hilt et al, Trends Biochem. Sci. (1996) 27:96), which is involved in the ATP-dependent degradation of ubiquitinated proteins, which subunits include: a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene mts2); b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2); c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3); d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or
  • e) Other probable subunits include human TBP1, which influences HIV gene expression by interacting with the virus tat transactivator protein, and yeast YTA1 and YTA6; 5) Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein; 6) Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins; 7) Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica; 8) Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06); 9) Caenorhabditis elegans meiotic spindle formation protein mei-1; 10) Yeast protein SAP1; 11) Yeast protein YTA
  • AAA domains in these proteins act as ATP-dependent protein clamps (Confalonieri et al. (1995) BioEssays 77:639).
  • In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly conserved region located in the central part of the domain which was used in the development of the signature pattern.
  • the consensus pattern is: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R.
  • Basic Region Plus Leucine Zipper Transcription Factors.
  • SEQ ID NO:374 corresponds to a polynucleotide encoding a novel member of the family of basic region plus leucine zipper transcription factors.
  • the bZIP superfamily (Hurst, Protein Prof. (1995) 2:105; and Ellenberger, Curr. Opin. Struct. Biol. (1994) 4:12) of eukaryotic DNA-binding transcription factors encompasses proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization.
  • Members of the family include transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA.
  • AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun.
  • Other members include jun-B and jun-D, probable transcription factors that are highly similar to jun/AP-1; the fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun; the fos-related proteins fra-1 and fos B; and mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1.
  • SEQ ID NO: 97 corresponds to a polynucleotide encoding a polypeptide having a bromodomain region (Haynes et al., 1992, Nucleic Acids Res. 20:2693-2603, Tamkun et al, 1992, Cell 68:561-572, and Tamkun, 1995, Curr. Opin. Genet.
  • TFIID 250 Kd subunit (TBP-associated factor p250) (gene CCG1); CBP (CREB-binding protein).
  • The bromodomain is thought to be involved in protein-protein interactions and may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.
  • The consensus pattern, which spans a major part of the bromodomain, is: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-
  • SEQ ID NOS: 136, 242, and 379 correspond to polynucleotides encoding a novel protein in the family of EF-hand proteins.
  • Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand (Kawasaki et al, Protein. Prof. (1995) 2:305-490). This type of domain consists of a twelve residue loop flanked on both sides by a twelve residue alpha-helical domain.
  • The calcium ion is coordinated in a pentagonal bipyramidal configuration.
  • The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z.
  • The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand).
  • The consensus pattern includes the complete EF-hand loop as well as the first residue which follows the loop and which seems always to be hydrophobic.
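  • The position labels described above can be made concrete with a short sketch that maps the coordinating positions 1, 3, 5, 7, 9 and 12 of a twelve-residue EF-hand loop to the labels X, Y, Z, -Y, -X and -Z. The function names and the example loop are hypothetical and serve only to illustrate the numbering.

```python
# Coordinating positions (1-based) within the twelve-residue EF-hand loop.
LIGAND_POSITIONS = {"X": 1, "Y": 3, "Z": 5, "-Y": 7, "-X": 9, "-Z": 12}


def ef_hand_ligands(loop: str) -> dict:
    """Return the residue found at each labelled coordinating position."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to have 12 residues")
    return {label: loop[pos - 1] for label, pos in LIGAND_POSITIONS.items()}


def has_bidentate_ligand(loop: str) -> bool:
    """Check for the invariant Glu or Asp at position 12."""
    return len(loop) == 12 and loop[11] in "DE"


# Hypothetical twelve-residue loop used only as test input.
EXAMPLE_LOOP = "DKDGDGTITTKE"
```

  • For EXAMPLE_LOOP, the residue labelled -Z is the glutamate at position 12, which is the bidentate ligand described in the text.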
  • SEQ ID NO: 308 corresponds to a gene encoding a novel eukaryotic aspartyl protease.
  • Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes (Foltmann B., Essays Biochem. (1981) 17:52; Davies D.R., Annu. Rev. Biophys. Chem.
  • Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain.
  • Eukaryotic aspartyl proteases include: 1) Vertebrate gastric pepsins A and C (also known as gastricsin); 2) Vertebrate chymosin (rennin), involved in digestion and used for making cheese; 3) Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34); 4) Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I from angiotensinogen in the plasma; 5) Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21); and 6)
  • PEP4 is implicated in posttranslational regulation of vacuolar hydrolases; 7) Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone; and 8) Fission yeast sxa1, which is involved in degrading or processing the mating pheromones.
  • Retroviruses and some plant viruses encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids.
  • The protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein. Because the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases is conserved, a single signature pattern can be used to identify members of both groups of proteases.
  • GATA Family of Transcription Factors.
  • SEQ ID NO:213 corresponds to a novel member of the GATA family of transcription factors.
  • The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G), found within the regulatory region of a number of genes. Proteins currently known to belong to this family are: 1) GATA-1 (Trainor, C.D., et al, Nature (1990) 343:92) (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes expressed in erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor for erythroid development; 2) GATA-2 (Lee, M.E., et al, J. Biol. Chem. (1991) 266:16188), a transcriptional activator which regulates endothelin-1 gene expression in endothelial cells; 3) GATA-3 (Ho, I.-C., et al, EMBO J.
  • E srp gene srp
  • Emericella nidulans areA (Arst, H.N., Jr., et al, Trends Genet. (1989) 5:291), a transcriptional activator which mediates nitrogen metabolite repression
  • Neurospora crassa nit-2 (Fu, Y.-H., et al, Mol. Cell. Biol.
  • The consensus pattern for the GATA family is: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C, where the four C's are zinc ligands.
  • SEQ ID NO: 367 corresponds to a gene encoding a novel polypeptide of the G-protein alpha subunit family.
  • Guanine nucleotide binding proteins are a family of membrane-associated proteins that couple extracellularly-activated integral-membrane receptors to intracellular effectors, such as ion channels and enzymes that vary the concentration of second messenger molecules.
  • G-proteins are composed of 3 subunits (alpha, beta and gamma) which, in the resting state, associate as a trimer at the inner face of the plasma membrane.
  • The alpha subunit has a molecule of guanosine diphosphate (GDP) bound to it. Stimulation of the G-protein by an activated receptor leads to its exchange for GTP (guanosine triphosphate). This results in the separation of the alpha from the beta and gamma subunits, which always remain tightly associated as a dimer. Both the alpha and beta-gamma subunits are then able to interact with effectors, either individually or in a cooperative manner. The intrinsic GTPase activity of the alpha subunit hydrolyses the bound GTP to GDP.
  • G-protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into 4 main groups on the basis of both sequence similarity and function: alpha-s, alpha-q, alpha-i and alpha-12 (Simon et al, Science (1993) 252:802). Many alpha subunits are substrates for ADP-ribosylation by cholera or pertussis toxins.
  • SEQ ID NOS: 188 and 251 represent polynucleotides encoding a protein belonging to the family including phorbol esters/diacylglycerol binding proteins.
  • Diacylglycerol (DAG) is an important second messenger.
  • Phorbol esters (PE) are analogues of DAG and potent tumor promoters that cause a variety of physiological changes when administered to both cells and tissues. DAG activates a family of serine/threonine protein kinases, collectively known as protein kinase C (PKC).
  • Phorbol esters can directly stimulate PKC. The N-terminal region of PKC, known as C1, has been shown (Ono et al, Proc. Natl. Acad. Sci. USA (1989) 86:4868) to bind PE and DAG in a phospholipid and zinc-dependent fashion.
  • The C1 region contains one or two copies (depending on the isozyme of PKC) of a cysteine-rich domain about 50 amino-acid residues long and essential for DAG/PE-binding.
  • Such a domain has also been found in, for example, the following proteins.
  • Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Sakane et al, Nature (1990)
  • N-chimaerin, a brain-specific protein which shows sequence similarities with the BCR protein at its C-terminal part and contains a single copy of the DAG/PE-binding domain at its N-terminal part. It has been shown (Ahmed et al, Biochem. J. (1990) 272:161, and Ahmed et al, Biochem. J. (1991) 280:233) to be able to bind phorbol esters.
  • The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are probably the six cysteines and two histidines that are conserved in this domain.
  • The signature pattern completely spans the DAG/PE domain.
  • The consensus pattern is: H-x-
  • SEQ ID NOS:202, 315, 367, and 397 represent polynucleotides encoding protein kinases. Protein kinases catalyze phosphorylation of proteins in a variety of pathways, and are implicated in cancer. Eukaryotic protein kinases (Hanks S.K., et al, FASEB J. (1995) 9:576; Hunter T., Meth. Enzymol. (1991) 200:3; Hanks S.K., et al, Meth. Enzymol. (1991) 200:38; Hanks S.K., Curr. Opin. Struct. Biol. (1991) 1:369; Hanks S.K., et al, Science (1988) 241:42) are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common to both serine/threonine and tyrosine protein kinases.
  • The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding.
  • The second region, which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important for the catalytic activity of the enzyme (Knighton D.R., et al, Science (1991) 253:407).
  • The protein kinase profile includes two signature patterns for this second region: one specific for serine/threonine kinases and the other for tyrosine kinases.
  • A third profile is based on the alignment in (Hanks S.K., et al, FASEB J. (1995) 9:576) and covers the entire catalytic domain.
  • The consensus patterns are as follows:
  • Consensus pattern [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K, where K binds ATP.
  • The majority of known protein kinases are detected by this pattern. Protein kinases that are not detected by this consensus include viral kinases, which are quite divergent in this region and are completely missed by this pattern.
  • Consensus pattern [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3), where D is an active site residue.
  • This consensus sequence identifies most serine/threonine-specific protein kinases with only 10 exceptions. Half of the exceptions are viral kinases, while the other exceptions include Epstein-Barr virus BGLF4 and Drosophila ninaC, which have Ser and Arg, respectively, instead of the conserved Lys. These latter two protein kinases are detected by the tyrosine kinase specific pattern described below.
  • Consensus pattern [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-
  • The protein kinase profile also detects receptor guanylate cyclases and 2-5A-dependent ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase family have been noticed previously. The profile also detects Arabidopsis thaliana kinase-like protein TMKL1 which seems to have lost its catalytic activity.
  • If a protein analyzed includes two of the above protein kinase signatures, the probability of it being a protein kinase is close to 100%.
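  • The two-signature rule above can be sketched as a check that a sequence carries both the ATP-binding pattern and the serine/threonine active-site pattern. The regular expressions below are hand translations of the two complete patterns given above, on the assumption that the delimiters lost from the ATP-binding pattern are PROSITE exclusion braces. The function names and the test fragment are hypothetical.

```python
import re

# Hand translation of the ATP-binding consensus pattern (K binds ATP).
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)

# Hand translation of the serine/threonine active-site pattern (D is catalytic).
SER_THR_ACTIVE_SITE = re.compile(
    r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}"
)


def kinase_signatures(sequence: str) -> dict:
    """Report which of the two signatures occur anywhere in the sequence."""
    return {
        "atp_binding": ATP_BINDING.search(sequence) is not None,
        "ser_thr_active_site": SER_THR_ACTIVE_SITE.search(sequence) is not None,
    }


def has_both_signatures(sequence: str) -> bool:
    """True when both signatures are present."""
    return all(kinase_signatures(sequence).values())


# Hypothetical fragments constructed by hand to satisfy each pattern.
ATP_FRAGMENT = "LGTGSFGRVMLVKHKETGNHYAMKIL"
ACTIVE_SITE_FRAGMENT = "LIYRDLKPENLLI"
EXAMPLE_SEQUENCE = "MS" + ATP_FRAGMENT + "QQQ" + ACTIVE_SITE_FRAGMENT + "DQ"
```

  • A sequence carrying only one of the two fragments reports a single signature and would not satisfy the rule. The tyrosine kinase pattern is not included because it is incomplete in the text above.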
  • Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus xanthus (Munoz-Dorado J., et al., Cell (1991) 67:995) and Yersinia pseudotuberculosis. The patterns shown above have been updated since their publication in (Bairoch A., et al, Nature (1988) 331:22).
  • m) Protein Phosphatase 2C.
  • SEQ ID NO:256 corresponds to a polynucleotide encoding a novel protein phosphatase 2C (PP2C), which is one of the four major classes of mammalian serine/threonine specific protein phosphatases.
  • PP2C is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and is dependent on divalent cations (mainly manganese and magnesium) for its activity.
  • Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma.
  • n) Protein Tyrosine Phosphatase.
  • SEQ ID NO:382 represents a polynucleotide encoding a protein tyrosine phosphatase.
  • Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) (Fischer et al, Science (1991) 253:401; Charbonneau et al, Annu. Rev. Cell Biol. (1992) 8:463; Trowbridge, J. Biol. Chem. (1991) 266:23517; Tonks et al, Trends Biochem. Sci. (1989) 14:497; and Hunter, Cell (1989) 58:1013) catalyze the removal of a phosphate group attached to a tyrosine residue.
  • PTPases are enzymes that are very important in the control of cell growth, proliferation, differentiation and transformation.
  • Multiple forms of PTPase have been characterized and can be classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s).
  • Soluble PTPases include PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain and could act at junctions between the membrane and cytoskeleton; PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes that contain two copies of the SH2 domain at their N-terminal extremity.
  • Dual specificity PTPases include DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185; and DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.
  • Receptor PTPases are made up of a variable length extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain.
  • Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their extracellular region.
  • The cytoplasmic region generally contains two copies of the PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
  • PTPase domains consist of about 300 amino acids. There are two conserved cysteines and the second one has been shown to be absolutely required for activity.
  • The SH3 domain is a small protein domain of about 60 amino acid residues first identified as a conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g.
  • The SH3 domain has a characteristic fold that consists of five or six beta-strands arranged as two tightly packed anti-parallel beta sheets.
  • The linker regions may contain short helices (Kuriyan et al, Curr. Opin. Struct. Biol. (1993) 3:828). It is believed that SH3 domain-containing proteins mediate assembly of specific protein complexes via binding to proline-rich peptides (Morton et al, Curr. Biol. (1994) 4:615).
  • SH3 domains are generally found as single copies in a given protein, but there is a significant number of proteins with two SH3 domains and a few with 3 or 4 copies.
  • SH3 domains have been identified in, for example: protein tyrosine kinases, such as the Src, Abl, Bkt, Csk and ZAP70 families of kinases; mammalian phosphatidylinositol-specific phospholipase C-gamma-1 and -2; mammalian phosphatidylinositol 3-kinase regulatory p85 subunit; mammalian Ras GTPase-activating protein (GAP); mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family; Drosophila lethal(1)discs large-1 tumor suppressor protein (gene Dlg1); mammalian tight junction protein ZO-1; vertebrate erythrocyte membrane protein p55; Caenorhabditis elegans protein lin-2; rat protein CASK; and mammalian synaptic
  • SEQ ID NO: 169 corresponds to a novel serine protease of the trypsin family.
  • The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine.
  • The sequences in the vicinity of the active site serine and histidine residues are well conserved in this family of proteases (Brenner S., Nature
  • Proteases known to belong to the trypsin family include: 1) Acrosin; 2)
  • Cathepsin G; 4) Chymotrypsins; 5) Complement components C1r, C1s, C2, and complement factors B, D and I; 6) Complement-activating component of RA-reactive factor; 7) Cytotoxic cell proteases (granzymes A to H); 8) Duodenase I; 9) Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin); 10) Enterokinase (EC 3.4.21.9) (enteropeptidase); 11) Hepatocyte growth factor activator; 12) Hepsin; 13) Glandular (tissue) kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and tonin); 14) Plasma kallikrein; 15) Mast cell proteases (MCP) 1 (chymase) to 8; 16) Myeloblastin (proteinase 3) (
  • The consensus patterns for this trypsin protein family are: 1) [LIVM]-[ST]-A-[STAG]-H-C, where H is the active site residue. All sequences known to belong to this class are detected by the pattern, except for complement components C1r and C1s, pig plasminogen, bovine protein C, rodent urokinase, ancrod, gyroxin and two insect trypsins; 2) [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH], where S is the active site residue.
  • Beta-transducin is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors (Gilman, Annu. Rev. Biochem. (1987) 56:615).
  • The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.
  • G-beta exists as a small multigene family of highly conserved proteins of about 340 amino acid residues. Structurally, G-beta consists of eight tandem repeats of about 40 residues, each containing a central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat).
  • Such a repetitive segment has been shown to exist in a number of other proteins including: human LIS1, a neuronal protein involved in type-1 lissencephaly; and mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.
  • wnt-3, -3A, -4, -5A, -5B, -6, -7A, -7B, -8, -8B, -9 and -10. At least four members of this family are present in Drosophila; one of them, wingless (wg), is implicated in segmentation polarity. All these proteins share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines that are probably involved in disulfide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. The consensus pattern, which is based upon a highly conserved region including three cysteines, is as follows: C-K-C-H-G-[LIVMT]-S-G-x-C.
  • SEQ ID NOS: 188, 379, and 395 represent polynucleotides encoding a polypeptide in the family of WW/rsp5/WWP domain-containing proteins.
  • The WW domain (Bork et al, Trends Biochem. Sci. (1994) 19:531;
  • Proteins containing the WW domain include:
  • Dystrophin, a multidomain cytoskeletal protein. Its longest alternatively spliced form consists of an N-terminal actin-binding domain, followed by 24 spectrin-like repeats, a cysteine-rich calcium-binding domain and a C-terminal globular domain. Dystrophin forms tetramers and is thought to have multiple functions including involvement in membrane stability, transduction of contractile forces to the extracellular environment and organization of membrane specialization. Mutations in the dystrophin gene lead to muscular dystrophy of Duchenne or Becker type. Dystrophin contains one WW domain C-terminal of the spectrin-repeats.
  • Vertebrate YAP protein, which is a substrate of an unknown serine kinase. It binds to the SH3 domain of the Yes oncoprotein via a proline-rich region. This protein appears in alternatively spliced isoforms, containing either one or two WW domains.
  • IQGAP, which is a human GTPase activating protein acting on ras. It contains an N-terminal domain similar to fly muscle mp20 protein and a C-terminal ras
  • The profile spans the whole homology region as well as a pattern.
  • The consensus for this family is: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.
  • SEQ ID NOS:61, 306, and 386 correspond to polynucleotides encoding novel members of the C2H2 type zinc finger protein family.
  • Zinc finger domains (Klug et al, Trends Biochem. Sci. (1987) 12:464; Evans et al, Cell
  • Zinc finger domains are nucleic acid-binding protein structures first identified in the Xenopus transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins.
  • A zinc finger domain is composed of 25 to 30 amino acid residues. Two cysteine or histidine residues are positioned at both extremities of the domain, which are involved in the tetrahedral coordination of a zinc atom. It has been proposed that such a domain interacts with about five nucleotides.
  • In C2H2 zinc fingers, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines.
  • Mammalian proteins having a C2H2 zinc finger include (number in parenthesis indicates number of zinc finger regions in the protein): basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3) and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4), EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1 (9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX (13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF
  • The consensus pattern for C2H2 zinc fingers is: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
  • The two C's and two H's are zinc ligands.
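  • Because a protein may contain several zinc finger regions, the C2H2 consensus can be used both to count candidate fingers and to locate the four zinc ligands in each. The following is a minimal sketch: the regular expression is a hand translation of the pattern above, and the function name and the two-finger test sequence are hypothetical.

```python
import re

# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, with the four ligands captured.
C2H2_FINGER = re.compile(r"(C).{2,4}(C).{3}[LIVMFYWC].{8}(H).{3,5}(H)")


def find_c2h2_fingers(sequence: str):
    """Return one tuple of 1-based ligand positions (C, C, H, H) per finger."""
    fingers = []
    for match in C2H2_FINGER.finditer(sequence):
        # match.start(i) is the 0-based position of the i-th captured ligand.
        fingers.append(tuple(match.start(i) + 1 for i in range(1, 5)))
    return fingers


# Hypothetical finger and linker used only as test input.
EXAMPLE_FINGER = "CPECGKAFSRSDELTRHIRIH"
EXAMPLE_SEQUENCE = EXAMPLE_FINGER + "TGEKP" + EXAMPLE_FINGER
```

  • The length of the returned list corresponds to the finger counts quoted in parentheses above, with the caveat that finditer reports non-overlapping matches only, so a real analysis may need a more careful scan.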
  • SEQ ID NO:322 corresponds to a polynucleotide encoding a novel member of the zinc finger CCHC family.
  • The CCHC zinc finger protein family to date has been mostly composed of retroviral gag proteins (nucleocapsid).
  • The prototype structure of this family is from HIV.
  • The family also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1.
  • Zinc-Binding Metalloprotease Domain.
  • SEQ ID NOS: 306 and 395 represent polynucleotides encoding novel members of the zinc-binding metalloprotease domain protein family.
  • The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a common pattern of primary structure (Jongeneel et al., FEBS Lett. (1989) 242:211; Murphy et al., FEBS Lett. (1991) 289:4; and Bode et al.
  • Angiotensin-converting enzyme (dipeptidyl carboxypeptidase I) (ACE).
  • Mammalian extracellular matrix metalloproteinases (matrixins):
  • MMP-1 (EC 3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22) (stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage metalloelastase). 3) Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the active peptide.
  • A signature pattern which includes the two histidines and the glutamic acid residue is sufficient to detect this superfamily of proteins, having the consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].
  • The two H's are zinc ligands, and E is the active site residue.
  • Example 4 Differential Expression of Polynucleotides of the Invention: Description of Libraries and Detection of Differential Expression
  • The relative expression levels of the polynucleotides of the invention were assessed in several libraries prepared from various sources, including cell lines and patient tissue samples. Table 4 provides a summary of these libraries, including the shortened library name (used hereafter), the mRNA source used to prepare the cDNA library, the "nickname" of the library that is used in the tables below (in quotes), and the approximate number of clones in the library.
  • Table 4 Description of cDNA Libraries
  • The MB-231 cell line was originally isolated from pleural effusions (Cailleau, J. Natl. Cancer. Inst. (1974) 53:661), is of high metastatic potential, and forms poorly differentiated adenocarcinoma grade II in nude mice consistent with breast carcinoma.
  • The MCF7 cell line was derived from a pleural effusion of a breast adenocarcinoma and is non-metastatic.
  • The MV-522 cell line is derived from a human lung carcinoma and is of high metastatic potential.
  • The UCP-3 cell line is a low metastatic human lung carcinoma cell line; the MV-522 is a high metastatic variant of UCP-3.
  • Each of the libraries is composed of a collection of cDNA clones that in turn are representative of the mRNAs expressed in the indicated mRNA source.
  • The sequences were assigned to clusters.
  • The concept of "cluster of clones" is derived from a sorting/grouping of cDNA clones based on their hybridization pattern to a panel of roughly 300 7bp oligonucleotide probes (see Drmanac et al, Genomics (1996) 7(1):29). Random cDNA clones from a tissue library are hybridized at moderate stringency to 300 7bp oligonucleotides.
  • Each oligonucleotide has some measure of specific hybridization to that specific clone.
  • The combination of 300 of these measures of hybridization for 300 probes equals the "hybridization signature" for a specific clone.
  • Clones with similar sequence will have similar hybridization signatures.
  • Groups of clones in a library can therefore be identified and brought together computationally. These groups of clones are termed "clusters".
  • The "purity" of each cluster can be controlled.
  • Artifacts of clustering may occur in computational clustering just as artifacts can occur in "wet-lab" screening of a cDNA library with 400 bp cDNA fragments, at even the highest stringency.
  • The stringency used in the implementation of clustering herein provides groups of clones that are in general from the same cDNA or closely related cDNAs. Closely related clones can be a result of different length clones of the same cDNA, closely related clones from highly related gene families, or splice variants of the same cDNA.
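  • The grouping of clones by hybridization signature can be illustrated with a short sketch. The text does not state which similarity measure or clustering algorithm was used, so the Pearson correlation, the greedy assignment to the first member of each cluster, and the threshold values below are all assumptions chosen for illustration. The threshold plays the role of the stringency that controls cluster purity.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation between two equal-length hybridization signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cluster_clones(signatures, threshold=0.9):
    """Greedily group clone ids whose signatures correlate with a cluster's
    first member at or above the threshold (higher threshold, purer clusters)."""
    clusters = []
    for clone_id, signature in signatures.items():
        for members in clusters:
            representative = signatures[members[0]]
            if correlation(signature, representative) >= threshold:
                members.append(clone_id)
                break
        else:
            clusters.append([clone_id])
    return clusters


def noisy_copy(base, rng, sd=0.05):
    """Hypothetical helper: simulate a related clone by adding measurement noise."""
    return [value + rng.gauss(0, sd) for value in base]


def simulated_library(seed=0, probes=300):
    """Hypothetical test data: two unrelated cDNAs, two noisy clones of each."""
    rng = random.Random(seed)
    base_a = [rng.random() for _ in range(probes)]
    base_b = [rng.random() for _ in range(probes)]
    return {
        "a1": noisy_copy(base_a, rng), "a2": noisy_copy(base_a, rng),
        "b1": noisy_copy(base_b, rng), "b2": noisy_copy(base_b, rng),
    }
```

  • With the simulated data, a threshold of 0.9 groups the two clones of each cDNA together, whereas a threshold close to 1 splits every clone into its own cluster, which mirrors the trade-off between purity and completeness described above.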
  • Differential expression for a selected cluster was assessed by first determining the number of cDNA clones corresponding to the selected cluster in the first library (Clones in 1st), and then determining the number of cDNA clones corresponding to the selected cluster in the second library (Clones in 2nd). Differential expression of the selected cluster in the first library relative to the second library is expressed as a "ratio" of percent expression between the two libraries.
  • The "ratio" is calculated by: 1) calculating the percent expression of the selected cluster in the first library by dividing the number of clones corresponding to a selected cluster in the first library by the total number of clones analyzed from the first library; 2) calculating the percent expression of the selected cluster in the second library by dividing the number of clones corresponding to a selected cluster in a second library by the total number of clones analyzed from the second library; 3) dividing the calculated percent expression from the first library by the calculated percent expression from the second library. If the "number of clones" corresponding to a selected cluster in a library is zero, the value is set at 1 to aid in calculation. The formula used in calculating the ratio takes into account the "depth" of each of the libraries being compared, i.e., the total number of clones analyzed in each library.
  • A polynucleotide is said to be significantly differentially expressed between two samples when the ratio value is greater than at least about 2, preferably greater than at least about 3, more preferably greater than at least about 5, where the ratio value is calculated using the method described above.
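  • The ratio calculation and threshold test described above can be sketched as follows. The function names are hypothetical, and the example clone counts are invented solely to show the arithmetic.

```python
def percent_expression(clones_in_cluster: int, total_clones: int) -> float:
    """Percent of analyzed clones in a library that fall in the cluster.

    A count of zero is set to 1, as described in the text, so that the
    ratio can always be calculated.
    """
    clones = max(clones_in_cluster, 1)
    return 100.0 * clones / total_clones


def expression_ratio(clones_1st: int, total_1st: int,
                     clones_2nd: int, total_2nd: int) -> float:
    """Ratio of percent expression in the first library to the second."""
    return (percent_expression(clones_1st, total_1st)
            / percent_expression(clones_2nd, total_2nd))


def is_differentially_expressed(ratio: float, threshold: float = 2.0) -> bool:
    """Apply a ratio threshold (about 2, preferably 3, more preferably 5)."""
    return ratio > threshold
```

  • For example, 12 clones out of 4000 in the first library against 0 clones out of 2000 in the second gives 0.3% against 0.05% (the zero being set to 1), a ratio of about 6. Dividing by library depth is what keeps a deeper library from appearing to over-express every cluster.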
  • The significance of differential expression is determined using a z score test (Zar, Biostatistical Analysis, Prentice Hall, Inc., USA, "Differences between Proportions," pp 296-298 (1974)).
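  • A z score test for the difference between two proportions can be sketched as below. The text does not give the exact form of the test, so the pooled-variance statistic without continuity correction and the two-sided p-value from the normal distribution are assumptions; the function names and example counts are hypothetical.

```python
import math


def two_proportion_z(clones_1st: int, total_1st: int,
                     clones_2nd: int, total_2nd: int) -> float:
    """Pooled two-proportion z statistic for cluster frequency in two libraries."""
    p1 = clones_1st / total_1st
    p2 = clones_2nd / total_2nd
    pooled = (clones_1st + clones_2nd) / (total_1st + total_2nd)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_1st + 1.0 / total_2nd))
    if se == 0.0:
        return 0.0  # no variation at all, so no evidence of a difference
    return (p1 - p2) / se


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

  • For 30 clones out of 1000 in one library and 10 out of 1000 in the other, the statistic is about 3.19 and the two-sided p-value is below 0.01. A large ratio based on very few clones can still fail such a test, which is why the ratio and its significance are assessed separately.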
  • Tables 5 to 7 show the number of clones in each of the above libraries that were analyzed for differential expression. Examples of differentially expressed polynucleotides of particular interest are described in more detail below.
  • Example 5 Polynucleotides Differentially Expressed in High Metastatic Potential Breast
  • A number of polynucleotide sequences have been identified that are differentially expressed between cells derived from high metastatic potential breast cancer tissue and low metastatic breast cancer cells. Expression of these sequences in breast cancer can be valuable in determining diagnostic, prognostic and/or treatment information. For example, sequences that are highly expressed in the high metastatic potential cells can be indicative of increased expression of genes or regulatory sequences involved in the metastatic process. A patient sample displaying an increased level of one or more of these polynucleotides may thus warrant more aggressive treatment.
  • Sequences that display higher expression in the low metastatic potential cells can be associated with genes or regulatory sequences that inhibit metastasis, and thus the expression of these polynucleotides in a sample may warrant a more positive prognosis than the gross pathology would suggest.
  • The differential expression of these polynucleotides can be used as a diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the like.
  • These polynucleotide sequences can also be used in combination with other known molecular and/or biochemical markers.
  • The following table summarizes identified polynucleotides with differential expression between high metastatic potential breast cancer cells and low metastatic potential breast cancer cells.
  • A number of polynucleotide sequences have been identified that are differentially expressed between cells derived from high metastatic potential lung cancer tissue and low metastatic lung cancer cells. Expression of these sequences in lung cancer tissue can be valuable in determining diagnostic, prognostic and/or treatment information. For example, sequences that are highly expressed in the high metastatic potential cells can be indicative of increased expression of genes or regulatory sequences involved in the metastatic process. A patient sample displaying an increased level of one or more of these polynucleotides may thus warrant more aggressive treatment.
  • Sequences that display higher expression in the low metastatic potential cells can be associated with genes or regulatory sequences that inhibit metastasis, and thus the expression of these polynucleotides in a sample may warrant a more positive prognosis than the gross pathology would suggest.
  • The differential expression of these polynucleotides can be used as a diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the like.
  • These polynucleotide sequences can also be used in combination with other known molecular and/or biochemical markers.
  • A number of polynucleotide sequences have been identified that are differentially expressed between cells derived from high metastatic potential colon cancer tissue and low metastatic colon cancer cells. Expression of these sequences in colon cancer tissue can be valuable in determining diagnostic, prognostic and/or treatment information. For example, sequences that are highly expressed in the high metastatic potential cells can be indicative of increased expression of genes or regulatory sequences involved in the metastatic process. A patient sample displaying an increased level of one or more of these polynucleotides may thus warrant more aggressive treatment.
  • Sequences that display higher expression in the low metastatic potential cells can be associated with genes or regulatory sequences that inhibit metastasis, and thus the expression of these polynucleotides in a sample may warrant a more positive prognosis than the gross pathology would suggest.
  • The differential expression of these polynucleotides can be used as a diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the like.
  • These polynucleotide sequences can also be used in combination with other known molecular and/or biochemical markers.
  • Table 11 Differentially expressed polynucleotides: High metastatic potential colon cancer vs. low metastatic colon cancer cells
  • Example 8 Polynucleotides Differentially Expressed at Higher Levels in High Metastatic
  • A number of polynucleotide sequences have been identified that are differentially expressed between cells derived from high metastatic potential colon cancer tissue and normal tissue. Expression of these sequences in colon cancer tissue can be valuable in determining diagnostic, prognostic and/or treatment information. For example, sequences that are highly expressed in the high metastatic potential cells can be indicative of increased expression of genes or regulatory sequences involved in the advanced disease state, which involves processes such as angiogenesis, dedifferentiation, cell replication, and metastasis. A patient sample displaying an increased level of one or more of these polynucleotides may thus warrant more aggressive treatment.
  • The differential expression of these polynucleotides can be used as a diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the like. These polynucleotide sequences can also be used in combination with other known molecular and/or biochemical markers.
  • Table 11 Differentially expressed polynucleotides: High metastatic potential colon tissue vs. normal colon tissue
  • Example 9 Polynucleotides Differentially Expressed at Higher Levels in High Colon Tumor Potential Patient Tissue Versus Metastasized Colon Cancer Patient Tissue
  • A number of polynucleotide sequences have been identified that are differentially expressed between cells derived from high tumor potential colon cancer tissue and cells derived from high metastatic potential colon cancer cells. Expression of these sequences in colon cancer tissue can be valuable in determining diagnostic, prognostic and/or treatment information associated with the transformation of precancerous tissue to malignant tissue. This information can be useful in preventing progression to the advanced malignant state in these tissues, and can be important in risk assessment for a patient.
  • Table 12 Differentially expressed polynucleotides: High tumor potential colon tissue vs. metastatic colon tissue
  • Example 10 Polynucleotides Differentially Expressed at Higher Levels in High Tumor Potential Colon Cancer Patient Tissue Versus Normal Patient Tissue
  • A number of polynucleotide sequences have been identified that are differentially expressed between cells derived from high tumor potential colon cancer tissue and normal tissue. Expression of these sequences in colon cancer tissue can be valuable in determining diagnostic, prognostic and/or treatment information associated with preventing progression to the malignant state in these tissues, and can be important in risk assessment for a patient.
  • Sequences that are highly expressed in the high tumor potential colon cancer cells are associated with, or can be indicative of, increased expression of genes or regulatory sequences involved in early tumor progression.
  • A patient sample displaying an increased level of one or more of these polynucleotides may thus warrant closer attention or more frequent screening procedures to catch the malignant state as early as possible.
  • Table 13 Differentially expressed polynucleotides: High tumor potential colon tissue vs. normal colon tissue
  • Polynucleotide sequences have been identified that are differentially expressed between cancerous cells and normal cells across all three tissue types tested (i.e., breast, colon, and lung). Expression of these sequences in a tissue of any origin can be valuable in determining diagnostic, prognostic and/or treatment information associated with preventing progression to the malignant state in these tissues, and can be important in risk assessment for a patient. These polynucleotides can also serve as non-tissue-specific markers of, for example, risk of metastasis of a tumor. The following table summarizes identified polynucleotides that were differentially expressed but without tissue type-specificity in the breast, colon, and lung libraries tested.
  • Tissue of UC#3 (Lib19 > Lib20) High Colon Tumor Tissue > Normal Tissue 14 1 12.25050 of UC#3 (Lib19 > Lib18) High Lung > Low Lung (Lib8 > Lib9) 8 1355 122 15.521 1 1
  • The cDNA libraries described herein were also analyzed to identify those polynucleotides that were specifically expressed in colon cells or tissue, i.e., the polynucleotides were identified in libraries prepared from colon cell lines or tissue, but not in libraries of breast or lung origin.
  • The polynucleotides that were expressed in a colon cell line and/or in colon tissue, but were not present in the breast or lung cDNA libraries described herein, are shown in Table 15.
  • SEQ ID NOS:159 and 161 were each present in one clone in each of Lib 16 (Normal Colon Tumor Tissue), and SEQ ID NOS:344 and 345 were each present in one clone in Lib 17 (High Colon Metastasis Tissue).
  • No clones corresponding to the colon-specific polynucleotides in the table above were present in any of Libraries 3, 4, 8, or 9.
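The selection rule described above (present in at least one colon library, absent from every breast or lung library) can be sketched as a simple set filter. This is a minimal illustration only: the clone counts and sequence labels are hypothetical, treating Libraries 16 and 17 as the colon set is an assumption made for the example, and Libraries 3, 4, 8, and 9 are used as the excluded set because the text names them.

```python
from typing import Dict, List, Set

# Hypothetical clone counts: {sequence label: {library number: clones observed}}.
CLONE_COUNTS: Dict[str, Dict[int, int]] = {
    "seq_a": {16: 1, 17: 2},  # seen only in colon libraries
    "seq_b": {17: 1, 8: 3},   # also seen in an excluded library
    "seq_c": {3: 4},          # never seen in a colon library
}

COLON_LIBS: Set[int] = {16, 17}          # assumption for this example
NON_COLON_LIBS: Set[int] = {3, 4, 8, 9}  # libraries named in the text


def tissue_specific(counts: Dict[str, Dict[int, int]],
                    target_libs: Set[int],
                    excluded_libs: Set[int]) -> List[str]:
    """Return labels with clones in a target library and none in any excluded one."""
    selected = []
    for label, per_lib in counts.items():
        in_target = any(per_lib.get(lib, 0) > 0 for lib in target_libs)
        in_excluded = any(per_lib.get(lib, 0) > 0 for lib in excluded_libs)
        if in_target and not in_excluded:
            selected.append(label)
    return sorted(selected)


colon_specific = tissue_specific(CLONE_COUNTS, COLON_LIBS, NON_COLON_LIBS)
print(colon_specific)
```

Absence from a library is only evidence of specificity within the libraries sampled, so a filter of this kind depends on library depth.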
  • The polynucleotides provided above can be used as markers of cells of colon origin, and find particular use in reference arrays, as described above.
  • The novel polynucleotides were used to screen publicly available and proprietary databases to determine whether any of the polynucleotides of SEQ ID NOS:1-404 would facilitate identification of a contiguous sequence, e.g., whether the polynucleotides would provide sequence resulting in 5' extension of another DNA sequence, producing a longer contiguous sequence composed of the provided polynucleotide and the other DNA sequence(s).
  • Contiging was performed using the AssemblyLign program with the following parameters: 1) Overlap: Minimum Overlap Length: 30; % Stringency: 50; Minimum
  • The contiged sequences are provided as SEQ ID NOS:801-844.
  • The contiged sequences can be correlated with the sequences of SEQ ID NOS:1-404 upon which they are based by identifying those sequences of SEQ ID NOS:1-404 and the contiged sequences of SEQ ID NOS:801-844 that share the same clone name in Table 1.
  • The contiged sequences (SEQ ID NOS:801-844) thus represent longer sequences that encompass a polynucleotide sequence of the invention.
  • The contiged sequences were then translated in all three reading frames to determine the best alignment with individual sequences using the BLAST programs as described above for SEQ ID NOS:1-404 and the validation sequences SEQ ID NOS:405-800. Again, the sequences were masked using the XBLAST program for masking low complexity as described above in Example 1 (Table 2).
  • Several of the contiged sequences were found to encode polypeptides having characteristics of a polypeptide belonging to a known protein family (and thus represent new members of these protein families) and/or comprising a known functional domain (Table 16).
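The clone-name correlation described above amounts to grouping sequence identifiers by a shared key. The sketch below assumes Table 1 can be read as (SEQ ID number, clone name) pairs; the rows and clone names are hypothetical placeholders, not values from the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical (SEQ ID NO, clone name) rows standing in for Table 1.
ROWS: List[Tuple[int, str]] = [
    (12, "cloneX"), (57, "cloneY"), (803, "cloneX"),
    (130, "cloneZ"), (820, "cloneY"),
]


def correlate_by_clone(rows: Iterable[Tuple[int, str]]) -> Dict[str, Dict[str, List[int]]]:
    """Pair provided sequences (1-404) with contiged sequences (801-844) by clone name."""
    by_clone = defaultdict(lambda: {"provided": [], "contig": []})
    for seq_id, clone in rows:
        if 1 <= seq_id <= 404:
            by_clone[clone]["provided"].append(seq_id)
        elif 801 <= seq_id <= 844:
            by_clone[clone]["contig"].append(seq_id)
    # Keep only clones that have both a provided sequence and a contig.
    return {clone: group for clone, group in by_clone.items()
            if group["provided"] and group["contig"]}


pairs = correlate_by_clone(ROWS)
print(pairs)
```

Clones with no contiged counterpart (such as the placeholder "cloneZ") are dropped from the result.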
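Translation in three reading frames can be illustrated with a short sketch using the standard genetic code. The function names and the example sequence are assumptions for illustration; the BLAST and XBLAST steps named in the text are not reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int) -> str:
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def three_frames(dna: str) -> list:
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


print(three_frames("ATGGCCTAA"))
```

Only the forward strand is handled; a six-frame search would also translate the reverse complement.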
  • The invention encompasses fragments, fusions, and variants of such polynucleotides that retain biological activity associated with the protein family and/or functional domain identified herein.
  • Table 16. Profile hits using contiged sequences
  • AAA ATPases
  • protein kinase families
  • The profiles for the ATPases (AAA) and protein kinase families are described above in Example 2.
  • The homeobox and MAP kinase kinase protein families are described further below.
  • The 'homeobox' is a protein domain of 60 amino acids (Gehring In: Guidebook to the Homeobox Genes. Duboule D., Ed., pp1-10, Oxford University Press, Oxford, (1994); Buerglin In: Guidebook to the Homeobox Genes, pp25-72, Oxford
  • A schematic representation of the homeobox domain is shown below.
  • The helix-turn-helix region is shown by the symbols 'H' (for helix), and 't' (for turn).
  • The pattern used to detect homeobox sequences is 24 residues long and spans positions 34 to 57 of the homeobox domain.
  • The consensus pattern is as follows: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].
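A consensus pattern written in this bracket-and-dash notation can be converted to a regular expression and scanned against a protein sequence. The sketch below handles only the elements that appear in the pattern above (residue sets, single residues, `x`, and `x(n)`). The 60-residue test sequence is an illustrative input that is not taken from the source, and the function names are hypothetical.

```python
import re

HOMEOBOX_PATTERN = ("[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-"
                    "[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]")


def pattern_to_regex(pattern: str) -> str:
    """Convert the subset of pattern syntax used above into a regex string."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if wildcard:
            count = wildcard.group(1)
            parts.append("." if count is None else ".{%s}" % count)
        else:
            parts.append(element)  # residue set or single residue, used as-is
    return "".join(parts)


def find_matches(sequence: str, pattern: str = HOMEOBOX_PATTERN) -> list:
    """Return (start, end, matched text) tuples using 1-based inclusive positions."""
    regex = re.compile(pattern_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


# Illustrative 60-residue input (an assumption, not a sequence from this document).
TEST_SEQ = "RKRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLTERQIKIWFQNRRMKWKKEN"
print(find_matches(TEST_SEQ))
```

For this input the single match covers positions 34 to 57, which agrees with the span stated for the pattern within the 60-residue domain.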
  • MAP kinases are involved in signal transduction, and are important in cell cycle and cell growth controls.
  • The MAP kinase kinases (MAPKK) are dual-specificity protein kinases which phosphorylate and activate MAP kinases.
  • MAPKK homologues have been found in yeast, invertebrates, amphibians, and mammals.
  • The MAPKK/MAPK phosphorylation switch constitutes a basic module activated in distinct pathways in yeast and in vertebrates.
  • MAPKK regulation studies have led to the discovery of at least four MAPKK convergent pathways in higher organisms. One of these is similar to the yeast pheromone response pathway, which includes the ste11 protein kinase.
  • MAPKKs are apparently essential transducers through which signals must pass before reaching the nucleus.
  • CMCC Chiron Master Culture Collection

Abstract

This invention relates to novel human polynucleotides and variants thereof, their encoded polypeptides and variants thereof, genes corresponding to these polynucleotides, and proteins expressed by these genes. The invention also relates to diagnostic and therapeutic agents employing these novel human polynucleotides, their corresponding genes or gene products, such as these genes and proteins, including probes, antisense constructs, and antibodies.
PCT/US1998/027610 1997-12-23 1998-12-22 Human genes and gene expression products I WO1999033982A2 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2000526638A JP2002500010A (ja) 1997-12-23 1998-12-22 ヒト遺伝子および遺伝子発現産物i
EP98965500A EP1190058A2 (fr) 1997-12-23 1998-12-25 Human genes and gene expression products I
AU20955/99A AU2095599A (en) 1997-12-23 1998-12-25 Human genes and gene expression products I

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US6875597P 1997-12-23 1997-12-23
US60/068,755 1997-12-23
US8066498P 1998-04-03 1998-04-03
US60/080,664 1998-04-03
US10523498P 1998-10-21 1998-10-21
US60/105,234 1998-10-21
US10587798P 1998-10-27 1998-10-27
US60/105,877 1998-10-27
US21747198A 1998-12-21 1998-12-21
US09/217,471 1998-12-21

Publications (2)

Publication Number Publication Date
WO1999033982A2 (fr) 1999-07-08
WO1999033982A3 (fr) 1999-12-23

Family

ID=27535805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/027610 WO1999033982A2 (fr) 1997-12-23 1998-12-22 Human genes and gene expression products I

Country Status (4)

Country Link
EP (1) EP1190058A2 (fr)
JP (1) JP2002500010A (fr)
AU (1) AU2095599A (fr)
WO (1) WO1999033982A2 (fr)

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BALDI, A. ET AL.: "Differential expression of the retinoblastoma gene family members pRb/p105, p107, and pRb2/p130 in lung cancer." CLINICAL CANCER RESEARCH, vol. 2, July 1996 (1996-07), pages 1239-45, XP002099965 *
CARMECI, C. ET AL.: "Identification of a gene (GPR30) with homology to the G-protein-coupled receptor superfamily associated with estrogen receptor expression in breast cancer." GENOMICS, vol. 45, no. 3, 1 November 1997 (1997-11-01), pages 607-17, XP002099963 *
NUCLEIC ACIDS RESEARCH, vol. 23, no. 19, 1995, pages 4007-8, XP002099962 cited in the application *
RADINSKY, R. ET AL.: "Level and function of epidermal growth factor receptor predict the metastatic potential of human colon carcinoma cells." CLINICAL CANCER RESEARCH, vol. 1, January 1995 (1995-01), pages 19-31, XP002099964 *
See also references of EP1190058A2 *
YEATMAN, T.J. ET AL.: "Identification of genetic alterations associated with the process of human experimental colon cancer liver metastasis in the nude mouse." CLINICAL AND EXPERIMENTAL METASTASIS, vol. 14, no. 3, May 1996 (1996-05), pages 246-252, XP002099961 *

US9061059B2 (en) 2012-04-02 2015-06-23 Moderna Therapeutics, Inc. Modified polynucleotides for treating protein deficiency
US11060107B2 (en) 2013-03-14 2021-07-13 The Trustees Of The University Of Pennsylvania Purification and purity assessment of RNA molecules synthesized with modified nucleosides
US8980864B2 (en) 2013-03-15 2015-03-17 Moderna Therapeutics, Inc. Compositions and methods of altering cholesterol levels
US10815291B2 (en) 2013-09-30 2020-10-27 Modernatx, Inc. Polynucleotides encoding immune modulating polypeptides
US10323076B2 (en) 2013-10-03 2019-06-18 Modernatx, Inc. Polynucleotides encoding low density lipoprotein receptor
US10449244B2 (en) 2015-07-21 2019-10-22 Modernatx, Inc. Zika RNA vaccines
US10702597B2 (en) 2015-07-21 2020-07-07 Modernatx, Inc. CHIKV RNA vaccines
US11364292B2 (en) 2015-07-21 2022-06-21 Modernatx, Inc. CHIKV RNA vaccines
US11007260B2 (en) 2015-07-21 2021-05-18 Modernatx, Inc. Infectious disease vaccines
US10064934B2 (en) 2015-10-22 2018-09-04 Modernatx, Inc. Combination PIV3/hMPV RNA vaccines
US10383937B2 (en) 2015-10-22 2019-08-20 Modernatx, Inc. Human cytomegalovirus RNA vaccines
US10238731B2 (en) 2015-10-22 2019-03-26 Modernatx, Inc. Chikagunya virus RNA vaccines
US10064935B2 (en) 2015-10-22 2018-09-04 Modernatx, Inc. Human cytomegalovirus RNA vaccines
US10702599B2 (en) 2015-10-22 2020-07-07 Modernatx, Inc. HPIV3 RNA vaccines
US10702600B1 (en) 2015-10-22 2020-07-07 Modernatx, Inc. Betacoronavirus mRNA vaccine
US10716846B2 (en) 2015-10-22 2020-07-21 Modernatx, Inc. Human cytomegalovirus RNA vaccines
US11872278B2 (en) 2015-10-22 2024-01-16 Modernatx, Inc. Combination HMPV/RSV RNA vaccines
US10272150B2 (en) 2015-10-22 2019-04-30 Modernatx, Inc. Combination PIV3/hMPV RNA vaccines
US11484590B2 (en) 2015-10-22 2022-11-01 Modernatx, Inc. Human cytomegalovirus RNA vaccines
US10933127B2 (en) 2015-10-22 2021-03-02 Modernatx, Inc. Betacoronavirus mRNA vaccine
US10675342B2 (en) 2015-10-22 2020-06-09 Modernatx, Inc. Chikungunya virus RNA vaccines
US10543269B2 (en) 2015-10-22 2020-01-28 Modernatx, Inc. hMPV RNA vaccines
US10517940B2 (en) 2015-10-22 2019-12-31 Modernatx, Inc. Zika virus RNA vaccines
US11278611B2 (en) 2015-10-22 2022-03-22 Modernatx, Inc. Zika virus RNA vaccines
US10124055B2 (en) 2015-10-22 2018-11-13 Modernatx, Inc. Zika virus RNA vaccines
US11235052B2 (en) 2015-10-22 2022-02-01 Modernatx, Inc. Chikungunya virus RNA vaccines
CN109563550A (en) * 2016-08-04 2019-04-02 Shizuoka Prefecture Method for determining the presence or absence of a risk of cancer onset
US11197927B2 (en) 2016-10-21 2021-12-14 Modernatx, Inc. Human cytomegalovirus vaccine
US11541113B2 (en) 2016-10-21 2023-01-03 Modernatx, Inc. Human cytomegalovirus vaccine
US10695419B2 (en) 2016-10-21 2020-06-30 Modernatx, Inc. Human cytomegalovirus vaccine
US11103578B2 (en) 2016-12-08 2021-08-31 Modernatx, Inc. Respiratory virus nucleic acid vaccines
US10273269B2 (en) 2017-02-16 2019-04-30 Modernatx, Inc. High potency immunogenic zika virus compositions
US11207398B2 (en) 2017-09-14 2021-12-28 Modernatx, Inc. Zika virus mRNA vaccines
US10653767B2 (en) 2017-09-14 2020-05-19 Modernatx, Inc. Zika virus MRNA vaccines
US11351242B1 (en) 2019-02-12 2022-06-07 Modernatx, Inc. HMPV/hPIV3 mRNA vaccine composition
US11406703B2 (en) 2020-08-25 2022-08-09 Modernatx, Inc. Human cytomegalovirus vaccine
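CN112233741A (en) * 2020-09-30 2021-01-15 吾征智能技术(北京)有限公司 Clustering-based text classification system, device, and storage medium
CN112233741B (en) * 2020-09-30 2024-03-01 吾征智能技术(北京)有限公司 Clustering-based text classification system, device, and storage medium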

Also Published As

Publication number Publication date
JP2002500010A (ja) 2002-01-08
EP1190058A2 (fr) 2002-03-27
AU2095599A (en) 1999-07-19
WO1999033982A3 (fr) 1999-12-23

Similar Documents

Publication Publication Date Title
WO1999033982A2 (en) Human genes and gene expression products I
EP1053319A2 (en) Human genes and gene expression products II
US7122373B1 (en) Human genes and gene expression products V
US8101349B2 (en) Gene products differentially expressed in cancerous cells and their methods of use II
WO2001002568A2 (en) Novel human genes and gene expression products
US20070243176A1 (en) Human genes and gene expression products
US7601505B2 (en) Compositions, kits, and methods for identification, assessment, prevention, and therapy of breast cancer
US20060179496A1 (en) Nucleic acid sequences differentially expressed in cancer tissue
US20030190640A1 (en) Genes expressed in prostate cancer
US20020076735A1 (en) Diagnostic and therapeutic methods using molecules differentially expressed in cancer cells
EP1263956A2 (en) Novel human genes and their expression products
US6964868B1 (en) Human genes and gene expression products II
KR20060120652A (en) Nucleic acid molecules and proteins for the identification, assessment, prevention, and therapy of ovarian cancer
US20030065156A1 (en) Novel human genes and gene expression products I
EP1086218A2 (en) Genes and gene expression products that are differentially regulated in prostate cancer
WO2001018542A2 (en) Compositions, kits, and methods for identification, assessment, prevention, and therapy of ovarian cancer
EP1370684B1 (en) Polynucleotides related to colon cancer
EP1144636A2 (en) Human genes and their expression products
EP1268528A2 (en) Human genes and expression products
US20050176930A1 (en) Marker molecules associated with lung tumors
EP1466988A2 (en) Genes and gene expression products that are differentially regulated in prostate cancer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase in:

Ref country code: JP

Ref document number: 2000 526638

Kind code of ref document: A

Format of ref document f/p: F

NENP Non-entry into the national phase in:

Ref country code: KR

WWE WIPO information: entry into national phase

Ref document number: 1998965500

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWP WIPO information: published in national office

Ref document number: 1998965500

Country of ref document: EP

WWW WIPO information: withdrawn in national office

Ref document number: 1998965500

Country of ref document: EP