WO2003087766A2 - Methods of diagnosing potential for metastasis or developing hepatocellular carcinoma and of identifying therapeutic targets

Publication number
WO2003087766A2
Authority
WO
WIPO (PCT)
Prior art keywords
genes
hcc
sample
patient
array
Application number
PCT/US2003/010783
Other languages
French (fr)
Other versions
WO2003087766A3 (en)
Inventor
Xin Wei Wang
Qing-Hai Ye
Jin Woo Kim
Original Assignee
The Government Of The United States Of America, As Represented By The Secretary Of The Department Of Health And Human Services
Application filed by The Government Of The United States Of America, As Represented By The Secretary Of The Department Of Health And Human Services
Priority to AU2003230838A
Publication of WO2003087766A2
Publication of WO2003087766A3

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57438 Specifically defined cancers of liver, pancreas or kidney
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/136 Screening for pharmacological compounds

Definitions

  • Hepatocellular carcinoma (HCC) is one of the most common and aggressive malignancies worldwide, with a cure rate of less than 5%.
  • the high mortality is mainly due to the occurrence of intra-hepatic metastases. Little is known about the molecular basis of intra-hepatic metastasis or about specific therapeutic targets in these patients.
  • Such monitoring technologies have been applied to the identification of genes which are up regulated or down regulated in various diseased or physiological states, the analyses of members of signaling cellular states, and the identification of targets for various drugs.
  • the present inventors analyzed the expression of 9,180 genes in HCC tissues from 40 patients without or with accompanying intra-hepatic metastases. Using a supervised machine learning algorithm to classify patients based on their gene expression signatures, the inventors have for the first time generated a molecular signature that correctly classifies patients with or without metastases, and have identified the genes that are most relevant to the prediction of outcome, including patient survival.
  • osteopontin (OPN)
  • a neutralizing antibody against osteopontin is shown to block invasion of highly metastatic HCC cells in an in vitro assay of invasion.
  • the expression of 9,180 genes has also been analyzed in tumor samples from 54 HCC patients and in 59 non-cancerous liver samples from patients with severe liver diseases and at high risk for developing HCC or at low risk for developing HCC.
  • the high risk group includes patients diagnosed with hepatitis B, hepatitis C, hemochromatosis, and Wilson's disease.
  • the low risk group includes patients diagnosed with alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis.
  • a comparison of the gene expression levels between the high risk and low risk groups has identified a set of significant genes that would differentiate between the high risk and low risk groups.
  • EpCAM is among the most significant genes whose overexpression positively correlates to the risk of developing HCC in a patient with a severe liver disease and the inhibition of its expression has been shown to lead to growth suppression in HCC cells.
  • EpCAM has been identified as a diagnostic marker for predicting the risk of developing HCC as well as a therapeutic target for preventing the onset of HCC in patients suffering from chronic liver diseases.
  • One aspect of the present invention relates to a method for identifying potential therapeutic targets for inhibiting metastasis in a patient suffering from HCC or for preventing the development of HCC in a patient suffering from a chronic liver disease.
  • the method for identifying potential therapeutic targets for inhibiting metastasis in an HCC patient includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a metastatic HCC patient; b) capturing markers from the sample and generating a first signal; c) repeating steps a) and b) with a sample from a non-metastatic HCC patient and thereby generating a second signal; and d) comparing the first and second signals and thereby identifying a subset of cellular markers whose level is different in the first and second signals, wherein the subset of cellular markers are potential therapeutic targets for treating HCC metastasis in an HCC patient.
  • a signal generated from a normal non-cancerous sample on an array identical to the array of step a) is subtracted in steps b) and c) to generate the first and second signals.
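The comparison in steps a) to d), together with the subtraction of a normal-sample signal, can be sketched roughly as follows. This is a minimal illustration only: the marker names, intensity values, and the `min_difference` cutoff are all hypothetical, and the source does not state how a difference in level is scored.

```python
def subtract_reference(raw, normal):
    """Subtract the normal-sample signal from a raw array signal, marker by marker."""
    return {marker: raw[marker] - normal[marker] for marker in raw}


def differential_markers(first_signal, second_signal, min_difference):
    """Return markers whose level differs between the two signals.

    min_difference is an assumed cutoff, not a value taken from the source.
    """
    return sorted(
        marker
        for marker in first_signal
        if abs(first_signal[marker] - second_signal[marker]) >= min_difference
    )


# Hypothetical intensities (arbitrary units) for three made-up markers.
normal = {"marker_A": 100.0, "marker_B": 100.0, "marker_C": 100.0}
metastatic_raw = {"marker_A": 400.0, "marker_B": 110.0, "marker_C": 40.0}
non_metastatic_raw = {"marker_A": 120.0, "marker_B": 105.0, "marker_C": 95.0}

# Steps b) and c): background-corrected first and second signals.
first_signal = subtract_reference(metastatic_raw, normal)
second_signal = subtract_reference(non_metastatic_raw, normal)

# Step d): the subset of markers whose level differs between the signals.
candidates = differential_markers(first_signal, second_signal, min_difference=50.0)
print(candidates)
```

With these made-up values, the markers that differ by at least the assumed cutoff are reported as candidate targets; a real analysis would also need replicate samples and a statistical test.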
  • the method for identifying potential therapeutic targets for preventing the onset of HCC in a patient with a chronic liver disease includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a patient with a chronic liver disease and a high risk of developing HCC; b) capturing markers from the sample and generating a first signal; c) repeating steps a) and b) with a sample from a patient with a chronic liver disease and a low risk of developing HCC and thereby generating a second signal; and d) comparing the first and second signals and thereby identifying a subset of cellular markers whose level is different in the first and second signals, wherein the subset of cellular markers are potential therapeutic targets for preventing HCC in a patient with a chronic liver disease.
  • a signal generated from a normal non-cancerous sample on an array identical to the array of step a) is subtracted in steps b) and c) to generate the first and second signals.
  • Another aspect of the present invention relates to a method for predicting the metastatic potential in an HCC patient or for predicting the risk of developing HCC in a patient with a chronic liver disease.
  • the method for predicting the metastatic potential in an HCC patient includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a metastatic HCC patient, the set of cellular markers comprising at least ten genes or proteins encoded by genes independently selected from the genes of Table 2; b) capturing markers from the sample; c) generating a first signal from the captured markers of step b); d) repeating steps a) to c) with a sample from a non-metastatic HCC patient and thereby generating a second signal; e) repeating steps a) to c) with a sample from an HCC patient with unknown metastatic potential and thereby generating a third signal; and f) comparing the third signal to the first and the second signals and thereby determining the metastatic potential of the HCC patient of step e).
  • the set of cellular markers includes at least 20, preferably 50, more preferably 100, and most preferably all genes or proteins encoded by genes independently selected from the genes of Table 2.
  • the set of cellular markers includes the genes or proteins encoded by genes of Table 4 or Unigene numbers Hs.313, Hs.69707, Hs.222, Hs.63984, Hs.75573, Hs.177687, Hs.69707, Hs.222, Hs.323712, and Hs.63984.
  • the sample of steps a) and b), the sample of step d), and the sample of step e) are liver tissue extracts.
  • the array of step a) is a genomic array.
  • the array of step a) is a proteomic array.
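Step f), comparing the third signal to the first and second signals, could be sketched as a nearest-profile comparison. The choice of Pearson correlation as the similarity measure is an assumption made for illustration, as are the function names and the ten made-up marker levels; the source does not specify the comparison at this point.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of marker levels."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify(third, first, second):
    """Label the unknown sample by whichever reference signal it resembles more."""
    r_first = pearson(third, first)
    r_second = pearson(third, second)
    return "metastatic-like" if r_first > r_second else "non-metastatic-like"


# Hypothetical levels for ten markers, in the same marker order for each sample.
first = [5.0, 4.5, 4.8, 5.2, 4.9, 1.0, 1.2, 0.8, 1.1, 0.9]   # metastatic HCC
second = [1.1, 0.9, 1.0, 1.2, 0.8, 4.7, 5.1, 4.9, 5.0, 4.8]  # non-metastatic HCC
third = [4.6, 4.9, 5.1, 4.4, 5.0, 1.3, 0.7, 1.0, 1.2, 1.1]   # unknown potential

label = classify(third, first, second)
print(label)
```

The same comparison would apply to the high risk and low risk signals of the HCC risk method, with the labels changed accordingly.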
  • the method for predicting the risk of developing HCC in a patient suffering from a chronic liver disease includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a patient with a chronic liver disease and a high risk of HCC, the set of cellular markers comprising at least ten genes or proteins encoded by genes independently selected from the genes of Table 5; b) capturing markers from the sample; c) generating a first signal from the captured markers of step b); d) repeating steps a) to c) with a sample from a patient with a chronic liver disease and a low risk of HCC and thereby generating a second signal; e) repeating steps a) to c) with a sample from a patient with a chronic liver disease and an unknown risk of HCC and thereby generating a third signal; and f) comparing the third signal to the first and the second signals and thereby determining the risk of developing HCC in the patient of step e).
  • the set of cellular markers comprises at least 20, preferably 50, more preferably 100, and most preferably all genes or proteins encoded by genes independently selected from the genes of Table 5.
  • the set of cellular markers comprises the genes or proteins encoded by genes of Table 6 or Table 7.
  • the sample of steps a) and b), the sample of step d), and the sample of step e) are liver tissue extracts.
  • the array of step a) is a genomic array. In another preferred embodiment, the array of step a) is a proteomic array.
  • the patient with a high risk of developing HCC suffers from hepatitis B infection, hepatitis C, hemochromatosis, or Wilson's disease.
  • the patient with a low risk of HCC suffers from alcoholic liver disease, autoimmune hepatitis, or primary biliary cirrhosis.
  • the patient whose risk of developing HCC is being assessed suffers from hepatitis B, hepatitis C, hemochromatosis, Wilson's disease, alcoholic liver disease, autoimmune hepatitis, or primary biliary cirrhosis.
  • Yet another aspect of the invention relates to a method for inhibiting metastasis in an HCC patient as well as a method for inhibiting the development of HCC in a patient with a chronic liver disease.
  • the method for inhibiting HCC metastasis in an HCC patient includes the step of suppressing OPN activity.
  • suppression of OPN activity is accomplished by inhibiting OPN expression, preferably using an antisense polynucleotide specific for OPN.
  • suppression of OPN activity is accomplished by inhibiting the specific binding between OPN and OPN receptor, preferably using an anti- OPN antibody.
  • the method for preventing the onset of HCC in a patient with a chronic liver disease includes the step of suppressing EpCAM activity.
  • suppression of EpCAM activity is accomplished by inhibiting EpCAM expression, preferably using an antisense polynucleotide or a small inhibitory RNA molecule specific for EpCAM. In other embodiments, suppression of EpCAM activity is accomplished by inhibiting the specific binding between EpCAM and EpCAM receptor, preferably using an anti-EpCAM antibody.
  • a still further aspect of the present invention relates to a computer readable medium, a digital computer, and a system for assessing the metastatic potential in an HCC patient or the risk of developing HCC in a patient with a chronic liver disease.
  • the computer readable medium for assessing the metastatic potential in an HCC patient includes: a) code for a first data set, derived from a first signal from an array comprising capture reagents for a set of cellular markers after contact with a sample from a metastatic HCC patient, the set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 2; b) code for a second data set, derived from a second signal from an array identical to the array of a) after contact with a sample from a non-metastatic HCC patient; c) code for a third data set, derived from a third signal from an array identical to the array of a) after contact with a sample from an HCC patient with unknown metastatic potential; and d) code for comparing the third data set with the first and second data sets.
  • a digital computer containing the claimed computer readable medium for assessing HCC metastatic potential in an HCC patient is also provided. Further provided is a system containing such a digital computer, a chip with an array comprising capture reagents for a set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 2, and a reader capable of registering a signal from the array after contact with a sample.
  • the computer readable medium for assessing the risk of developing HCC in a patient with a chronic liver disease includes: a) code for a first data set, derived from a first signal from an array comprising capture reagents for a set of cellular markers after contact with a sample from a patient with a chronic liver disease and a high risk of HCC, the set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 5; b) code for a second data set, derived from a second signal from an array identical to the array of a) after contact with a sample from a patient with a chronic liver disease and a low risk of HCC; c) code for a third data set, derived from a third signal from an array identical to the array of a) after contact with a sample from a patient with a chronic liver disease and an unknown risk of HCC; and d) code for comparing the third data set with the first and second data sets.
  • a digital computer containing the claimed computer readable medium for assessing the risk of developing HCC in a patient with a chronic liver disease is also provided. Further provided is a system containing such a digital computer, a chip with an array comprising capture reagents for a set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 5, and a reader capable of registering a signal from the array after contact with a sample.
  • Hepatocellular carcinoma refers to the major type of carcinoma of the liver that accounts for more than 90% of all primary liver cancers. Hepatocellular carcinomas range from well differentiated to highly anaplastic undifferentiated lesions. Hepatocellular carcinomas may exist as single intra-hepatic lesions (non-metastatic), as multifocal intra-hepatic metastasis, or as extra-hepatic metastasis.
  • High risk precancerous diseases refer to a group of epidemiologically defined diseases that are associated with a high probability of developing HCC. These diseases include chronic hepatitis B infection, hepatitis C infection, hemochromatosis, and Wilson's disease.
  • Low risk precancerous diseases refer to a group of epidemiologically defined diseases that are associated with a low risk of developing HCC. These diseases include alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis.
  • Metastasis or "metastatic" refers to the ability of a cancer cell to invade surrounding tissues, to enter the circulatory system and to establish malignant growths at new sites.
  • Non-Metastatic refers to tumors that do not spread beyond their original site of development and specifically do not enter the circulatory system and establish malignant growths at new sites.
  • non-cancerous refers to a biological sample or tissue sample in which the cells in the sample exhibit a normal or non-pathological phenotype when analyzed visually, by microscope, immunohistologically, immunologically, or molecularly using antibody or nucleic acid probes designed to detect pathological conditions.
  • normal refers to a biological sample or tissue sample obtained from an individual who has not been diagnosed with HCC or with a high risk or low risk precancerous disease.
  • capture reagent refers to any type of moiety that binds to a specific nucleic acid or protein marker.
  • binding of the marker to the capture reagent can be controlled by the conditions used during the binding process.
  • the binding of a nucleic acid marker to a cognate oligonucleotide is controlled by the hybridization conditions used. Stringent hybridization conditions will only allow a nucleic acid marker that has high homology, e.g., 95%-100% identity with the oligonucleotide, to bind to the oligonucleotide.
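The 95%-100% identity figure can be illustrated with a simple position-by-position comparison. This sketch assumes both sequences are written in the same strand sense and have equal length, and it treats the 95% figure as a hard cutoff; actual hybridization depends on temperature, salt, and sequence composition, none of which is modeled here. The sequences and function names are hypothetical.

```python
def percent_identity(probe, target):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(probe) != len(target):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for p, t in zip(probe, target) if p == t)
    return 100.0 * matches / len(probe)


def passes_stringency(probe, target, min_identity=95.0):
    """True if identity meets the assumed stringent-condition cutoff."""
    return percent_identity(probe, target) >= min_identity


# Hypothetical 20-base oligonucleotide and two candidate marker sequences.
probe = "ACGTACGTACGTACGTACGT"
one_mismatch = "ACGTACGTACGTACGTACGA"
two_mismatches = "TCGTACGTACGTACGTACGA"

print(percent_identity(probe, one_mismatch), passes_stringency(probe, one_mismatch))
print(percent_identity(probe, two_mismatches), passes_stringency(probe, two_mismatches))
```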
  • Array refers to a plurality of capture reagents bound to a substrate, e.g., a solid support, which will bind to their cognate markers.
  • the array may be composed of nucleic acid molecules, protein molecules or any other reagent that will specifically bind a nucleic acid, protein or polypeptide isolated from a biological sample.
  • the capture reagents are preferentially bound in an addressable fashion such that when the cognate marker is bound to the capture reagent, the amount of binding may be quantified.
  • DNA microarray refers to an array in which the capture reagents are nucleic acid molecules.
  • a DNA microarray is composed of DNA oligonucleotides of a defined length which can hybridize to DNA, cDNA or RNA molecules under defined conditions.
  • DNA oligonucleotides may be short pieces of nucleic acid ranging in size from 15-50 bases or they may be longer pieces of nucleic acids ranging in size from 500-1000 bases or longer.
  • DNA microarrays may be composed of hundreds or thousands of different nucleic acid molecules each of which is located on the array in a defined position. Binding of the marker to the DNA microarray is usually quantified when the marker is labeled with a detectable moiety.
  • the term DNA microarray is used interchangeably with the term "genomic array".
  • Protein array refers to an array in which the capture reagents will bind protein markers. Typically these reagents may be polyclonal or monoclonal antibodies that bind specific proteins. Alternatively, any protein, peptide, nucleic acid or other molecule or surface which will specifically bind to a protein may be used in a protein array. These arrays usually contain hundreds or thousands of different capture reagents in addressable locations. Binding of the markers to the capture reagent on the protein array is usually quantified when the marker is labeled with a detectable moiety. The term protein array is used interchangeably with "proteomic array".
  • Gene expression profile refers to all of the genes that are expressed in a tissue sample compared to a reference sample.
  • the level of expression of the genes in a gene expression profile is determined by comparing the level of expression in a test sample, e.g., an HCC tumor sample or a sample obtained from a patient diagnosed with severe liver disease, to the level of expression in a reference sample.
  • the reference sample used for determining the metastatic potential of an HCC tumor is non-cancerous liver tissue or liver tissue obtained from a patient who has not been diagnosed with HCC.
  • the reference sample used for determining the potential for developing HCC in patients diagnosed with severe liver disease is liver tissue obtained from patients who have not been diagnosed with severe liver disease. Genes in the test sample may be over expressed or under expressed relative to the reference sample.
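The comparison of a test sample with a reference sample can be sketched as a per-gene log2 ratio. The gene names, intensity values, and the two-fold cutoff (a log2 ratio of 1.0) are assumptions for illustration; the source does not state a threshold for calling a gene over expressed or under expressed.

```python
import math


def expression_ratios(test, reference):
    """log2(test / reference) for every gene measured in the test sample."""
    return {gene: math.log2(test[gene] / reference[gene]) for gene in test}


def call_expression(log_ratio, cutoff=1.0):
    """Classify a log2 ratio using an assumed two-fold cutoff."""
    if log_ratio >= cutoff:
        return "over expressed"
    if log_ratio <= -cutoff:
        return "under expressed"
    return "unchanged"


# Hypothetical intensities for a test sample and its reference sample.
test = {"gene_1": 800.0, "gene_2": 50.0, "gene_3": 210.0}
reference = {"gene_1": 200.0, "gene_2": 200.0, "gene_3": 200.0}

ratios = expression_ratios(test, reference)
profile = {gene: call_expression(value) for gene, value in ratios.items()}
print(profile)
```

Working on the log scale makes a four-fold increase and a four-fold decrease symmetric (+2 and -2), which is why ratios are usually logged before comparison.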
  • Metastatic gene expression predictor refers to the expression of a specific cluster of genes correlated with the diagnosis of metastatic HCC.
  • the metastatic gene expression predictor is generated by comparing the gene expression profile of a test sample obtained from a non-metastatic HCC sample to the gene expression profile obtained from a metastatic HCC sample followed by a cluster and classification analysis using a defined algorithm or set of algorithms.
  • HCC gene expression predictor refers to the expression of a specific cluster of genes correlated with the diagnosis of patients likely to develop HCC.
  • the HCC gene expression predictor is generated by comparing the gene expression profile of a test sample obtained from a non-metastatic liver sample obtained from a patient with a high risk for developing HCC to the gene expression profile obtained from a non-metastatic liver sample obtained from a patient having a low risk of developing HCC followed by a cluster and classification analysis using a defined algorithm or set of algorithms.
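One way to picture how a predictor gene set might be drawn from two groups of profiles is to rank genes by how well they separate the groups. The definitions above refer only to "a defined algorithm or set of algorithms", so the signal-to-noise score used here is a substituted, commonly used ranking statistic and not the method of the source; gene names and values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Difference of group means divided by the sum of group standard deviations.

    Assumes each group has at least two samples and non-zero spread.
    """
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def rank_genes(profiles_a, profiles_b):
    """Order genes from most to least discriminating by absolute score."""
    scores = {
        gene: signal_to_noise(profiles_a[gene], profiles_b[gene])
        for gene in profiles_a
    }
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical expression values: gene -> one value per patient sample.
metastatic = {
    "gene_1": [5.0, 5.5, 6.0],
    "gene_2": [2.0, 3.0, 2.5],
    "gene_3": [1.0, 1.5, 0.5],
}
non_metastatic = {
    "gene_1": [1.0, 1.5, 2.0],
    "gene_2": [2.2, 2.8, 2.4],
    "gene_3": [3.0, 3.5, 4.0],
}

ranked = rank_genes(metastatic, non_metastatic)
print(ranked)
```

The top-ranked genes would then feed a classifier; with real data the ranking should be computed inside cross-validation so that the predictor is not tuned on the samples it is tested on.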
  • UG Cluster used in Tables 2-7 refers to the UniGene database compiled by the National Center for Biotechnology Information ("NCBI").
  • Each accession number in the UniGene database is a compilation of all of the nucleotide and amino acid sequence data available for a specific nucleotide sequence.
  • each UG Cluster accession number may provide links to GenBank or other databases, which in turn provide nucleotide sequences encoding a partial or full length cDNA for a gene. Alternatively the links may provide genomic or EST sequence data or amino acid sequence information.
  • Each UG Cluster accession number provides unique sequence information for the specific gene, nucleic acid or amino acid sequence identified.
  • Osteopontin refers to a secreted phosphoprotein encoded by SEQ ID NO: 1 or a conservative variant thereof, which may also be found in GenBank accession number NM_000582. Nucleic acid and amino acid sequence information may also be found in the National Center for Biotechnology Information ("NCBI") UniGene database under accession number Hs.313 at the NCBI web site. This site lists 9 mRNA/genomic DNA sequences and over 900 expressed sequence tags. Osteopontin is an extracellular protein associated with the bone matrix and with atherosclerotic plaques. Full length osteopontin protein contains an RGD amino acid sequence that functions as an integrin binding site. Osteopontin is a major ligand for the vitronectin receptor. "OPN" is used interchangeably with osteopontin and refers either to the protein, the gene encoding the protein or fragments thereof.
  • EpCAM is a 40 kDa glycoprotein that functions as an Epithelial Cell Adhesion Molecule. It is also identified as tumor-associated calcium signal transducer or TACSTD1, with a Unigene Cluster number of Hs.692. EpCAM is encoded by the GA733-2 gene, which is located on human chromosome 4q. A transmembrane protein expressed in cells of epithelial origin, EpCAM mediates Ca2+-independent homotypic cell-cell adhesion and is specifically recognized by a number of well known monoclonal antibodies (mAb), such as 17-1A, 323/A3, KS1/4, GA733, MOC31, etc.
  • Marker in the context of the present invention refers to a nucleic acid sequence or a gene encoding a polypeptide (of a particular apparent molecular weight) which is differentially present in a sample taken from patients having metastatic HCC or a predisposition for HCC as compared to a comparable sample taken from control subjects (e.g., a person with non-metastatic HCC or a negative diagnosis or undetectable cancer, normal or healthy subject).
  • Marker may also refer to a polypeptide or protein encoded by a nucleic acid sequence or gene which is differentially present in a sample taken from patients having metastatic HCC or a predisposition for HCC as compared to a comparable sample taken from control subjects (e.g., a person with non-metastatic HCC or a negative diagnosis or undetectable cancer, normal or healthy subject). Markers of the present invention include the genes and their encoded proteins identified by UG Cluster number in Tables 2-7 infra.
  • sample is a sample of biological tissue or fluid that will be used to determine a gene expression profile, a source of markers, or that contains a protein of interest (such as osteopontin or EpCAM) or a nucleic acid encoding such protein.
  • samples include, but are not limited to, various types of tissue isolated from humans, and may also include sections of tissues such as frozen sections or paraffin sections taken for histological purposes.
  • Tissues include liver samples and fluid samples include blood, serum, plasma, urine, and other bodily fluids.
  • a preferred sample used for practicing the present invention is a lysate of cells extracted from a tissue of interest, e.g., liver.
  • Such a cell lysate may be prepared using a variety of methods known to those skilled in the art, depending on the form in which a cellular marker is to be detected and examined, e.g., as a nucleic acid such as mRNA, as a protein, or as a molecule with other measurable biological characteristics such as an enzymatic activity.
  • the phrase "functional effects" in the context of assays for testing compounds that regulate the biological activity of a protein of interest, e.g., osteopontin or EpCAM, includes the determination of any parameter that is directly or indirectly related to or under the influence of OPN or EpCAM, such as the level of mRNA encoding the proteins, the level of the proteins, as well as their functional, physical, and chemical effects (e.g., their ability to specifically interact with their naturally binding partners, such as other proteins, nucleic acids, or any other molecules, their ability to mediate signal transduction that may affect cellular events such as cell proliferation, differentiation, apoptosis, secretion, adhesion, and the like).
  • Nucleic acid refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form.
  • the term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs).
  • nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081, 1991; Ohtsuka et al., J. Biol. Chem. 260:2605-2608, 1985; Rossolini et al., Mol. Cell. Probes 8:91-98, 1994).
  • nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.
  • The terms polypeptide, peptide, and protein apply to amino acid polymers in which one or more amino acid residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
  • amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids.
  • Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
  • Amino acid analogs refer to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
  • Amino acid mimetics refer to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
  • Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
  • Conservatively modified variants applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide.
  • nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid.
  • each codon in a nucleic acid except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan
  • As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence are "conservatively modified variants" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.
  • Macromolecular structures such as polypeptide structures can be described in terms of various levels of organization. For a general discussion of this organization, see, e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel, Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980).
  • Primary structure refers to the amino acid sequence of a particular peptide.
  • “Secondary structure” refers to locally ordered, three dimensional structures within a polypeptide. These structures are commonly known as domains. Domains are portions of a polypeptide that form a compact unit of the polypeptide and are typically 50 to 350 amino acids long.
  • Typical domains are made up of sections of lesser organization such as stretches of β-sheet and α-helices.
  • Tertiary structure refers to the complete three dimensional structure of a polypeptide monomer.
  • Quaternary structure refers to the three dimensional structure formed by the noncovalent association of independent tertiary units. Anisotropic terms are also known as energy terms.
  • Antibody refers to a polypeptide comprising a framework region from an immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen.
  • the recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region genes.
  • Light chains are classified as either kappa or lambda.
  • Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.
  • An exemplary immunoglobulin (antibody) structural unit comprises a tetramer.
  • Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one "light” (about 25 kDa) and one "heavy” chain (about 50-70 kDa).
  • the N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition.
  • the terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively.
  • the Fab' monomer is essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed. 1993)). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments either produced by the modification of whole antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554, 1990).
  • any technique known in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy (1985)).
  • Techniques for the production of single chain antibodies can be adapted to produce antibodies to polypeptides of this invention.
  • transgenic mice, or other organisms such as other mammals may be used to express humanized antibodies.
  • phage display technology can be used to identify antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., supra; Marks et al., Biotechnology 10:779-783, 1992).
  • a "chimeric antibody” is an antibody molecule in which (a) the constant region, or a portion thereof, is altered, replaced or exchanged so that the antigen binding site (variable region) is linked to a constant region of a different or altered class, effector function and/or species, or an entirely different molecule which confers new properties to the chimeric antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable region, or a portion thereof, is altered, replaced or exchanged with a variable region having a different or altered antigen specificity.
  • an "anti-OPN antibody” is an antibody or antibody fragment that specifically binds a polypeptide encoded by the OPN gene, cDNA, or a subsequence thereof.
  • An anti-EpCAM antibody is defined in a similar fashion.
  • a "receptor” as used herein encompasses any molecule that a particular protein, e.g., OPN or EpCAM, can specifically bind and may thus include proteins, nucleic acids, carbohydrates, or any other molecules.
  • the term "immunoassay” is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
  • the specified antibodies bind to a particular protein at least two times the background and do not substantially bind in a significant amount to other proteins present in the sample.
  • Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein.
  • polyclonal antibodies raised to OPN from specific species such as rat, murine, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with OPN and not with other proteins, except for polymorphic variants and alleles of OPN. This selection may be achieved by subtracting out antibodies that cross-react with OPN molecules from other species.
  • a variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
  • solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual, 1988, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity).
  • a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
  • the phrase "differentially present” refers to differences in the quantity and/or the frequency of a marker present in a sample taken from a metastatic HCC tumor or liver samples of a patient at high risk for HCC as compared to a non-metastatic HCC sample or a liver sample from a patient at low risk for HCC respectively.
  • a marker can be a polypeptide or nucleic acid which is present at an elevated level or at a decreased level in samples of metastatic HCC tumors or liver samples of someone at high risk for HCC compared to non-metastatic HCC samples or a liver sample from a patient at low risk for HCC respectively.
  • a marker can be a polypeptide which is detected at a higher frequency or at a lower frequency in metastatic HCC tumors or liver samples of someone at high risk for HCC compared to non-metastatic HCC sample or a liver sample from a patient at low risk for HCC respectively.
  • a marker can be differentially present in terms of quantity, frequency or both.
  • a polypeptide or nucleic acid is differentially present between the two samples if the amount of the polypeptide in one sample is statistically significantly different from the amount of the polypeptide in the other sample.
  • a polypeptide is differentially present between the two samples if it is present at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% greater than it is present in the other sample, or if it is detectable in one sample and not detectable in the other.
  • a polypeptide is differentially present between the two sets of samples if the frequency of detecting the polypeptide in the metastatic HCC tumors or liver samples of someone at high risk for HCC is statistically significantly higher or lower than in non-metastatic HCC samples or a liver sample from a patient at low risk for HCC respectively.
  • a polypeptide is differentially present between the two sets of samples if it is detected at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% more frequently or less frequently observed in one set of samples than the other set of samples.
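As a rough illustration of the quantity-based criterion above, the following Python sketch tests whether a marker amount in one sample reaches a chosen percentage of the amount in the other sample, or is detectable in only one of them. The function name, the default threshold, and the reading of "120%" as a 1.2-fold ratio are assumptions made for this sketch; the statistical-significance criterion is not modeled.

```python
def differentially_present(amount_a, amount_b, min_percent=120.0):
    """Return True if the larger amount is at least `min_percent` percent of
    the smaller one, or if the marker is detectable in only one sample.

    Assumption: an amount of 0 means "not detectable".
    """
    if amount_a < 0 or amount_b < 0:
        raise ValueError("amounts must be non-negative")
    # Detectable in one sample and not in the other.
    if (amount_a == 0) != (amount_b == 0):
        return True
    # Not detectable in either sample: no difference can be claimed.
    if amount_a == 0 and amount_b == 0:
        return False
    high, low = max(amount_a, amount_b), min(amount_a, amount_b)
    return (high / low) * 100.0 >= min_percent
```

For example, `differentially_present(2.0, 1.0)` is true at the default threshold, while `differentially_present(1.1, 1.0)` is not.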
  • Diagnostic means identifying the presence or nature of a pathologic condition or a predisposition for a pathologic condition such as HCC or HCC metastasis. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives").
  • a "test amount” of a marker refers to an amount of a marker present in a sample being tested.
  • a test amount can be either in absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).
  • a “diagnostic amount” of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of metastatic HCC tumors or tissue samples of someone at high risk for HCC.
  • a diagnostic amount can be either in absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).
  • a "control amount" of a marker can be any amount or a range of amount which is to be compared against a test amount of a marker.
  • a control amount of a marker can be the amount of a marker in a person without metastatic HCC tumors or tissue samples of someone at low risk for HCC.
  • a control amount can be either in absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).
  • Spectrometer probe refers to a device that is removably insertable into a gas phase ion spectrometer and comprises a substrate having a surface for presenting a marker for detection.
  • a spectrometer probe can comprise a single substrate or a plurality of substrates.
  • Terms such as ProteinChip ® , ProteinChip ® array, or chip are also used herein to refer to specific kinds of spectrometer probes.
  • Substrate or “probe substrate” refers to a solid phase onto which an adsorbent can be provided (e.g., by attachment, deposition, etc.).
  • Adsorbent refers to any material capable of adsorbing a marker. The term adsorbent is used herein to refer both to a single material ("monoplex adsorbent") (e.g., a compound or functional group) to which the marker is exposed, and to a plurality of different materials ("multiplex adsorbent") to which the marker is exposed.
  • the adsorbent materials in a multiplex adsorbent are referred to as "adsorbent species.”
  • an addressable location on a probe substrate can comprise a multiplex adsorbent characterized by many different adsorbent species (e.g., anion exchange materials, metal chelators, or antibodies), having different binding characteristics.
  • Substrate material itself can also contribute to adsorbing a marker and may be considered part of an "adsorbent.”
  • Adsorption or "retention" refers to the detectable binding between an adsorbent and a marker either before or after washing with an eluant (selectivity threshold modifier) or a washing solution.
  • Eluant or "washing solution” refers to an agent that can be used to mediate adsorption of a marker to an adsorbent. Eluants and washing solutions are also referred to as “selectivity threshold modifiers.” Eluants and washing solutions can be used to wash and remove unbound materials from the probe substrate surface.
  • Resolution refers to the detection of at least one marker in a sample. Resolution includes the detection of a plurality of markers in a sample by separation and subsequent differential detection. Resolution does not require the complete separation of one or more markers from all other biomolecules in a mixture. Rather, any separation that allows the distinction between at least one marker and other biomolecules suffices.
  • Gas phase ion spectrometer refers to an apparatus that measures a parameter which can be translated into mass-to-charge ratios of ions formed when a sample is volatilized and ionized. Generally ions of interest bear a single charge, and mass-to-charge ratios are often simply referred to as mass. Gas phase ion spectrometers include, for example, mass spectrometers, ion mobility spectrometers, and total ion current measuring devices.
  • Mass spectrometer refers to a gas phase ion spectrometer that includes an inlet system, an ionization source, an ion optic assembly, a mass analyzer, and a detector.
  • Laser desorption mass spectrometer refers to a mass spectrometer which uses laser as means to desorb, volatilize, and ionize an analyte.
  • Detect refers to identifying the presence, absence, or amount of the object to be detected.
  • Detectable moiety refers to a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means.
  • useful labels include ³²P, ³⁵S, fluorescent dyes, electron-dense reagents, enzymes (such as those commonly used in an ELISA, e.g., horseradish peroxidase), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target.
  • the detectable moiety often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound detectable moiety in a sample. Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
  • activity refers to the biological functions of a molecule, such as a protein encoded by a gene of interest, e.g., osteopontin or EpCAM. This term encompasses biological functions such as enzymatic activity, specific interaction with other molecules, regulatory effects on biological events at molecular or cellular level, and the like.
  • inhibitors refers to a negative regulatory effect on the function or activity of an intended target molecule, such that the function or activity, e.g., enzymatic activity or specific interaction with other molecules, is detectably diminished or effectively abolished.
  • antagonist refers to a compound that is capable of negatively regulating the biological activity of a target molecule, e.g., osteopontin or EpCAM.
  • An antagonist may effectuate the negative regulation by various means, such as by suppression of the expression of the target gene at transcriptional or translational level, or by interfering with the target molecule in its specific interaction with other molecules.
  • antisense refers to a single-stranded nucleic acid having a nucleotide sequence complementary to at least a portion of a target nucleic acid that encodes a protein of interest (e.g., osteopontin, or EpCAM), or the "sense" sequence.
  • Complementarity between two single-stranded polynucleotides is based on the "A-T G-C" base-pairing rule. For example, the sequence "5'-AGAT-3'" is complementary to the sequence "5'-ATCT-3'".
  • Complementarity between a target nucleic acid and its antisense polynucleotide is typically 100%, i.e., all bases of the antisense polynucleotide match the bases of the target nucleic acid, but may be of varying degrees, i.e., there may be some mismatched bases.
  • the degree of complementarity between a target nucleic acid and its antisense polynucleotide has significant effects on the efficiency and strength of hybridization.
  • An "antisense" polynucleotide sequence in the present application may correspond to a coding portion (i.e., exon) or a non-coding portion (i.e., intron) of the target nucleic acid.
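The base-pairing rule and the notion of partial complementarity described above can be sketched in Python as follows. This is only an illustration for DNA sequences written 5' to 3'; the function names are hypothetical, and the percentage is computed under the simplifying assumption that target and antisense strands have equal length and are aligned end to end.

```python
# Watson-Crick pairing for DNA ("A-T G-C" rule).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def percent_complementarity(target, antisense):
    """Percentage of antisense bases that pair with the target when the two
    equal-length strands are aligned antiparallel."""
    if len(target) != len(antisense):
        raise ValueError("sequences must have equal length")
    if not target:
        raise ValueError("sequences must be non-empty")
    perfect = reverse_complement(target)
    matches = sum(a == b for a, b in zip(perfect, antisense.upper()))
    return 100.0 * matches / len(target)
```

With the example from the text, `reverse_complement("AGAT")` gives `"ATCT"`, and an antisense strand with one mismatched base out of four scores 75%.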
  • Figure 1 Classification of hepatocellular carcinoma with or without metastasis by gene expression.
  • P primary HCC with intra-hepatic spreads
  • P-M metastatic lesion of P
  • PT primary HCC with tumor thrombus in portal vein
  • PN metastasis-free primary HCC samples.
  • Figure 2 Prediction of metastasis and survival with metastasis predictor model derived from "leave-one-out" cross-validated compound covariate predictor classification.
  • FIG. 1 Candidate genes associated with metastatic HCC.
  • a monoclonal β-actin antibody was used as internal control. Densitometry was used to quantify the amount of OPN, which was normalized to actin. OPN level is indicated as relative folds.
  • B) CCL13, SK-Hep-1 or Hep3B cells were incubated with or without a murine recombinant osteopontin protein or a neutralizing antibody against osteopontin and their invasiveness was determined by the Matrigel Basement Membrane Cell Invasion Chamber. Data is an average of triplicate determinants for each condition and is expressed as the mean percent invasion (plus one standard deviation) through the Matrigel Matrix and membrane (matrigel chamber) relative to the migration through the control membrane (control chamber).
  • HCCLM3 cells without (upper panel) or with (bottom panel) anti-OPN neutralizing antibody are shown. Arrows indicate the tumor grades.
  • EpCAM expression in cells from normal human fibroblasts (NHF-hTERT), normal liver (CCL13) and hepatoma (SK-Hep-1, Hep3B, Huh1, Huh4, Huh7, and HepG2) was analyzed by western blotting with a monoclonal antibody against EpCAM.
  • a monoclonal antibody against beta-actin was used as an internal control
  • Cell proliferation of Hep3B, Huh1, and Huh4 cells was determined by MTT assay and data were an average of 3 independent experiments
  • HCC Hepatocellular carcinoma
  • HCC patients are incurable due to their poor prognosis. Although routine screening of individuals who are at risk for developing HCC may provide some patients with an opportunity for an extended life, many patients are still diagnosed with advanced HCC with little improved survival (see, e.g., Yang et al., J. Cancer Res. Clin. Oncol. 123:357-360, 1997; Izzo et al., Ann. Surg. 227:513-518, 1998). While a small subset of HCC patients qualifies for surgical intervention, the improvement in long-term survival is only modest.
  • HCC extremely poor prognosis
  • a high rate of recurrence after surgery or intra-hepatic metastases that develop by invasion of the portal vein or spreading to other parts of the liver, whereas extrahepatic metastases are less common (see, e.g., Genda et al., Hepatology 30:1027-1036, 1999).
  • These data indicate that the liver is the main target organ of HCC metastasis.
  • the portal vein is the main route for intrahepatic metastases of metastatic HCC cells (see, e.g., Mitsunobu et al., Clin. Exp. Metastasis 14:520-529, 1996).
  • This specific feature of HCC underscores the need to develop an accurate molecular profiling model for better diagnosis and therapeutic targets for the treatment of HCC patients with intrahepatic metastases.
  • osteopontin both as a molecular marker for defining HCC patients with metastatic potential and as a potential therapeutic target for treating metastatic HCC.
  • a similar approach is used to develop a gene expression prediction model for the potential to develop HCC in patients with chronic liver diseases. By comparing the gene expression profiles of patients epidemiologically at high risk for developing HCC with the gene expression profile of patients epidemiologically at low risk for developing HCC, cellular markers are identified so as to allow the identification of individuals with chronic liver diseases at high risk for developing HCC.
  • the patients with severe liver diseases include those diagnosed with chronic hepatitis B infection, hepatitis C infection, hemochromatosis, Wilson's disease, alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis.
  • High risk precancerous diseases include chronic hepatitis B infection, hepatitis C infection, hemochromatosis, and Wilson's disease.
  • Low risk precancerous diseases include alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis.
  • EpCAM One gene identified to be associated with elevated risk of developing HCC in patients with severe liver diseases. Growth suppression of liver cancer cells has been observed upon inhibition of EpCAM expression, identifying its important role in HCC development and as a therapeutic target for preventing HCC in patients with chronic liver diseases.
  • One particular aspect of the invention provides methods for clustering co-regulated genes in patients suspected of having metastatic HCC or the potential to develop HCC into gene expression profiles. This section provides a more detailed discussion of methods for clustering co-regulated genes.
  • a preferred embodiment for identifying such basis gene expression profiles involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990, Statistical Pattern Recognition, 2nd Ed., Academic Press, San Diego; Everitt, 1974, Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy, Freeman; Anderberg, 1973, Cluster Analysis for Applications, Academic Press: New York).
  • In some embodiments employing cluster analysis, the expression of a large number of genes is monitored in biological samples obtained from different sources. A table of data containing the gene expression measurements is used for cluster analysis.
  • Cluster analysis operates on a table of data which has the dimension m x k wherein m is the total number of conditions or perturbations and k is the number of genes measured.
  • a number of clustering algorithms are useful for clustering analysis.
  • Clustering algorithms use dissimilarities or distances between objects when forming clusters.
  • the distance used is Euclidean distance in multidimensional space.
  • the Euclidean distance may be squared to place progressively greater weight on objects that are further apart.
  • the distance measure may be the Manhattan distance.
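The three distance measures named above can be written out as a minimal Python sketch. Here each object is assumed to be a row of the m x k data table, represented as a plain list of expression values; the function names are illustrative only.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Squared Euclidean distance; weights distant objects more heavily."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

For the vectors `[0, 0]` and `[3, 4]` these give 5.0, 25, and 7 respectively, which shows how squaring separates distant objects more strongly than the other two measures.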
  • unsupervised hierarchical clustering of a table of data may be performed using the CLUSTER or TREEVIEW software (Eisen et al., Proc. Natl. Acad. Sci. U.S.A. 95:14863-14868, 1998) using median centered correlation and complete linkage.
  • Various cluster linkage rules are useful for the methods of the invention.
  • Single linkage, a nearest neighbor method, determines the distance between the two closest objects.
  • complete linkage methods determine distance by the greatest distance between any two objects in the different clusters. This method is particularly useful in cases when genes or other cellular constituents form naturally distinct "clumps.”
  • the unweighted pair-group average defines distance as the average distance between all pairs of objects in two different clusters. This method is also very useful for clustering genes or other cellular constituents to form naturally distinct "clumps.”
  • the weighted pair-group average method may also be used. This method is the same as the unweighted pair-group average method except that the size of the respective clusters is used as a weight.
  • This method is particularly useful for embodiments where the cluster size is suspected to be greatly varied (Sneath and Sokal, 1973, Numerical taxonomy, San Francisco. W. H. Freeman & Co.).
  • Other cluster linkage rules such as the unweighted and weighted pair-group centroid and Ward's method are also useful for some embodiments of the invention. See, e.g., Ward, 1963, J. Am. Stat. Assn. 58:236; Hartigan, 1975, Clustering algorithms, New York: Wiley.
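To make the linkage rules concrete, the sketch below computes the distance between two clusters under single linkage, complete linkage, and the unweighted pair-group average. It assumes Euclidean distance between objects (via `math.dist`, available in Python 3.8 and later) and uses a hypothetical function name; the weighted, centroid, and Ward variants are not covered.

```python
import math
from itertools import product

def cluster_distance(cluster_a, cluster_b, rule="complete"):
    """Distance between two clusters of points under a linkage rule.

    rule: "single"   -> distance between the two closest objects
          "complete" -> greatest distance between any two objects
          "average"  -> mean distance over all cross-cluster pairs
    """
    # All pairwise distances between objects in different clusters.
    pair_distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if not pair_distances:
        raise ValueError("clusters must be non-empty")
    if rule == "single":
        return min(pair_distances)
    if rule == "complete":
        return max(pair_distances)
    if rule == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage rule: {rule}")
```

With clusters `[[0], [1]]` and `[[3], [5]]`, the cross-cluster distances are 3, 5, 2, and 4, so the three rules return 2.0, 5.0, and 3.5.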
  • the cluster analysis used is the BRB-ArrayTools software, an integrated package for the visualization and statistical analysis of cDNA microarray gene expression data developed by the Biometric Research Branch of the National Cancer Institute, for both unsupervised and supervised analyses.
  • the Class Comparison Tool based on univariate F-tests may be used to find genes differentially expressed between predefined clinical groups at a significance level of P < 0.001 or 0.002.
  • the permutation distribution of the F-statistic, based on 2000 random permutations, may also be used to confirm statistical significance.
  • the multi-variate Compound Covariate Predictor (CCP) Tool with a "leave-one-out" cross-validation test using 2000 random permutations at a significance level of P < 0.001 may be used to classify predefined clinical groups based on their gene expression profiles.
  • the statistical significance of the cross-validated misclassification rate is determined by repeating the entire cross-validation procedure on data with the class membership labels randomly permuted 2000 times.
  • the CCP is based on a weighted linear combination of gene expression variables that are univariately significant in the training set with the weights being the corresponding t-statistics as described in Radmacher et al., Journal of Computational Biology, in press, 2002.
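The weighted linear combination and the leave-one-out procedure described above can be sketched as follows. This is not the BRB-ArrayTools implementation: the pooled-variance t-statistic, the fixed |t| cutoff used in place of a P-value threshold, the midpoint decision threshold, and all function names are assumptions made for illustration. Labels are 0 or 1, and each class needs at least three samples so that variances can still be computed when one sample is held out.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample pooled-variance t-statistic for one gene."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0  # simplification: constant genes are treated as uninformative
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def score(sample, weights):
    """Compound covariate: t-weighted sum over the selected genes."""
    return sum(w * sample[g] for g, w in weights.items())

def train_ccp(samples, labels, t_cutoff):
    """Select genes with |t| >= t_cutoff and set a midpoint threshold."""
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    weights = {}
    for g in range(len(samples[0])):
        t = t_statistic([s[g] for s in class1], [s[g] for s in class0])
        if abs(t) >= t_cutoff:
            weights[g] = t
    mean1 = mean([score(s, weights) for s in class1])
    mean0 = mean([score(s, weights) for s in class0])
    return weights, (mean1 + mean0) / 2

def predict(sample, weights, threshold):
    """Class 1 scores lie above the midpoint because weights carry the sign of t."""
    return 1 if score(sample, weights) > threshold else 0

def loo_misclassification_rate(samples, labels, t_cutoff=2.0):
    """Hold out each sample in turn, retrain (including gene selection), and test."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        weights, threshold = train_ccp(train_x, train_y, t_cutoff)
        errors += predict(samples[i], weights, threshold) != labels[i]
    return errors / len(samples)
```

Gene selection is repeated inside every fold, so the held-out sample never influences which genes enter the predictor. The permutation test described in the text would wrap `loo_misclassification_rate` in a loop over randomly shuffled labels.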
  • An example of a clustering "tree" output is shown in Figures 1 and 3 (see, also, Example 1, infra).
  • Gene expression profiles may be defined based on the many smaller branches in the tree, or a small number of larger branches by cutting across the tree at different levels. The choice of cut level may be made to match the number of distinct clinical groups expected. If little or no prior information is available about the number of groups, then the tree should be divided into as many branches as are truly distinct. "Truly distinct" may be defined by a minimum distance value between the individual branches. This distance is the vertical coordinate of the horizontal connector joining two branches (see Figure 1B). Typical values are in the range 0.2 to 0.4 where 0 is perfect correlation and 1 is zero correlation, but may be larger for poorer quality data or fewer experiments in the training set, or smaller in the case of better data and more experiments in the training set.
  • "truly distinct” may be defined with an objective test of statistical significance for each bifurcation in the tree.
  • the Compound Covariate Predictor (CCP) tool with "leave-one-out" cross-validation test using 2000 random permutations at a predefined significance level is used to define an objective test.
  • the distribution of fractional improvements obtained from the CCP procedure is an estimate of the distribution under the null hypothesis that a particular classification is correct or incorrect.
  • Another aspect of the cluster analysis method of this invention provides the definition of basis vectors for use in profile projection described in the following sections.
  • genes involved in a regulatory pathway provides useful information for designing and screening new drugs.
  • drug candidates are screened for their therapeutic activity.
  • desired drug activity is to affect one particular genetic regulatory pathway.
  • drug candidates are screened for their ability to affect the gene expression profile corresponding to the regulatory pathway.
  • a new drug is desired to replace an existing drug.
  • the projected profiles of drug candidates are compared with that of the existing drug to determine which drug candidate has activities similar to the existing drug.
  • the method of the invention is used to decipher pathway arborization and kinetics.
  • When a receptor is triggered (or blocked) by a ligand, the excitation of the downstream pathways can be different depending on the exact temporal profile and molecular domains of the ligand interaction with the receptor.
  • Simple examples of the differing effects of different ligands are the phenotypical differences that arise between responses to agonists, partial agonists, negative antagonists, and antagonists, and that are expected to occur in response to covalent vs. noncovalent binding and activation of different molecular domains on the receptor. See, Ross, Pharmacodynamics: Mechanisms of Drug
  • FIG. 4A illustrates two different possible responses of a pathway cascade.
  • receptors for ligands such as OPN may be investigated using the projection method of the invention to simplify the observed temporal responses to receptor/ligand interactions over the responding genes.
  • the gene expression profiles and temporal profiles involved are discovered.
  • the profile of temporal responses of a large number of genes is projected onto the predefined gene expression profiles to obtain a projected profile of temporal responses.
  • the projection process simplifies the observed responses so that different temporal responses may be detected and discriminated more accurately.
  • One aspect of the invention provides methods for diagnosing diseases of humans, animals and plants. Those methods are also useful for monitoring the progression of diseases and the effectiveness of treatments.
  • a patient cell sample, such as a biopsy from a patient's diseased tissue such as metastatic HCC, is assayed for the expression of a large number of genes.
  • the gene expression profile is projected into a profile of gene expression profile expression values according to a definition of gene expression profiles.
  • the projected profile is then compared with a reference database containing reference projected profiles. If the projected profile of the patient matches best with a cancer profile in the database, the patient's diseased tissue is diagnosed as being cancerous. Similarly, when the best match is to a profile of another disease or disorder, a diagnosis of such other disease or disorder is made.
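The projection and best-match steps can be illustrated with a short sketch. It assumes that each basis vector is a gene set whose member values are averaged, and that "best match" means the highest Pearson correlation; the source fixes neither choice. The gene names, gene sets and reference labels in the usage lines are hypothetical.

```python
from math import sqrt


def project(profile, genesets):
    """Average the per-gene values of `profile` over each gene set (one value per set)."""
    return [sum(profile[g] for g in genes) / len(genes) for genes in genesets.values()]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den else 0.0


def best_match(projected, reference_db):
    """Return the label of the reference projected profile most correlated with `projected`."""
    return max(reference_db, key=lambda label: pearson(projected, reference_db[label]))


# Hypothetical gene sets and reference projected profiles, for illustration only.
genesets = {"set_A": ["g1", "g2"], "set_B": ["g3", "g4"], "set_C": ["g5"]}
patient = {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0, "g5": 0.0}
reference_db = {"cancer": [1.0, -1.0, 0.2], "normal": [-0.5, 0.5, 0.0]}
diagnosis = best_match(project(patient, genesets), reference_db)
```

The same comparison can be repeated on profiles taken from one tumor at different times, or before and after treatment, to follow changes in the projected values.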
  • a tissue sample is obtained from a patient's tumor.
  • the tissue sample is assayed for the expression of a large number of genes of interest.
  • the gene expression profile is projected into a profile of gene expression profile expression values according to a definition of gene expression profiles.
  • the projected profile is compared with projected profiles previously obtained from the same tumor to identify the change of expression in gene expression profiles.
  • a reference library is used to determine whether the gene expression profile changes indicate tumor progression such as metastasis.
  • a similar method is used to stage other diseases and disorders. Changes of gene expression profile expression values in a profile obtained from a patient under treatment can be used to monitor the effectiveness of the treatment, for example, by comparing the projected profile prior to treatment with that after treatment.
  • kits for determining the responses or state of a biological sample contain microarrays, such as those described in subsections below.
  • the microarrays contained in such kits comprise a solid phase, e.g., a surface, to which probes are hybridized or bound at a known location of the solid phase.
  • these probes consist of nucleic acids of known, different sequence, with each nucleic acid being capable of hybridizing to an RNA species or to a cDNA species derived therefrom.
  • the probes contained in the kits of this invention are nucleic acids capable of hybridizing specifically to nucleic acid sequences derived from RNA species which are known to increase or decrease in response to perturbations to the particular protein whose activity is determined by the kit.
  • the probes contained in the kits of this invention preferably substantially exclude nucleic acids which hybridize to RNA species that are not increased in response to perturbations to the particular protein whose activity is determined by the kit, such as osteopontin.
  • a kit of the invention also contains a database of gene expression profile definitions such as the databases described above or an access authorization to use the database described above from a remote networked computer.
  • a kit of the invention further contains expression profile projection and analysis software capable of being loaded into the memory of a computer system such as the one described supra in the subsection, and illustrated in Example 1.
  • the expression profile analysis software contained in the kit of this invention is essentially identical to the expression profile analysis software described above in Example 1.
  • This invention is particularly useful for the analysis of gene expression profiles.
  • One aspect of the invention provides methods for defining co-regulated gene expression profiles based upon the correlation of gene expression. Some embodiments of this invention are based on measuring the transcriptional rate of genes.
  • the transcriptional rate can be measured by techniques of hybridization to arrays of nucleic acid or nucleic acid mimic probes, described in the next section, or by other gene expression technologies, such as those described in the subsequent subsection. However measured, the result is either the absolute or relative amounts of transcripts, or response data including values representing RNA abundance ratios, which usually reflect DNA expression ratios (in the absence of differences in RNA degradation rates).
  • aspects of the biological state other than the transcriptional state, such as the translational state, the activity state, or mixed aspects, can be measured.
  • measurement of the transcriptional state is made by hybridization to DNA microarrays, which are described in this section. Certain other methods of transcriptional state measurement are described later in this subsection.
  • DNA microarrays can be employed for analyzing the transcriptional state in a biological sample and especially for measuring the transcriptional states of a biological sample exposed to graded levels of a drug of interest or to graded perturbations to a biological pathway of interest.
  • DNA microarrays are produced by hybridizing detectably labeled polynucleotides representing the mRNA transcripts present in a cell (e.g., fluorescently labeled cDNA synthesized from total cell mRNA) to a microarray.
  • a microarray is a surface with an ordered array of binding (e.g., hybridization) sites for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes.
  • Microarrays can be made in a number of ways, of which several are described below.
  • microarrays share certain preferred characteristics:
  • the arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other.
  • the microarrays are small, usually smaller than 5 cm², and they are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions.
  • a given binding site or unique set of binding sites in the microarray will specifically bind the product of a single gene in the cell.
  • when detectably labeled (e.g., with a fluorophore) cDNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to a gene (i.e., capable of specifically binding the product of the gene) that is not transcribed in the cell will have little or no signal (e.g., fluorescent signal), and a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
  • cDNAs from two different cells are hybridized to the binding sites of the microarray.
  • for drug responses, one biological sample is exposed to a drug and another biological sample of the same type is not exposed to the drug.
  • for pathway responses, one cell is exposed to a pathway perturbation and another cell of the same type is not exposed to the pathway perturbation.
  • the cDNA derived from each of the two cell types is differently labeled so that the two can be distinguished.
  • cDNA from a cell treated with a drug is synthesized using a fluorescein-labeled dNTP
  • cDNA from a second cell, not drug-exposed, is synthesized using a rhodamine-labeled dNTP.
  • the cDNA from the drug-treated (or pathway perturbed) cell will fluoresce green when the fluorophore is stimulated and the cDNA from the untreated cell will fluoresce red.
  • the drug treatment has no effect, either directly or indirectly, on the relative abundance of a particular mRNA in a cell
  • the mRNA will be equally prevalent in both cells and, upon reverse transcription, red-labeled and green-labeled cDNA will be equally prevalent.
  • the binding site(s) for that species of RNA will emit wavelengths characteristic of both fluorophores (and appear brown in combination).
  • when the drug-exposed cell is treated with a drug that, directly or indirectly, increases the prevalence of the mRNA in the cell, the ratio of green to red fluorescence will increase. When the drug decreases the mRNA prevalence, the ratio will decrease.
  • cDNA from a single cell, and compare, for example, the absolute amount of a particular mRNA in, e.g., a drug-treated or pathway-perturbed cell and an untreated cell.
  • Microarrays are known in the art and consist of a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, cRNAs, polypeptides, and fragments thereof) can be specifically hybridized or bound at a known position.
  • the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (e.g., a protein or RNA), and in which binding sites are present for products of most or almost all of the genes in the organism's genome.
  • the "binding site" is a nucleic acid or nucleic acid analogue to which a particular cognate cDNA can specifically hybridize.
  • the nucleic acid or analogue of the binding site can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than full length cDNA, or a gene fragment.
  • the microarray contains binding sites for products of all or almost all genes in the target organism's genome, but such comprehensiveness is not necessarily required.
  • the microarray will have binding sites corresponding to at least about 50% of the genes in the genome, often at least about 75%, more often at least about 85%, even more often more than about 90%, and most often at least about 99%.
  • the microarray has binding sites for genes relevant to the action of a drug of interest or in a biological pathway of interest.
  • a “gene” is identified as an open reading frame (ORF) of preferably at least 50, 75, or 99 amino acids from which a messenger RNA is transcribed in the organism (e.g., if a single cell) or in some cell in a multicellular organism.
  • the number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well-characterized portion of the genome.
  • the number of ORFs can be determined and mRNA coding regions identified by analysis of the DNA sequence. For example, the Saccharomyces cerevisiae genome has been completely sequenced and is reported to have approximately 6275 open reading frames (ORFs) longer than 99 amino acids.
  • there are 5885 ORFs that are likely to specify protein products (Goffeau et al., 1996, Life with 6000 genes, Science 274:546-567, which is incorporated by reference in its entirety for all purposes). In contrast, the human genome is estimated to contain approximately 5×10⁴ genes.
  • the "binding site" to which a particular cognate cDNA specifically hybridizes is usually a nucleic acid or nucleic acid analogue attached at that binding site.
  • the binding sites of the microarray are DNA polynucleotides corresponding to at least a portion of each gene in an organism's genome. These DNAs can be obtained by, e.g., polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences.
  • PCR primers are chosen, based on the known sequence of the genes or cDNA, that result in amplification of unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray).
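The uniqueness criterion above (no more than 10 contiguous identical bases shared between fragments) can be checked mechanically. The sketch below is an illustration only: it compares fragments on the same strand and ignores reverse complements, and the function names are hypothetical.

```python
def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_run(a, b, max_shared=10):
    """True if a and b share more than `max_shared` contiguous identical bases."""
    k = max_shared + 1  # any shared run longer than max_shared contains a shared k-mer
    return not kmers(a.upper(), k).isdisjoint(kmers(b.upper(), k))


def non_unique_pairs(fragments, max_shared=10):
    """Return name pairs of fragments that violate the uniqueness criterion."""
    names = sorted(fragments)
    return [(p, q)
            for i, p in enumerate(names)
            for q in names[i + 1:]
            if shares_run(fragments[p], fragments[q], max_shared)]
```

A primer design tool would typically apply a check of this kind to candidate amplicons before they are placed on the array.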
  • Computer programs are useful in the design of primers with the required specificity and optimal amplification properties. See, e.g., Oligo version 5.0 (National Biosciences).
  • each gene fragment on the microarray will be between about 50 bp and about 2000 bp, more typically between about 100 bp and about 1000 bp, and usually between about 300 bp and about 800 bp in length.
  • PCR methods are well known and are described, for example, in Innis et al. eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, Calif., which is incorporated by reference in its entirety for all purposes. It will be apparent that computer controlled robotic systems are useful for isolating and amplifying nucleic acids.
  • nucleic acid for the microarray can also be generated by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acids Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:245-248). Synthetic sequences are between about 15 and about 500 bases in length, more typically between about 20 and about 50 bases. In some embodiments, synthetic nucleic acids include non-natural bases, e.g., inosine.
  • nucleic acid analogues may be used as binding sites for hybridization.
  • An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., 1993, PNA hybridizes to complementary oligonucleotides obeying the Watson-Crick hydrogen-bonding rules, Nature 365:566-568; see also U.S. Pat. No. 5,539,083).
  • the binding (hybridization) sites are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones, Genomics 29:207-209).
  • the polynucleotide of the binding sites is RNA.
  • the nucleic acid or analogue is attached to a solid support, which may be made from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, or other materials.
  • a preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA.
  • a second preferred method for making microarrays is by making high-density oligonucleotide arrays.
  • Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Light-directed spatially addressable parallel chemical synthesis, Science 251:767-773; Pease et al., 1994,
  • oligonucleotide probes can be chosen to detect alternatively spliced mRNAs or to serve as various types of control.
  • Another preferred method of making microarrays is by use of an inkjet printing process to synthesize oligonucleotides directly on a solid phase.
  • microarrays, e.g., by masking
  • any type of array for example, dot blots on a nylon hybridization membrane (see Sambrook and Russell,
  • RNA is extracted from biological samples of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299).
  • total RNA may be extracted from samples using TRIzol reagent (Life Technologies) according to manufacturer's directions.
  • Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook and Russell, supra).
  • Biological samples of interest include normal liver samples, non-cancerous liver samples and samples from defined clinical specimens.
  • Labeled cDNA is prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art (see, e.g., Klug and Berger, 1987, Methods Enzymol. 152:316-325). Reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, most preferably a fluorescently labeled dNTP.
  • isolated mRNA can be converted to labeled antisense RNA synthesized by in vitro transcription of double-stranded cDNA in the presence of labeled dNTPs (Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotech. 14:1675, which is incorporated by reference in its entirety for all purposes).
  • the cDNA or RNA probe can be synthesized in the absence of detectable label and may be labeled subsequently, e.g., by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
  • fluorophores include fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.). It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished.
  • a label other than a fluorescent label is used.
  • a radioactive label, or a pair of radioactive labels with distinct emission spectra, can be used (see Zhao et al., 1995, High density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression, Gene 156:207; Pietu et al., 1996, Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array, Genome Res. 6:492).
  • use of radioisotopes is a less-preferred embodiment.
  • labeled cDNA is synthesized by incubating a mixture containing 0.5 mM dGTP, dATP and dCTP plus 0.1 mM dTTP plus fluorescent deoxyribonucleotides (e.g., 0.1 mM Rhodamine 110 UTP (Perkin Elmer Cetus) or 0.1 mM Cy3 dUTP (Amersham)) with reverse transcriptase (e.g., SuperScript™ II, LTI Inc.) at 42°C for 60 minutes.
  • nucleic acid hybridization and wash conditions are optimally chosen so that the probe "specifically binds" or "specifically hybridizes" to a specific array site, i.e., the probe hybridizes, duplexes or binds to a sequence array site with a complementary nucleic acid sequence but does not hybridize to a site with a non-complementary nucleic acid sequence.
  • one polynucleotide sequence is considered complementary to another when, if the shorter of the polynucleotides is less than or equal to 25 bases, there are no mismatches using standard base-pairing rules or, if the shorter of the polynucleotides is longer than 25 bases, there is no more than a 5% mismatch.
  • the polynucleotides are perfectly complementary (no mismatches). It can easily be demonstrated that specific hybridization conditions result in specific hybridization by carrying out a hybridization assay including negative controls (see, e.g., Shalon et al., supra, and Chee et al., supra).
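The complementarity rule above (no mismatches at 25 bases or fewer, at most 5% mismatch for longer sequences) can be written as a small predicate. This sketch makes two assumptions that are not in the source: sequences are DNA over A, C, G, T, and the shorter sequence is compared against every ungapped window of the longer one. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence under standard base pairing."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_mismatches(short, long_):
    """Fewest mismatches between `short` and any same-length window of `long_`."""
    best = len(short)
    for off in range(len(long_) - len(short) + 1):
        window = long_[off:off + len(short)]
        best = min(best, sum(a != b for a, b in zip(short, window)))
    return best


def is_complementary(a, b):
    """Apply the 25-base / 5% rule to two polynucleotide sequences."""
    short, long_ = sorted((a.upper(), b.upper()), key=len)
    mm = min_mismatches(reverse_complement(short), long_)
    if len(short) <= 25:
        return mm == 0                   # short probes: no mismatches allowed
    return mm * 100 <= 5 * len(short)    # longer probes: at most 5% mismatch
```

The integer comparison in the last line avoids floating-point rounding at exactly 5%.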
  • Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide.
  • hybridization conditions are hybridization in 5×SSC plus 0.2% SDS at 65°C for 4 hours followed by washes at 25°C in low stringency wash buffer (1×SSC plus 0.2% SDS) followed by 10 minutes at 25°C in high stringency wash buffer (0.1×SSC plus 0.2% SDS) (Schena et al., 1996, Proc. Natl. Acad. Sci. USA, 93:10614).
  • Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V. and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.
7. Signal Detection and Data Analysis
  • the fluorescence emissions at each site of a transcript array can be detected by scanning confocal laser microscopy.
  • the fluorescent intensities are measured by the Axon GenePix 4000 scanner.
  • a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used.
  • a laser can be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization, Genome Research 6:639-645, which is incorporated by reference in its entirety for all purposes).
  • the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes.
  • Fluorescence laser scanning devices are described in Shalon et al., 1996, Genome Res. 6:639-645 and in other references cited herein.
  • the fiber-optic bundle described by Ferguson et al., 1996, Nature Biotech. 14:1681-1684 may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
  • Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit analog to digital board.
  • the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made.
  • the fluorescent intensities were analyzed by the GenePix Pro 3.0 software to subtract the background signals.
  • the expression data were then filtered based on their channel intensities, spot size and flag (missing data), and the Cy5/Cy3 ratios were calculated and normalized by median-centering the log-ratio of all genes in each array. For any particular hybridization site on the transcript array, a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other tested event.
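The filtering and median-centering step can be sketched as below. The intensity floor `min_intensity` and the `flagged` set are assumed stand-ins for the intensity, spot-size and flag filters, whose actual thresholds the source does not give; a base-2 logarithm is also an assumption.

```python
from math import log2
from statistics import median


def normalize_array(cy5, cy3, min_intensity=0.0, flagged=()):
    """Return median-centred log2(Cy5/Cy3) for each spot that passes the filters.

    cy5, cy3: dicts mapping gene name -> background-subtracted intensity.
    """
    log_ratios = {
        gene: log2(cy5[gene] / cy3[gene])
        for gene in cy5
        if gene in cy3
        and gene not in flagged
        and cy5[gene] > min_intensity
        and cy3[gene] > min_intensity
    }
    centre = median(log_ratios.values())  # per-array centring value
    return {gene: value - centre for gene, value in log_ratios.items()}
```

After centering, the median log-ratio of each array is zero, which makes arrays with different overall dye balance comparable.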
  • the relative abundance of an mRNA in two biological samples is scored as a perturbation and its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested), or as not perturbed (i.e., the relative abundance is the same).
  • a difference between the two sources of RNA of at least a factor of about 25% (i.e., RNA from one source is 25% more abundant than RNA from the other source), more usually about 50%, even more often by a factor of about 2 (twice as abundant), 3 (three times as abundant) or 5 (five times as abundant) is scored as a perturbation.
  • in addition to identifying a perturbation as positive or negative, it is advantageous to determine the magnitude of the perturbation. This can be carried out, as noted above, by calculating the ratio of the emission of the two fluorophores used for differential labeling, or by analogous methods that will be readily apparent to those of skill in the art.
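A perturbation call with a configurable fold threshold might look like the sketch below. The symmetric treatment of increases and decreases, the label strings and the function name are assumptions made for illustration.

```python
def score_perturbation(signal_a, signal_b, fold=2.0):
    """Classify the ratio signal_a / signal_b against a fold-change threshold.

    fold=1.25 corresponds to "25% more abundant", fold=2.0 to "twice as abundant".
    Returns (call, ratio), where ratio carries the magnitude of the perturbation.
    """
    ratio = signal_a / signal_b
    if ratio >= fold:
        return "increased", ratio
    if ratio <= 1.0 / fold:
        return "decreased", ratio
    return "not perturbed", ratio
```

For example, `score_perturbation(200.0, 100.0)` calls the gene increased with a ratio of 2.0.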
  • gene expression profiles are determined by observing the gene expression profile of a clinical sample of interest.
  • DNA microarrays reflecting the transcriptional state of a biological sample of interest are made by hybridizing a mixture of two differently labeled probes, each corresponding (i.e., complementary) to the mRNA of a clinical sample of interest or a reference sample, to the microarray.
  • the two samples are of the same type, i.e., of the same species and tissue type, but may differ in clinical diagnosis.
  • the genes whose expression is highly correlated may belong to a gene expression profile.
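One simple way to turn "highly correlated" into gene groups is to link any two genes whose Pearson correlation across samples reaches a threshold and take the connected groups. This is a sketch under that assumption, not the clustering method of the source; the threshold of 0.9 and the function names are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den else 0.0


def correlated_groups(expr, threshold=0.9):
    """Group genes linked by correlation >= threshold (single linkage).

    expr: dict mapping gene name -> list of log-ratios, one per sample.
    """
    genes = sorted(expr)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, p in enumerate(genes):
        for q in genes[i + 1:]:
            if pearson(expr[p], expr[q]) >= threshold:
                parent[find(p)] = find(q)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())
```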
  • the transcriptional state of a cell may be measured by other gene expression technologies known in the art.
  • Several such technologies produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods combining double restriction enzyme digestion with phasing primers (see, e.g., European Patent 0 534858 A1, filed Sep. 24, 1992, by Zabeau et al.), or methods selecting restriction fragments with sites closest to a defined mRNA end (see, e.g., Prashar et al., 1996, Proc. Natl. Acad. Sci. USA 93:659-663).
  • other methods statistically sample cDNA pools, such as by sequencing sufficient bases (e.g., 20-50 bases) in each of multiple cDNAs to identify each cDNA, or by sequencing short tags (e.g., 9-10 bases) which are generated at known positions relative to a defined mRNA end (see, e.g., Velculescu, 1995, Science 270:484-487).
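Tag-based sampling can be illustrated by extracting a short tag at a fixed position relative to the 3' end of each cDNA and counting the tags. The anchor site "CATG" and the 10-base tag length are assumptions chosen for illustration; the source specifies only tags of about 9-10 bases at known positions.

```python
from collections import Counter


def extract_tag(cdna, anchor="CATG", tag_len=10):
    """Return the tag_len bases following the 3'-most anchor site, or None."""
    seq = cdna.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def tag_counts(cdnas, anchor="CATG", tag_len=10):
    """Count tags over a pool of cDNA sequences; counts estimate relative abundance."""
    tags = (extract_tag(c, anchor, tag_len) for c in cdnas)
    return Counter(t for t in tags if t is not None)
```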
  • aspects of the biological state other than the transcriptional state, such as the translational state, the activity state, or mixed aspects, can be measured in order to obtain drug and pathway responses. Details of these embodiments are described infra.
  • Measurement of the translational state may be performed according to several methods.
  • whole genome monitoring of protein (i.e., the "proteome," Goffeau et al., supra)
  • binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome.
  • antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to the action of a drug of interest.
  • Methods for making monoclonal antibodies are well known (see, e.g., Harlow and Lane, 1988, Antibodies: A Laboratory Manual, Cold Spring Harbor, N.Y. which is incorporated in its entirety for all purposes).
  • monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell.
  • proteins from the cell are contacted to the array and their binding is assayed with assays known in the art.
  • proteins can be separated by two-dimensional gel electrophoresis systems. Two-dimensional gel electrophoresis is well-known in the art and typically involves iso-electric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension.
  • the resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies, and internal and N-terminal micro-sequencing. Using these techniques, it is possible to identify a substantial fraction of all the proteins produced under given physiological conditions, including in cells (e.g., in yeast) exposed to a drug, or in cells modified by, e.g., deletion or over-expression of a specific gene.
  • activity measurements can be performed by any functional, biochemical, or physical means appropriate to the particular activity being characterized.
  • the activity involves a chemical transformation
  • the cellular protein can be contacted with the natural substrate(s), and the rate of transformation measured.
  • the activity involves association in multimeric units, for example association of an activated DNA binding complex with DNA
  • the amount of associated protein or secondary consequences of the association such as amounts of mRNA transcribed, can be measured.
  • performance of the function can be observed.
  • the changes in protein activities form the response data analyzed by the foregoing methods of this invention.
  • response data may be formed of mixed aspects of the biological state of a cell.
  • Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances, and changes in certain protein activities.
  • the invention provides methods for detecting markers which are differentially present in the samples of a metastatic HCC tumor or tissue samples of patients predisposed for HCC (e.g., patients at high risk for developing HCC but where the tumor is undetectable).
  • the markers can be detected in a number of biological samples.
  • the sample is preferably a biological tissue sample lysate.
  • gas phase ion spectrometry can be used. This technique includes, e.g., laser desorption/ionization mass spectrometry.
  • the sample is prepared prior to gas phase ion spectrometry, e.g., pre-fractionation, two-dimensional gel chromatography, high performance liquid chromatography, etc. to assist detection of markers.
  • Detection of markers can be achieved using methods other than gas phase ion spectrometry.
  • immunoassays can be used to detect the markers in a sample. These detection methods are described in detail below.
  • Markers present in a biological sample can be detected using gas phase ion spectrometry, and preferably, mass spectrometry.
  • MALDI: matrix-assisted laser desorption/ionization
  • SELDI: surface-enhanced laser desorption/ionization mass spectrometry
  • a sample can be pre-fractionated to provide a less complex biological sample prior to gas phase ion spectrometry analysis using one or more of the following methods: size exclusion chromatography, anion exchange chromatography, affinity chromatography, sequential extraction, gel electrophoresis, and high performance liquid chromatography (HPLC).
  • a marker can be modified before analysis to improve its resolution or to determine its identity.
  • the markers may be subject to proteolytic digestion before analysis. Fragments from a digestion by a suitable protease, such as trypsin, may function as a fingerprint for the markers, thereby enabling their detection indirectly.
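An in silico digest shows how a protease produces a characteristic set of fragments. The sketch assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function name and the example sequence are hypothetical.

```python
def tryptic_peptides(protein):
    """Split a protein sequence after each K or R that is not followed by P."""
    seq = protein.upper()
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without a cleavage site
    return peptides


# Hypothetical sequence for illustration only.
fingerprint = tryptic_peptides("MKWVTFISLLRPAGKDE")
```

The masses of such peptides, rather than the sequences themselves, are what a mass spectrometer would observe as the fingerprint.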
  • a biological sample can be contacted with a substrate, such as a spectrometer probe adapted for use with a gas phase ion spectrometer.
  • a substrate can be a separate material that can be placed onto a spectrometer probe that is adapted for use with a gas phase ion spectrometer.
  • a spectrometer probe can be in any suitable shape as long as it is adapted for use with a gas phase ion spectrometer (e.g., removably insertable into a gas phase ion spectrometer).
  • the spectrometer probe substrate can be made of any suitable material, solid or porous.
  • Spectrometer probes suitable for use in embodiments of the invention are described in, e.g., U.S. Patent No. 5,617,060 (Hutchens and Yip) and WO 98/59360 (Hutchens and Yip).
  • the sample can be contacted with any suitable substrate for gas phase ion spectrometry.
  • an energy absorbing molecule ("EAM") or a matrix material is typically applied to markers on the substrate surface.
  • the energy absorbing molecule and the sample containing markers can be contacted in any suitable manner.
  • Complexity of a sample can be further reduced using a substrate that comprises adsorbents capable of binding one or more markers.
  • Adsorbents that bind the markers can be applied to the substrate in any suitable pattern (e.g., continuous or discontinuous), and a sample can be contacted with a substrate comprising an adsorbent in any suitable manner, e.g., bathing, soaking, dipping, spraying, washing over, or pipetting, etc. Following the contact, it is preferred that unbound materials on the substrate surface are washed out so that only the bound materials remain on the substrate surface.
  • Markers on the substrate surface can be desorbed and ionized using gas phase ion spectrometry.
  • Any suitable gas phase ion spectrometer can be used as long as it allows markers on the substrate to be resolved.
  • gas phase ion spectrometers allow quantitation of markers.
  • the gas phase ion spectrometer is a mass spectrometer, preferably a laser desorption time-of-flight mass spectrometer.
  • an ion mobility spectrometer can be used to detect markers.
  • a total ion current measuring device can be used to detect and characterize markers.
  • Data generated by desorption and detection of markers can be analyzed using any suitable means.
  • data sets are analyzed with the use of a programmable digital computer.
  • the computer program generally contains a readable medium that stores codes. Certain code can be devoted to memory that includes the location of each feature on a spectrometer probe, the identity of the adsorbent at that feature, and the elution conditions used to wash the adsorbent.
  • the computer also contains code that receives as input data on the strength of the signal at various molecular masses received from a particular addressable location on the spectrometer probe. These data can indicate the number of markers detected, including the strength of the signal generated by each marker.
  • Data analysis can include the steps of determining signal strength (e.g., height of peaks) of a marker detected and removing "outliers" (data deviating from a predetermined statistical distribution).
  • the observed peaks can be normalized, a process whereby the height of each peak relative to some reference is calculated.
  • a reference can be background noise generated by the instrument and chemicals (e.g., energy absorbing molecule), which is set as zero in the scale.
  • the signal strength detected for each marker or other biomolecules can be displayed in the form of relative intensities in the scale desired (e.g., 100).
  • a standard (e.g., a serum protein) may be admitted with the sample so that a peak from the standard can be used as a reference to calculate relative intensities of the signals observed for each marker or other markers detected.
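The outlier removal and normalization steps described above can be sketched as follows. This is a minimal illustration, not the software of the invention: the z-score cutoff stands in for the unspecified "predetermined statistical distribution," and the function names, parameters, and sample numbers are all assumptions.

```python
from statistics import mean, stdev


def remove_outliers(values, z_cutoff=2.0):
    """Drop values more than z_cutoff sample standard deviations from the mean.

    The z-score rule is an assumed stand-in for the unspecified distribution test.
    """
    if len(values) < 3:
        return list(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= z_cutoff * sigma]


def normalize_peaks(peaks, background, reference_height, scale=100.0):
    """Express peak heights relative to a reference peak, with background set to zero."""
    span = reference_height - background
    if span <= 0:
        raise ValueError("reference peak must exceed background")
    return {mass: scale * (height - background) / span for mass, height in peaks.items()}


# Hypothetical peak heights keyed by molecular mass (Da).
peaks = {5000.0: 12.0, 7500.0: 32.0, 9100.0: 52.0}
print(normalize_peaks(peaks, background=2.0, reference_height=52.0))
```

Here the reference peak could be the spiked standard, so each marker is reported on a scale where the standard reads 100 and instrument background reads 0.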
  • the computer can transform the resulting data into various formats for displaying.
  • in one format, referred to as "spectrum view" or "retentate map," a standard spectral view can be displayed, wherein the view depicts the quantity of marker reaching the detector at each particular molecular weight.
  • in another format, referred to as "peak map" or "mass map," only the peak height and mass information are retained from the spectrum view, yielding a cleaner image and enabling markers with nearly identical molecular weights to be more easily seen.
  • in yet another format, referred to as "gel view," each mass from the peak view can be converted into a grayscale image based on the height of each peak, resulting in an appearance similar to bands on electrophoretic gels.
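As a rough illustration of the gel view, peak heights can be binned along the mass axis and scaled to grayscale values. The sketch below is an assumption-laden toy: the binning scheme, the 0-255 scale, the function name, and the sample peaks are hypothetical and not taken from the source.

```python
def gel_view(peaks, mass_min, mass_max, n_bins=10):
    """Return one lane of grayscale values (0-255), one value per mass bin."""
    lane = [0.0] * n_bins
    width = (mass_max - mass_min) / n_bins
    for mass, height in peaks:
        if not (mass_min <= mass < mass_max):
            continue  # ignore peaks outside the displayed mass range
        idx = min(int((mass - mass_min) / width), n_bins - 1)
        # Keep the tallest peak falling in each bin.
        lane[idx] = max(lane[idx], height)
    tallest = max(lane)
    if tallest <= 0:
        return [0] * n_bins
    # Scale so that the tallest peak maps to the strongest band (255).
    return [round(255 * h / tallest) for h in lane]


# Hypothetical (mass, height) pairs.
print(gel_view([(1500, 20.0), (5500, 100.0), (5600, 40.0)], 1000, 11000))
```

Rendering several such lanes side by side would give the band-like appearance the text compares to an electrophoretic gel.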
  • in the format referred to as "3-D overlays," several spectra can be overlaid to study subtle changes in relative peak heights.
  • in the format referred to as "difference map view," two or more spectra can be compared, conveniently highlighting unique markers and markers which are up- or down-regulated between samples. Marker profiles (spectra) from any two samples may be compared visually.
  • Spotfire Scatter Plot can be used, wherein markers that are detected are plotted as a dot in a plot, wherein one axis of the plot represents the apparent molecular weight of the markers detected and another axis represents the signal intensity of markers detected.
  • markers that are detected and the amount of markers present in the biological sample can be saved in a computer readable medium. These data can then be compared to a control (e.g., a profile or quantity of markers detected in control, e.g., patients in whom metastatic HCC or tissue samples of someone predisposed for HCC is undetectable).
  • a method for predicting the potential of developing metastasis in an HCC patient or developing HCC in a patient with chronic liver disease can be embodied by code that is executed by a digital computer capable of processing data sets derived from signals from arrays after contact with patient samples.
  • the code can be executed by the digital computer to create an analytical model.
  • the code may be stored on any suitable computer readable media. Examples of computer readable media include magnetic, electronic, or optical disks, tapes, sticks, chips, etc.
  • the code may also be written in any suitable computer programming language, including Visual Basic, Fortran, C, C++, etc.
  • the digital computer may be a micro, mini, or large frame computer using any standard or specialized operating system such as a Windows™ based operating system.
  • a standard personal computer (PC) could be used to perform the analytical methods according to embodiments of the invention.
  • An immunoassay can be used to detect and analyze markers in a sample. This method comprises: (a) providing an antibody that specifically binds to a marker; (b) contacting a sample with the antibody; and (c) detecting the presence of a complex of the antibody bound to the marker in the sample.
  • spleen cells from an animal immunized with a target protein are immortalized, commonly by fusion with a myeloma cell (see, Kohler and Milstein, Eur. J. Immunol., 6:511-519, 1976). Colonies arising from single immortalized cells are screened for production of antibodies of the desired specificity and affinity for the target protein.
  • nucleic acid and amino acid sequences can be determined with knowledge of even a portion of the amino acid sequence of the marker. For example, degenerate probes can be made based on the N-terminal amino acid sequence of the marker. These probes can then be used to screen a genomic or cDNA library created from a sample from which a marker was initially detected. The positive clones can be identified, amplified, and their recombinant DNA sequences can be subcloned using techniques which are well known. See, e.g., Ausubel et al., Current Protocols in Molecular Biology, 1994 and Sambrook and Russell, supra.
  • antibodies against the marker can be prepared using any suitable methods known in the art. See, e.g., Huse et al., Science 246:1275-1281 (1989); Ward et al., Nature 341:544-546 (1989).
  • a marker can be detected and/or quantified using any of the suitable immunological binding assays known in the art (see, e.g., U.S. Patent Nos.
  • Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay.
  • the present invention provides methods for aiding a diagnosis of the probability of developing metastatic tumors in an HCC patient or a predisposition for developing HCC in a patient with a severe liver disease using one or more markers identified in Tables 2-7.
  • although valid diagnoses can be made based on as few as one marker selected from the markers in Tables 2-7, it is preferred that multiple markers are used to achieve more reliable results.
  • at least 10 cellular markers of Table 2 should be included in the set of markers used to predict an HCC patient's metastatic potential; more preferably at least 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100 markers are included; and most preferably all 153 markers of Table 2 are included.
  • markers used for determining the risk of developing HCC in a patient with a chronic liver disease should be included in the markers used for determining the risk of developing HCC in a patient with a chronic liver disease.
  • the markers identified in Tables 2-7 can be used alone, in combination with other markers in any of the Tables, or with entirely different markers in aiding in the diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease.
  • the markers in Tables 2-7 are differentially present in samples of a metastatic HCC or tissue samples of someone predisposed for HCC relative to a non-metastatic HCC or a subject not predisposed for HCC, respectively.
  • markers are expressed at an elevated level and/or are present at a higher frequency in metastatic HCC or tissue samples of someone predisposed for HCC relative to patients with non-metastatic HCC or individuals at low risk for developing HCC. Therefore, detection of one or more of these markers in a person would provide useful information regarding the probability that the person may develop metastatic HCC or be predisposed to develop HCC.
  • embodiments of the invention include methods for aiding in diagnosing the probability of developing metastatic HCC or in diagnosing the probability of a patient with a severe liver disease developing HCC, wherein the method comprises: (a) detecting at least one marker in a sample, wherein the marker is selected from the markers identified in Tables 2-7; and (b) correlating the detection of the marker or markers with a diagnosis of metastatic HCC or the probability for a liver disease patient to develop HCC.
  • the correlation may take into account the amount of the marker or markers in the sample compared to a control amount of the marker or markers (e.g., a non-metastatic HCC or a subject not predisposed for HCC).
  • the correlation may take into account the presence or absence of the markers in a test sample and the frequency of detection of the same markers in a control. The correlation may take into account both of such factors to facilitate determination of whether a subject has a metastatic HCC or has a severe liver disease that will likely lead to HCC.
  • Any suitable samples can be obtained from a subject to detect markers.
  • a sample is a liver tissue sample from the subject. If desired, the sample can be prepared as described above to enhance detectability of the markers.
  • Any suitable method can be used to detect a marker or markers in a sample.
  • gas phase ion spectrometry or an immunoassay can be used as described above. Using these methods, one or more markers can be detected.
  • a sample is tested for the presence of a plurality of markers. Detecting the presence of a plurality of markers, rather than a single marker alone, would provide more information for the diagnostician. Specifically, the detection of a plurality of markers in a sample would increase the percentage of true positive and true negative diagnoses and would decrease the percentage of false positive or false negative diagnoses.
  • the detection of the marker or markers is then correlated with a probable diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease.
  • the detection of the mere presence or absence of a marker, without quantifying the amount of marker, is useful and can be correlated with a probable diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease.
  • the detection of markers can involve quantifying the markers to correlate the detection of markers with a probable diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with severe liver disease. For example, increased levels of OPN are observed in patients with metastatic HCC. Thus, if the amount of the markers detected in a subject being tested is higher compared to a control amount, then the subject being tested has a higher probability of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease. When the markers are quantified, the amount can be compared to a control.
  • a control can be, e.g., the average or median amount of marker present in comparable samples of normal subjects not predisposed to developing metastatic HCC or not predisposed to developing HCC by a patient with severe liver disease.
  • the control amount is measured under the same or substantially similar experimental conditions as in measuring the test amount. For example, if a test sample is obtained from a subject's blood serum sample and a marker is detected using a particular probe, then a control amount of the marker is preferably determined from a serum sample of a patient using the same probe. It is preferred that the control amount of marker is determined based upon a significant number of samples from normal subjects who do not have metastatic HCC or tissue samples of someone not predisposed for HCC so that it reflects variations of the marker amounts in that population.
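The comparison of a test amount against a control amount can be sketched in a few lines. This is a minimal illustration, assuming the control is summarized by the median of a set of control measurements; the function name, the returned fields, and the numbers are hypothetical.

```python
from statistics import median


def compare_to_control(test_amount, control_amounts):
    """Compare a marker amount in a test sample with the median of control samples."""
    control = median(control_amounts)
    return {
        "control_median": control,
        "elevated": test_amount > control,
        # Fold change is undefined when the control median is zero.
        "fold_change": test_amount / control if control else None,
    }


# Hypothetical marker amounts measured under the same conditions.
print(compare_to_control(33.0, [10.0, 12.0, 11.0, 9.0, 13.0]))
```

A larger control set makes the median a better reflection of the variation in the control population, which is the reason the text prefers a significant number of control samples.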
  • Data generated by mass spectrometry can then be analyzed by computer software.
  • the software can comprise code that converts signal from the mass spectrometer into computer readable form.
  • the software also can include code that applies an algorithm to the analysis of the signal to determine whether the signal represents a "peak" in the signal corresponding to a marker of this invention, or other useful markers.
  • the software also can include code that executes an algorithm that compares signal from a test sample to a typical signal characteristic of "normal" and metastatic HCC or a predisposition for developing HCC by a patient with severe liver disease and determines the closeness of fit between the two signals.
  • the software also can include code indicating which typical signal the test sample is closest to, thereby providing a probable diagnosis.
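One way to picture the closeness-of-fit step is a nearest-profile comparison. The source does not name a metric, so the sketch below assumes Euclidean distance over marker intensities; the function names, labels, and intensity values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Distance between two marker-intensity profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_profile(test_signal, reference_profiles):
    """Return the label of the reference profile nearest to the test signal."""
    distances = {
        label: euclidean_distance(test_signal, profile)
        for label, profile in reference_profiles.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical typical signals for three markers.
references = {
    "non-metastatic": [10.0, 20.0, 5.0],
    "metastatic": [40.0, 22.0, 30.0],
}
label, distances = closest_profile([35.0, 20.0, 28.0], references)
print(label, distances)
```

The returned label is only a probable classification; the distances themselves indicate how close the fit is to each typical signal.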
  • Osteopontin (OPN) and EpCAM have been positively correlated to metastasis in an HCC patient and onset of HCC in a patient with a chronic liver disease, respectively. Therefore, it is one objective of this invention to identify compounds that regulate, particularly inhibit, the activity of OPN or EpCAM.
  • OPN and its alleles and polymorphic variants are secreted phosphoproteins encoded by SEQ ID NO:1 and whose amino acid sequence is disclosed in SEQ ID NO:2.
  • the activity of OPN polypeptides can be assessed using a variety of in vitro and in vivo assays to determine its functional, chemical, and physical effects, e.g., measuring receptor binding (e.g., radioactive receptor binding), and the like. Further downstream events, such as altered cellular events including cell proliferation, differentiation, etc. may also be used as indirect indicators of modified OPN activity.
  • assays can be used to test and screen for antagonists of OPN activity.
  • Antagonists can also be genetically altered versions of OPN, e.g., a dominant negative version of the protein. Such antagonists of OPN activity are useful for treating metastatic HCC.
  • the OPN of the assay will be selected from a polypeptide having a sequence of SEQ ID NO:2 or a conservatively modified variant or fragment thereof. Generally, the amino acid sequence identity will be at least 70%, optionally at least 85%, optionally at least 90-95%.
  • the polypeptide of the assays will comprise a domain of OPN, such as a receptor binding domain, an extracellular matrix binding domain, and the like.
  • Either OPN or a domain thereof can be covalently linked to a heterologous protein to create a chimeric protein used in the assays described herein.
  • Modulators of OPN activity are tested using OPN polypeptides as described above, either recombinant or naturally occurring.
  • the protein can be isolated, expressed in a cell, secreted from a cell, expressed in tissue or in an animal, either recombinant or naturally occurring.
  • liver slices, dissociated liver cells, or transformed cells can be used.
  • OPN antagonism is tested using one of the in vitro or in vivo assays described herein.
  • receptor-binding domains of the OPN protein can be used in vitro in soluble or solid state reactions to assay for receptor binding.
  • Receptor binding to OPN, a domain, or chimeric protein can be tested in solution, in a bilayer membrane, attached to a solid phase, in a lipid monolayer, or in vesicles. Binding of an antagonist can be tested using, e.g., changes in spectroscopic characteristics (e.g., fluorescence, absorbance, refractive index), hydrodynamic (e.g., shape), chromatographic, or solubility properties.
  • Samples or assays that are treated with a potential OPN inhibitor are compared to control samples without the test compound, to examine the extent of antagonism.
  • Control samples (untreated with inhibitors) are assigned a relative OPN activity value of 100.
  • Antagonism of OPN is achieved when the OPN activity value relative to the control is about 90%, optionally 50%, optionally 25-0%.
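The relative activity scale can be written out as a small calculation. The sketch below is an interpretation, not the source's procedure: it assumes activity is a single measured signal, normalizes the untreated control to 100, and reads the stated values as upper bounds of successively stronger tiers. The function names and signal values are hypothetical.

```python
def relative_activity(treated_signal, control_signal):
    """Activity of a treated sample on a scale where the untreated control is 100."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * treated_signal / control_signal


def antagonism_tier(activity):
    """Map a relative activity value to the tiers named in the text (assumed cutoffs)."""
    if activity <= 25:
        return "25-0%"
    if activity <= 50:
        return "50%"
    if activity <= 90:
        return "90%"
    return "none"


# Hypothetical assay readouts.
activity = relative_activity(treated_signal=45.0, control_signal=90.0)
print(activity, antagonism_tier(activity))
```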
  • Changes in OPN receptor binding may be assessed by determining changes in the ability of the vitronectin receptor to bind OPN in the presence of the antagonist. Generally, the compounds to be tested are present in the range from 1 pM to 100 mM.
  • the effects of the test compounds upon the function of the polypeptides can be measured by examining any of the parameters described above. Any suitable physiological change that affects OPN activity can be used to assess the influence of a test compound on the polypeptides of this invention. When the functional consequences are determined using intact cells or animals, one can also measure a variety of effects such as transcriptional changes to both known and uncharacterized genetic markers (e.g., northern blots), changes in cell metabolism such as cell growth or pH changes.
  • the biological functions of EpCAM may be monitored based on the same general principles and methodologies as described above.
  • EpCAM is known to play a role in epithelial cell homotypic adhesion, relying on both its extracellular and intracellular domains for proper functioning.
  • EpCAM's functions can be examined based on, e.g., cell aggregation, specific interactions with its known binding partners (e.g., with actin via its intracellular domain), and disruption of signal transduction it is known to mediate.
  • Various cellular events may serve as indicators of EpCAM activity and to facilitate screening test compounds for EpCAM antagonists.
  • the compounds tested as antagonists of OPN or EpCAM can be any small chemical compound, or a biological entity, such as a protein, sugar, nucleic acid or lipid.
  • Various antibodies against the proteins are likely candidates for antagonists.
  • many monoclonal antibodies, such as 17-1A and GA733, are known to specifically bind EpCAM and can thus be tested in appropriate assays for their ability to interfere with EpCAM's biological functions.
  • antagonists can be genetically altered versions of OPN or EpCAM, such as a so-called "dominant negative" version, a biologically inactive version that suppresses the normal function of its wild type counterpart by competing for limited binding partners.
  • test compounds will be small chemical molecules and peptides.
  • any chemical compound can be used as a potential antagonist in the assays of the invention, although most often compounds that can be dissolved in aqueous or organic (especially DMSO-based) solutions are used.
  • the assays are designed to screen large chemical libraries by automating the assay steps and providing compounds from any convenient source to assays, which are typically run in parallel (e.g., in microtiter formats on microtiter plates in robotic assays).
  • high throughput screening methods involve providing a combinatorial chemical or peptide library containing a large number of potential therapeutic compounds (potential modulator or ligand compounds). Such "combinatorial chemical libraries" or "ligand libraries" are then screened in one or more assays, as described herein, to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity. The compounds thus identified can serve as conventional "lead compounds" or can themselves be used as potential or actual therapeutics.
  • a combinatorial chemical library is a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis, by combining a number of chemical "building blocks" such as reagents.
  • a linear combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks (amino acids) in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical compounds can be synthesized through such combinatorial mixing of chemical building blocks.
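The scale of such a linear library follows from simple counting: with b building blocks and compound length n there are b^n members. The sketch below illustrates this for peptides; the function names are hypothetical, and the 20-letter alphabet is the standard set of proteinogenic amino acids rather than anything specified in the source.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(n_building_blocks, length):
    """Number of linear compounds of a given length: blocks ** length."""
    return n_building_blocks ** length


def enumerate_library(building_blocks, length):
    """Lazily generate every linear combination of the building blocks."""
    for combo in product(building_blocks, repeat=length):
        yield "".join(combo)


print(library_size(len(AMINO_ACIDS), 5))      # pentapeptide library size
print(list(enumerate_library("AG", 3))[:4])   # small two-block example
```

Even at a length of five residues the count reaches the millions, which is why the generator yields members lazily instead of building the full list.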
  • Preparation and screening of combinatorial chemical libraries is well known to those of skill in the art. Such combinatorial chemical libraries include, but are not limited to, peptide libraries (see, e.g., U.S.
  • chemistries for generating chemical diversity libraries can also be used. Such chemistries include, but are not limited to: peptoids (e.g., PCT Publication No. WO 91/19735), encoded peptides (e.g., PCT Publication WO 93/20242), random bio-oligomers (e.g., PCT Publication No. WO 92/00091), benzodiazepines (e.g., U.S. Pat. No.
  • Devices for the preparation of combinatorial libraries are commercially available (see, e.g., 357 MPS, 390 MPS, Advanced Chem Tech, Louisville KY, Symphony, Rainin, Woburn, MA, 433A Applied Biosystems, Foster City, CA, 9050 Plus, Millipore, Bedford, MA).
  • numerous combinatorial libraries are themselves commercially available (see, e.g., ComGenex, Princeton, N.J., Tripos, Inc., St. Louis, MO, 3D Pharmaceuticals, Exton, PA, Martek Biosciences, Columbia, MD, etc.).
  • the invention provides soluble assays using molecules such as a domain such as a receptor binding domain, an extracellular matrix binding domain, etc.; a domain that is covalently linked to a heterologous protein to create a chimeric molecule; OPN or EpCAM; or a cell or tissue expressing OPN or EpCAM, either naturally occurring or recombinant.
  • the invention provides solid phase based in vitro assays in a high throughput format, where the domain, chimeric molecule, OPN or EpCAM, or cell or tissue expressing OPN or EpCAM is attached to a solid phase substrate.
  • each well of a microtiter plate can be used to run a separate assay against a selected potential modulator, or, if concentration or incubation time effects are to be observed, every 5-10 wells can test a single modulator.
  • a single standard microtiter plate can assay about 100 (e.g., 96) modulators. If 1536-well plates are used, then a single plate can easily assay from about 100 to about 1500 different compounds. It is possible to assay several different plates per day; assay screens for up to about 6,000-20,000 different compounds are possible using the integrated systems of the invention.
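The throughput figures above reduce to integer arithmetic on wells per plate, wells per modulator, and plates per day. The sketch below is illustrative only; the function names and the plates-per-day values are assumptions chosen to show how the quoted ranges can arise.

```python
def modulators_per_plate(wells, wells_per_modulator=1):
    """How many modulators fit on one plate when each uses a fixed number of wells."""
    return wells // wells_per_modulator


def compounds_per_day(wells, plates_per_day, wells_per_modulator=1):
    """Daily screening capacity for a given plate format and plate count."""
    return modulators_per_plate(wells, wells_per_modulator) * plates_per_day


print(modulators_per_plate(96))          # one modulator per well on a 96-well plate
print(modulators_per_plate(1536, 10))    # 10 wells per modulator for dose or time effects
print(compounds_per_day(1536, 4))        # assumed 4 plates per day
```

With one modulator per well, between 4 and 13 plates of 1536 wells per day would cover roughly the 6,000-20,000 compound range mentioned in the text.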
  • the molecule of interest can be bound to the solid state component, directly or indirectly, via covalent or non covalent linkage e.g., via a tag.
  • the tag can be any of a variety of components.
  • a molecule which binds the tag (a tag binder) is fixed to a solid support, and the tagged molecule of interest (e.g., the signal transduction molecule of interest) is attached to the solid support by interaction of the tag and the tag binder.
  • a number of tags and tag binders can be used, based upon known molecular interactions well described in the literature.
  • where a tag has a natural binder (for example, biotin, protein A, or protein G), it can be used in conjunction with appropriate tag binders (avidin, streptavidin, neutravidin, the Fc region of an immunoglobulin, etc.).
  • Antibodies to molecules with natural binders such as biotin are also widely available and appropriate tag binders (see, SIGMA Immunochemicals 1998 catalogue, SIGMA, St. Louis MO).
  • any haptenic or antigenic compound can be used in combination with an appropriate antibody to form a tag/tag binder pair. Thousands of specific antibodies are commercially available and many additional antibodies are described in the literature.
  • the tag is a first antibody and the tag binder is a second antibody which recognizes the first antibody.
  • receptor-ligand interactions are also appropriate as tag and tag-binder pairs.
  • agonists and antagonists of cell membrane receptors (e.g., cell receptor-ligand interactions such as transferrin, c-kit, viral receptor ligands, cytokine receptors, chemokine receptors, interleukin receptors, immunoglobulin receptors and antibodies, the cadherin family, the integrin family, the selectin family, and the like; see, e.g., Pigott & Power, The Adhesion Molecule Facts Book I (1993)).
  • toxins and venoms, hormones (e.g., opiates, steroids, etc.), intracellular receptors (e.g., those which mediate the effects of various small ligands, including steroids, thyroid hormone, retinoids, and vitamin D), peptides, drugs, lectins, sugars, nucleic acids (linear or cyclic polymer configurations), oligosaccharides, proteins, phospholipids, and antibodies can all interact with various cell receptors.
  • Synthetic polymers such as polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, and polyacetates can also form an appropriate tag or tag binder. Many other tag/tag binder pairs are also useful in assay systems described herein, as would be apparent to one of skill upon review of this disclosure.
  • Common linkers such as peptides, polyethers, and the like can also serve as tags, and include polypeptide sequences, such as poly gly sequences of between about 5 and 200 amino acids.
  • Such flexible linkers are known to persons of skill in the art.
  • poly(ethylene glycol) linkers are available from Shearwater Polymers, Inc., Huntsville, Alabama. These linkers optionally have amide linkages, sulfhydryl linkages, or heterofunctional linkages.
  • Tag binders are fixed to solid substrates using any of a variety of methods currently available.
  • Solid substrates are commonly derivatized or functionalized by exposing all or a portion of the substrate to a chemical reagent which fixes a chemical group to the surface which is reactive with a portion of the tag binder.
  • groups which are suitable for attachment to a longer chain portion would include amines, hydroxyl, thiol, and carboxyl groups.
  • Aminoalkylsilanes and hydroxyalkylsilanes can be used to functionalize a variety of surfaces, such as glass surfaces.
  • the construction of such solid phase biopolymer arrays is well described in the literature. See, e.g., Merrifield, J. Am. Chem. Soc.
  • Non-chemical approaches for fixing tag binders to substrates include other common methods, such as heat, cross-linking by UV radiation, and the like.
  • Yet another approach to screen for compounds that modulate OPN or EpCAM activity involves computer assisted drug design, in which a computer system is used to generate a three-dimensional structure of OPN or EpCAM based on the structural information encoded by the amino acid sequence.
  • the input amino acid sequence interacts directly and actively with a pre-established algorithm in a computer program to yield secondary, tertiary, and quaternary structural models of the protein.
  • the models of the protein structure are then examined to identify regions of the structure that have the ability to bind, e.g., ligands. These regions are then used to identify ligands that bind to the protein.
  • the three-dimensional structural model of the protein is generated by entering protein amino acid sequences of at least 10 amino acid residues or corresponding nucleic acid sequences encoding an OPN or EpCAM polypeptide into the computer system.
  • the amino acid sequence of an OPN polypeptide or the nucleic acid encoding the polypeptide is selected from the group consisting of SEQ ID NOS:1 or 2, and conservatively modified versions thereof.
  • the amino acid sequence represents the primary sequence or subsequence of the protein, which encodes the structural information of the protein.
  • At least 10 residues of the amino acid sequence are entered into the computer system from computer keyboards, computer readable substrates that include, but are not limited to, electronic storage media (e.g., magnetic diskettes, tapes, cartridges, and chips), optical media (e.g., CD ROM), information distributed by internet sites, and by RAM.
  • the three-dimensional structural model of the protein is then generated by the interaction of the amino acid sequence and the computer system, using software known to those of skill in the art.
  • the amino acid sequence represents a primary structure that encodes the information necessary to form the secondary, tertiary and quaternary structure of the protein of interest.
  • the software looks at certain parameters encoded by the primary sequence to generate the structural model. These parameters are referred to as "energy terms," and primarily include electrostatic potentials, hydrophobic potentials, solvent accessible surfaces, and hydrogen bonding. Secondary energy terms include van der Waals potentials. Biological molecules form the structures that minimize the energy terms in a cumulative fashion. The computer program is therefore using these terms encoded by the primary structure or amino acid sequence to create the secondary structural model.
  • the tertiary structure of the protein encoded by the secondary structure is then formed on the basis of the energy terms of the secondary structure.
  • the user at this point can enter additional variables such as whether the protein is membrane bound or soluble, its location in the body, and its cellular location, e.g., cytoplasmic, surface, or nuclear. These variables along with the energy terms of the secondary structure are used to form the model of the tertiary structure.
  • the computer program matches hydrophobic faces of secondary structure with like, and hydrophilic faces of secondary structure with like.
  • Three-dimensional structures for potential ligands are generated by entering amino acid or nucleotide sequences or chemical formulas of compounds, as described above. The three-dimensional structure of the potential ligand is then compared to that of the OPN or EpCAM protein to identify ligands that bind to OPN or EpCAM. Binding affinity between the protein and ligands is determined using energy terms to determine which ligands have an enhanced probability of binding to the protein.
  • Computer systems are also used to screen for mutations, polymorphic variants, alleles and interspecies homologs of OPN genes or EpCAM genes. Such mutations can be associated with disease states or genetic traits.
  • GENECHIP® and related technology can also be used to screen for mutations, polymorphic variants, alleles, and interspecies homologs. Once the variants are identified, diagnostic assays can be used to identify patients having such mutated genes. Identification of the mutated OPN genes, for example, involves receiving input of a first amino acid or nucleic acid sequence encoding OPN, selected from the group consisting of SEQ ID NOS:1 and 2, and conservatively modified versions thereof. The sequence is entered into the computer system as described above. The first nucleic acid or amino acid sequence is then compared to a second nucleic acid or amino acid sequence that has substantial identity to the first sequence. The second sequence is entered into the computer system in the manner described above.
  • nucleotide or amino acid differences between the sequences are identified.
  • sequences can represent allelic differences in OPN genes, and mutations associated with disease states and genetic traits.
  • the same general strategy is also applicable for detecting EpCAM variants and mutants.
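The comparison step described above can be sketched as follows. This is a minimal illustration only, assuming the two sequences have already been aligned to equal length; the function name `find_differences` and the example fragments are hypothetical and are not taken from the sequence listing.

```python
def find_differences(first, second):
    """Return (position, first_residue, second_residue) for each mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length; gaps, if any, are written as '-'.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(first.upper(), second.upper()))
        if a != b
    ]


# Hypothetical aligned nucleotide fragments (not from the sequence listing).
reference = "ATGAGAATTGCAGTG"
variant = "ATGAGAATCGCAGTA"
for pos, ref_base, var_base in find_differences(reference, variant):
    print(f"position {pos}: {ref_base} -> {var_base}")
```

The same function works for amino acid sequences, since it compares characters position by position without assuming an alphabet.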
  • a protein of interest and its homologs are a useful tool for identifying its antagonists.
  • OPN-specific reagents that specifically hybridize to OPN nucleic acid, such as OPN probes and primers, and OPN-specific reagents that specifically bind to the OPN protein, e.g., OPN antibodies, are used to examine liver cell expression and signal transduction regulation and to diagnose metastatic HCC.
  • the same general methods are applicable to EpCAM as well.
  • Nucleic acid assays for the presence and the quantity of OPN or EpCAM polynucleotides in a sample include numerous techniques well known to those skilled in the art, such as Southern blot analysis, northern blot analysis, dot blots, RNase protection, S1 analysis, amplification techniques such as PCR (including RT-PCR) and LCR, and in situ hybridization.
  • in in situ hybridization, for example, the target nucleic acid, e.g., nucleic acid encoding OPN, is liberated from its cellular surroundings so as to be available for hybridization within the cell while preserving the cellular morphology for subsequent interpretation and analysis (see Example 1).
  • OPN or EpCAM protein can be detected with the various immunoassay techniques described above.
  • the test sample is typically compared to both a positive control (e.g., a sample containing recombinant OPN or EpCAM) and a negative control.
  • kits for screening for modulators of OPN or EpCAM can be prepared from readily available materials and reagents.
  • kits can comprise any one or more of the following materials: OPN (or EpCAM), reaction tubes, and instructions for testing OPN (or EpCAM) activity.
  • the kit contains biologically active OPN (or EpCAM).
  • kits and components can be prepared according to the present invention, depending upon the intended user of the kit and the particular needs of the user.
  • Another means of inhibiting OPN activity and thereby inhibiting HCC metastasis in an HCC patient is to inhibit OPN expression.
  • reduced risk of developing HCC in a patient of a chronic liver disease may be achieved by inhibiting EpCAM expression.
  • a variety of methods well known to those skilled in the art are available for specifically suppressing the expression of a particular gene.
  • Antisense polynucleotides [0207] Antisense technology has been the most commonly described approach in protocols to achieve gene-specific inactivation and is a useful tool in research and diagnostics. For instance, antisense oligonucleotides capable of inhibiting gene expression with a high level of specificity are often used by those of ordinary skill in biological sciences to elucidate the function of particular genes. [0208] The specificity and sensitivity of antisense polynucleotides also make them suitable for therapeutic uses. A large number of U.S. patents and scientific publications relate to the use of antisense oligonucleotides as therapeutic agents in the treatment of diseases in animals and humans. See, e.g., U.S. Patent Nos.
  • An antisense oligonucleotide contains a sequence complementary to the coding strand of a gene targeted for inactivation (e.g., SEQ ID NO:1 or SEQ ID NO:5), may be of varying lengths, e.g., from less than 10 nucleotides to more than 100 nucleotides, and can be safely and effectively administered to a subject, e.g., a human.
  • An antisense polynucleotide may be an oligomer or a polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof. It may be composed of naturally-occurring nucleobases, sugars and covalent internucleoside (backbone) linkages as well as oligonucleotides having non-naturally-occurring portions that function similarly. Such modified or substituted antisense oligonucleotides are often preferred over native forms because of desirable properties such as, e.g., enhanced cellular uptake, enhanced affinity for nucleic acid target, and increased stability in the presence of nucleases.
  • Antisense oligonucleotides suitable for the present invention may also include oligonucleotides containing modified backbones or non-natural internucleoside linkages.
  • Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of
  • antisense oligonucleotides suitable for the present invention may correspond to either the coding region or the non-coding region of a target nucleic acid, e.g., OPN or EpCAM.
  • Ribozymes are RNA molecules having an enzymatic activity that is capable of cleaving or splicing other separate RNA molecules in a nucleotide sequence specific manner.
  • a ribozyme useful for practicing the present invention is a catalytic or enzymatic RNA molecule with complementarity in a substrate binding region to a specific RNA target, e.g., OPN or EpCAM mRNA, and also has enzymatic activity that is active to cleave and/or splice RNA in that target, thereby inhibiting the expression of the target gene.
  • siRNA molecules are small double-stranded RNA molecules that elicit a process known as RNA interference, a form of sequence-specific gene inactivation.
  • RNA interference hypothesizes an ATP-dependent cleavage of mRNA molecules activated by a short double-stranded RNA, which is formed between the mRNA and the antisense strand of siRNA. Zamore et al., Cell 101:25-33, 2000. RNA interference has been shown in mammalian cell lines, oocytes, early embryos, and some cell types.
  • siRNA coding sequences can be designed based on the sequence of a target gene (e.g., OPN or EpCAM) and inserted into various suitable vectors, such as a plasmid or a viral vector, with properly placed transcription initiation and termination elements. When used in an intended recipient of eukaryotic origin, eukaryotic transcription control elements should be used.
  • the vectors containing siRNA coding sequences can then be delivered to a desired target in accordance with the general methodologies for gene transfer known to those of skill in the art.
  • RNA interference thus provides an alternative means to specifically inhibit the expression of a gene based on its sequence, by causing the rapid degradation of the mRNA of the gene, e.g., OPN or EpCAM.
  • D. Detection of Reduced Target Gene Expression
  • Following the administration of a therapeutic compound containing an agent capable of inhibiting the expression of a target gene, e.g., OPN or EpCAM, the effectiveness of the therapeutic compound can be assessed by comparing the in vivo level of the target gene before and after the administration.
  • the general methods for administering a pharmaceutical compound are described in detail in a later section.
  • when the inhibition of gene expression is achieved at the transcriptional level, i.e., by reduction of the amount of mRNA encoding a target gene, the diminished expression of the target gene may be confirmed using various detection techniques such as Northern blot assays, dot blot, RT-PCR and the like by comparing the mRNA level of the target gene (e.g., OPN or EpCAM) before and after the administration of a therapeutic compound.
  • the general methodologies for performing such analysis are well known to those of ordinary skill in the art and described in various literature (see, e.g., Sambrook and Russell, supra and Ausubel et al., supra).
  • when the inhibition of gene expression is achieved at the translational level, i.e., by reduction of the amount of protein encoded by a target gene, the diminished expression of the target gene may be confirmed by comparing the protein level of the target gene (e.g., OPN or EpCAM) before and after the administration of a therapeutic compound. Various means of measuring protein levels in tissue samples are well known to the ordinarily skilled artisans.
  • various immunoassays are routinely used to detect the presence and quantity of a protein of interest, e.g., OPN or EpCAM. A general overview ofthe applicable technology can be found in Harlow and Lane, Antibodies, A Laboratory Manual, 1988.
  • Appropriate antibodies for target proteins e.g., OPN and EpCAM
  • the general methods for preparing antibodies specific for a target protein are well known in the art and described in an earlier section. Further, some antibodies with desirable specificity may already be available for immunoassays (e.g., various mAb for EpCAM).
  • the level of the target protein in a patient can be measured by a variety of immunoassay methods with qualitative and quantitative results available to the clinician.
  • Various samples from the patient, such as blood or liver tissue, can be used in the immunoassays to detect the in vivo target protein level according to the general methods described in an earlier section.
  • immunological and immunoassay procedures in general see, e.g., Stites, supra; U.S. Patent Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168.
  • Agents that inhibit the activity of a target protein can be administered directly to the human patient for modulation ofthe target protein activity in vivo.
  • Administration is by any of the routes normally used for introducing an antagonist or inhibitor compound into ultimate contact with the tissue to be treated, optionally using the tongue or mouth.
  • the antagonists or inhibitors are administered in any suitable manner, optionally with pharmaceutically acceptable carriers. Suitable methods of administering such antagonists or inhibitors are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.
  • compositions are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions of the present invention (see, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1985).
  • the antagonists or inhibitors can be made into aerosol formulations (i.e., they can be "nebulized") to be administered via inhalation. Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like.
  • Formulations suitable for administration include aqueous and non-aqueous solutions, isotonic sterile solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives.
  • compositions can be administered, for example, orally, topically, intravenously, intraperitoneally, intravesically or intrathecally.
  • the compositions are administered orally or nasally.
  • the formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described.
  • the modulators can also be administered as part of a prepared food or drug.
  • the dose administered to a patient should be sufficient to effect a beneficial response in the subject over time.
  • the dose will be determined by the efficacy of the particular signal modulators employed and the condition of the subject, as well as the body weight or surface area of the area to be treated.
  • the size of the dose also will be determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound or vector in a particular subject.
  • in determining the amount of an antagonist or inhibitor to be administered, a physician may evaluate circulating plasma levels of the agent, its toxicities, and the production of antibodies against the agent.
  • the dose equivalent of an antagonist or inhibitor is from about 1 ng/kg to 10 mg/kg for a typical subject.
  • antagonists or inhibitors of the present invention can be administered at a rate determined by the LD-50 of the antagonist, and the side-effects of the inhibitor at various concentrations, as applied to the mass and overall health of the subject. Administration can be accomplished via single or divided doses.
  • Example 1 Predicting a predisposition for Hepatocellular Carcinoma metastasis
  • HCC samples were obtained with informed consent from patients who underwent curative resection in Liver Cancer Institute, Zhongshan Hospital of Fudan University in China.
  • a total of 107 paired primary HCC, metastatic HCC, and adjacent non-tumor normal liver tissue samples were obtained from 40 patients who were pathologically diagnosed with HCC and underwent hepatectomy at the Liver Cancer Institute, Zhongshan Hospital of Fudan University (formerly Shanghai Medical University) in China.
  • Prior to surgery each patient was examined by computed tomography of the abdomen and chest X-ray, and some patients were also examined by isotope scanning of bone if necessary.
  • 81 were from 27 patients who had primary HCC, corresponding adjacent non-tumor liver tissue and metastatic HCC [15 with intra-hepatic spreads (group P) and 12 with tumor thrombus in a branch of the portal vein (group PT)], and 26 were from 13 patients who had only a single primary HCC and corresponding non-tumor liver tissue (without detectable metastasis at the time of surgery). Tumors and non-tumor tissues were grossly dissected, snap-frozen in liquid nitrogen immediately after removal, and stored at -70°C until use.
  • tumor tissue samples and their metastases consisted mostly of carcinoma cells and that non-tumor adjacent liver samples did not exhibit any tumor cell invasion.
  • 39 were male, and one was female.
  • Patients' age ranged from 36 years to 74 years, with a median age of 50 years.
  • the size of the primary HCC ranged from 1.3 cm to 17.5 cm in diameter with a median diameter of 7.2 cm, of which 65% (26/40) were > 5 cm in diameter and the remaining were ≤ 5 cm in diameter.
  • Thirty-two cases (80%) had co-existing liver cirrhosis.
  • Serologically, all but one of the 40 patients were HBV-positive, and none was HCV-positive.
  • AFP alpha-fetoprotein
  • the cDNA microarrays were fabricated at the Advanced Technology Center, NCI. Each array contains 9180 cDNA clones with 7102 "named" genes, 1179 EST clones, and 122 Incyte clones. Preparation of fluorescent cDNA targets by a direct labeling approach and the cDNA microarray hybridization were essentially as described by Wu et al., Oncogene 20:3674-3682, 2001.
  • the fluorescent targets were prepared as follows: 100 µg of total RNA from non-cancerous liver tissue was labeled with Cy3-conjugated deoxynucleotides, or 200 µg of total RNA from primary HCC or metastasis was labeled with Cy5-conjugated deoxynucleotides (Amersham), by oligo dT-primed polymerization using Superscript II reverse transcriptase (Life Technologies). The targets were then mixed together, added to the microarrays, and incubated overnight (12-16 hours) at 42°C.
  • each microarray was pre-hybridized at 42°C for at least one hour in pre-hybridization buffer containing 5x SSC, 0.1% SDS and 1% BSA.
  • the slides were washed at room temperature in 2x SSC with 0.1% SDS, 1x SSC, and 0.2x SSC for 2 min each, and then washed in 0.05x SSC for 1 min. Most samples, where indicated, were run in duplicate.
  • the Cy3 and Cy5 fluorescent intensities for each clone were determined by the Axon GenePix 4000 scanner, and were analyzed by the GenePix Pro 3.0 software to subtract the background signals. The expression data were then filtered based on their channel intensities, spot size and flag, and the Cy5/Cy3 ratios were calculated and normalized by median-centering the log-ratio of all genes in each array.
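The median-centering step described above can be illustrated with a short sketch. It assumes background-subtracted intensities that have already passed filtering, and it uses base-2 logarithms, which the text does not specify; the function name and the intensity values are hypothetical.

```python
import math
from statistics import median


def median_center_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot, shifted so the array median is 0."""
    log_ratios = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    center = median(log_ratios)
    return [value - center for value in log_ratios]


# Hypothetical background-subtracted intensities for five spots on one array.
cy5 = [800.0, 400.0, 1600.0, 200.0, 400.0]
cy3 = [200.0, 200.0, 200.0, 200.0, 200.0]
normalized = median_center_log_ratios(cy5, cy3)
print(normalized)
```

Centering each array on its own median removes array-wide dye or loading bias, so that a value of zero corresponds to the typical gene on that array.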
  • TREEVIEW software using median centered correlation and complete linkage (Eisen et al., supra).
  • the Class Comparison Tool based on univariate F-tests was used to find genes differentially expressed between predefined clinical groups at a significance level of P < 0.001 or 0.002.
  • the permutation distribution of the F-statistic based on 2000 random permutations was also used to confirm statistical significance. In comparing primary to metastatic tumors of the same patient, a paired t-statistic was used in the same manner.
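A permutation check of a paired t-statistic, as mentioned above, might look like the sketch below. The text does not say how the paired permutations were generated; random sign flipping of the within-patient differences is assumed here as one common choice, and the function names and example values are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def paired_t(differences):
    """Paired t-statistic: mean difference divided by its standard error."""
    m, sd = mean(differences), stdev(differences)
    if sd == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(differences)))


def paired_permutation_p(differences, n_permutations=2000, seed=0):
    """Two-sided permutation p-value obtained by random sign flips.

    Under the null hypothesis the primary/metastasis labels within a
    patient are exchangeable, so each paired difference may change sign.
    """
    rng = random.Random(seed)
    observed = abs(paired_t(differences))
    hits = 0
    for _ in range(n_permutations):
        flipped = [d if rng.random() < 0.5 else -d for d in differences]
        if abs(paired_t(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical log-ratio differences (metastasis minus primary) for one gene.
differences = [0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2, 0.95]
p_value = paired_permutation_p(differences)
print(p_value)
```

With only eight pairs there are 256 distinct sign patterns, so the smallest attainable p-value is limited by the number of pairs as well as by the number of permutations.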
  • the multivariate Compound Covariate Predictor (CCP) Tool with a "leave-one-out" cross-validation test using 2000 random permutations at a significance level of P < 0.001 was used to classify predefined clinical groups based on their gene expression profiles.
  • CCP Compound Covariate Predictor
  • the statistical significance of the cross-validated misclassification rate is determined by repeating the entire cross-validation procedure on data with the class membership labels randomly permuted 2000 times.
  • the CCP is based on a weighted linear combination of gene expression variables that are univariately significant in the training set with the weights being the corresponding t-statistics as described in Radmacher et al., supra.
  • the cross-validation was performed with one pair at a time omitted and the classification based on the paired differences in expression for each gene. Averaged gene expression data from duplicated samples were included for the analysis.
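The compound covariate predictor and its leave-one-out cross-validation can be sketched as follows. This is an unpaired illustration under stated assumptions, not the tool used in the study: genes are selected by a fixed t-statistic cutoff (`t_cutoff`, a stand-in for the P-value threshold), the classification threshold is the midpoint of the two class means of the compound covariate, and all names and data are hypothetical. The label-permutation significance test and the paired variant are omitted.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (class a minus class b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def ccp_score(sample, weights):
    """Compound covariate: sum of t_j * x_ij over the selected genes."""
    return sum(t * sample[j] for j, t in weights.items())


def train_ccp(samples, labels, t_cutoff):
    """Select genes with |t| >= t_cutoff and return (weights, threshold)."""
    weights = {}
    for j in range(len(samples[0])):
        a = [s[j] for s, y in zip(samples, labels) if y == 1]
        b = [s[j] for s, y in zip(samples, labels) if y == 0]
        t = t_statistic(a, b)
        if abs(t) >= t_cutoff:
            weights[j] = t
    scores = [ccp_score(s, weights) for s in samples]
    mean1 = mean(c for c, y in zip(scores, labels) if y == 1)
    mean0 = mean(c for c, y in zip(scores, labels) if y == 0)
    return weights, (mean1 + mean0) / 2


def loocv_error(samples, labels, t_cutoff=2.0):
    """Leave-one-out misclassification rate; genes are reselected per fold."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        weights, threshold = train_ccp(train_x, train_y, t_cutoff)
        predicted = 1 if ccp_score(samples[i], weights) > threshold else 0
        errors += predicted != labels[i]
    return errors / len(samples)


# Hypothetical log-ratios: eight specimens by three genes, four per class.
samples = [
    [2.0, 0.1, -1.5], [1.8, -0.2, -1.2], [2.2, 0.0, -1.7], [1.9, 0.3, -1.4],
    [0.1, 0.2, 0.3], [-0.2, -0.1, 0.1], [0.0, 0.1, -0.1], [0.3, -0.3, 0.2],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(loocv_error(samples, labels))
```

Reselecting the genes inside every fold, rather than once on the full data, is what keeps the cross-validated error estimate from being optimistically biased.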
  • QuantumRNA™ 18S was used as an internal standard. Densitometry was used to quantify the amount of OPN, which was normalized by the 18S product. Western blot analysis was done essentially as described by Wu et al., supra.
  • protein lysates from CCL13, SK-Hep-1 and Hep3B cells were prepared in RIPA buffer (50 mM Tris-HCl, pH 7.4/150 mM NaCl/1% Triton X-100/1% deoxycholate/1.0% SDS/1% aprotinin), separated on 10% SDS-PAGE, transferred to an Immobilon-P membrane (Millipore, Bedford, MA), probed with a rat monoclonal anti-OPN antibody (Chemicon International), and visualized by the ECL-based assay (Amersham).
  • BioCoat Matrigel Invasion Chamber (BD Biosciences) according to the manufacturer's instructions. These cells were obtained from American Type Culture Collection. Cells were routinely maintained at 37°C in a humidified atmosphere of 5% CO2 in EMEM (GEBCOL) medium supplemented with 10% fetal bovine serum, 1x nonessential amino acids, 1x sodium pyruvate, 2 mM glutamine and penicillin/streptomycin.
  • cells were plated in the upper chamber in serum-free EMEM, and incubated in the absence or presence of either recombinant murine OPN (2 µg/ml) (R&D Systems) or a well-documented neutralizing antibody against OPN (3 µg/ml) (R&D Systems) for 20 hours.
  • the EMEM medium containing 5% FBS was added to the bottom chamber, serving as chemoattractants.
  • the number of cells invading through the Matrigel™ membrane was calculated before and after adding OPN or antibody of OPN for each cell line.
  • Paraffin-embedded tissue blocks were prepared and were subjected to serial sections with a thickness of 5 µm mounted on electrically charged glass slides. Slides were subjected to hematoxylin and eosin (H&E) staining. Two pathologists read these slides independently for the histological diagnosis. For immunohistochemistry analysis, slides were deparaffinized and processed for immunostaining as described by Forgues et al., J. Biol. Chem. 276:22797-22803, 2001. Briefly, slides were incubated in a microwave oven for 15 min in 1X citrate buffer for antigen retrieval and then quenched with 3% hydrogen peroxide for 10 min to block the endogenous peroxidase activity.
  • H&E hematoxylin and eosin
  • CCP compound covariate predictor
  • ** PN single primary HCC
  • PT primary HCC with tumor thrombi in portal vein
  • PT-M tumor thrombi from paired PT
  • P primary HCC with intra-hepatic metastasis
  • P-M intra-hepatic metastasis from paired P
  • P/PT both P and PT
  • P-M/PT-M both P-M and PT-M
  • tumor sizes, diameter in length.
  • a gene expression-based model from supervised machine learning algorithm can predict HCC patients with metastatic potential.
  • Fig 2 shows the calculated "weighted voting" L value with metastatic samples yielding negative values and non-metastatic samples yielding positive values. All of the test samples with the exception of one "P" sample (S29) were classified to the metastatic group (Fig 2a).
  • Patient follow-up data indicated that one PN patient (S56) was found to develop lung metastases 8 months following surgery, the second PN patient (S57) was cancer-free 9 months after surgery, and the third patient (S55) did not respond to the follow-up request.
  • We also analyzed these samples by multidimensional scaling based on the 153-gene set obtained from the PN/PT comparison.
  • the list of 153 genes from the prediction model was based on a stringent criterion (P value at 0.001) to minimize the number of false-positive genes in the classifier that is needed for an accurate classification.
  • stringent criterion may exclude many genes that could be significant for metastasis progression.
  • OPN osteopontin
  • IHC Immunohistochemical analysis
  • HCCLM3 cell line is a clone derived from MHCC97 cells with a high degree of pulmonary metastasis following subcutaneous (s.c.) injection (Li et al., J. Cancer Res. Clin. Oncology, 2002). Consistent with our recent data, 100% tumorigenicity was achieved within 1 week after s.c. injection. There was no significant difference in the size of primary tumors between control and anti-OPN groups (Figure 5 E), which is consistent with our in vitro results that anti-OPN does not affect HCC cell growth.
  • Table 4 Significant genes for predicting metastasis and their values necessary for computing multifactorial L value in the prediction model.
  • Example 2 Predicting a predisposition for Hepatocellular Carcinoma 1. Material and methods a) Patients and tissue samples
  • Surgical specimens were collected with prior informed consent under protocols approved by the Institutional Review Board of the University of Minnesota.
  • Liver samples were obtained from 59 end-stage chronic liver disease patients who received liver transplantation between 1995-2001.
  • Disease-free liver samples from 8 liver donors were used as control.
  • the collection of these samples was mainly managed through the Liver Tissue Procurement and Distribution System (LTPADS) at the University of Minnesota, USA.
  • LTPADS Liver Tissue Procurement and Distribution System
  • Tumor and matched non-tumor liver samples from 64 patients were obtained through either the LTPADS program or the Liver Cancer Institute at Fudan University, China. Frozen samples, once received, were stored immediately at -80°C in a tissue repository database.
  • a hierarchical clustering analysis was performed using a relative gene expression ratio (Cy5/Cy3) to examine the relatedness among expression patterns of several gene lists and those in two risk groups.
  • Cluster analysis was performed using Cluster software and visualized using TreeView software (Eisen et al., supra).
  • Hierarchical clustering was performed following median centering normalization.
  • t_j was the t-statistic for the two-group comparison of classes with respect to gene j
  • x_ij was the log-ratio measured in specimen i for gene j, and the sum is over all differentially expressed genes.
  • EpCAM expression and its in vitro inhibition [0250] The expression of EpCAM was assessed by semi-quantitative PCR. Total RNA was reverse-transcribed to produce single-stranded cDNA using random primers (Promega) with Superscript II reverse transcriptase (Invitrogen) according to the manufacturer's protocol. PCR amplification was performed with QuantumRNA 18S Internal Standards (Ambion) by using HotStarTaq DNA polymerase (Qiagen) according to the manufacturer's protocol.
  • primer sequences are as follows: forward, 5'-TGC CGC AGC TCA GGA AGA ATG TGT-3' (SEQ ID NO:6); reverse, 5'-CAT CAT TCT GAG TTT TTT GAG AAG-3' (SEQ ID NO:7).
  • siRNA was used to inhibit EpCAM expression.
  • siRNA were synthesized by Qiagen.
  • the sense and antisense strands of the EpCAM siRNA are: sense, 5'-GUU UGC GGA CUG CAC UUC AdTdT-3' (SEQ ID NO:8); antisense, 5'-UGA AGU GCA GUC CGC AAA CdTdT-3' (SEQ ID NO:9).
  • Non-silencing RNA was purchased from Qiagen and used as control siRNA.
  • The sequences of control siRNA were: sense, 5'-UUC UCC GAA CGU GUC ACG UdTdT-3' (SEQ ID NO:10); antisense, 5'-ACG UGA CAC GUU CGG AGA AdTdT-3' (SEQ ID NO:11).
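As a consistency check on the siRNA sequences listed above, the following sketch confirms that each antisense strand is the reverse complement of its sense strand once the 3' dTdT overhang is set aside. The helper names `reverse_complement` and `core` are hypothetical; the sequences are copied from the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def core(strand):
    """Strip spaces and the 3' dTdT overhang from a strand as listed."""
    sequence = strand.replace(" ", "")
    if sequence.endswith("dTdT"):
        sequence = sequence[: -len("dTdT")]
    return sequence


duplexes = {
    "EpCAM": ("GUU UGC GGA CUG CAC UUC AdTdT", "UGA AGU GCA GUC CGC AAA CdTdT"),
    "control": ("UUC UCC GAA CGU GUC ACG UdTdT", "ACG UGA CAC GUU CGG AGA AdTdT"),
}
for name, (sense, antisense) in duplexes.items():
    matches = reverse_complement(core(sense)) == core(antisense)
    print(name, len(core(sense)), matches)
```

Both duplexes have a 19-nucleotide complementary core, with the two deoxythymidine residues forming the 3' overhang on each strand.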
  • Transfection of siRNAs was carried out using TransIT-TKO transfection reagent (Mirus) according to the manufacturer's protocol and 200 nM siRNA duplex per experiment. Cell growth was determined by using Cell Counting Kit-8 (Dojindo Molecular Tech.) as described by the manufacturer. The experiments were performed in triplicate.
  • the 273-gene set (Table 5) was a common signature for tumors, we applied this set to two independent HCC gene expression profiles using the 3NN and SVM predictors.
  • One set included 24 HCC samples derived from a comparison with the same normal liver control used above and the other set included 50 HCC samples that were compared to their matched non-cancerous liver tissues (Ye et al., supra).
  • the 273-gene signature provided an increased fitness by SVM in their classification with an overall accuracy of 92% for the 24 HCC samples and 94% for the 50 HCC samples (data not shown), which was improved in overall performance as compared to the 556-gene set. Consistently, the non-overlapping 283-gene set did not provide any satisfactory performance.
  • the 283 genes may belong to the signatures separating the etiologies. Moreover, the 383 overlapping genes selected from a comparison of HBV/HCV/HHC/WD and ALD/PBC/AIH/HCC did not yield a meaningful classification of the two independent HCC sets with an overall predictive rate below 50% (a random event).
  • the 273 genes were examined in multiple liver samples taken from two HBV patients and from different parts of the liver that were spread at least in a 5 cm diameter region. The profiles of these 273 genes in different parts of the livers from these two patients were almost identical (data not shown).
  • top 25 genes with the lowest parametric p-values (p < 0.000001) were selected from the 273-gene set. This set gave rise to a comparable result as the 273-gene set (data not shown). Taken together, these results indicate that the 273-gene set contains most of the HCC-associated genes relevant to HCC development and that these genes are widely spread in the parenchyma of the affected livers rather than being retained locally.
  • the gene parameters in this signature were applied using SVM to 98 HCCs, 53 lung cancers, 89 gastric adenocarcinomas, 37 soft tissue tumors, 39 breast tumors and 23 diffuse large B-cell lymphomas (DLBCL) from several publicly available microarray datasets (Alizadeh et al., supra; Perou et al., supra; Garber et al., Proc. Natl. Acad. Sci. U.S.A. 98:13784-13789, 2001).
  • DLBCL diffuse large B-cell lymphoma
  • the 273-gene set consistently performed well with additional 98 HCC samples (80% of the samples fit the signature), 97% of breast cancers (39 cases) and 78% of DLBCL cases shared similar signatures. In contrast, most of the tumor samples from lung, soft tissues, and stomach showed a very poor fit to this signature (between 6 and 30% of the cases) (data not shown). As a control, the 283-gene set (non-HCC-related genes) did not provide a satisfying prediction to these samples. Thus, the HCC-associated genes in the classifier appear to be commonly dysregulated in breast cancer and DLBCL, but not in lung adenocarcinoma, soft tissue tumors, and gastric adenocarcinoma.
  • HCC genes responsible for the genesis of HCC may be present in the 273 gene set.
  • the gene whose expression is significantly elevated in the high-risk group but not in the low-risk group may act as an oncogene to promote cell growth.
  • TACSTD1, Hs.692 tumor-associated calcium signal transducer 1
  • Elevated expression of EpCAM in the high-risk CLD samples was verified by quantitative RT-PCR analysis (Fig 6b).
  • Fig 6f inhibition of EpCAM expression by two different siRNA oligos specific to EpCAM resulted in a significant growth inhibition of Hep3B cells
  • a control siRNA oligo has no such effect (Fig 6e and data not shown).
  • Table 5 273 significant genes for predicting the potential for developing HCC in a patient with a chronic liver disease and their values necessary for computing multifactorial L value in the prediction model.
  • Table 6 25 significant genes for identifying patients likely to develop HCC by the compound covariate predictor analysis and their values necessary for computing multifactorial L value in the prediction model.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Genetics & Genomics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Organic Chemistry (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Urology & Nephrology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biochemistry (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Oncology (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Microbiology (AREA)
  • Bioethics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Software Systems (AREA)
  • Hematology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Gastroenterology & Hepatology (AREA)

Abstract

The present invention relates to methods for diagnosing the metastatic potential of hepatocellular carcinoma (HCC) in HCC patients and methods for diagnosing the potential of developing HCC in patients with chronic liver diseases. A computer readable medium, a digital computer, and a system useful for such diagnosis are also provided. Further disclosed are methods for identifying potential therapeutic targets for treating metastasis in HCC patients and methods for preventing HCC in patients with chronic liver diseases. In addition, the invention provides methods for inhibiting metastasis in HCC patients by suppressing the function of one therapeutic target, osteopontin, and methods for preventing the development of HCC in patients with chronic liver diseases by suppressing the function of one therapeutic target, EpCAM. Pharmaceutical compositions containing agents capable of inhibiting the functions of osteopontin or EpCAM are also disclosed.

Description

Methods of Diagnosing Potential for
Metastasis or Developing Hepatocellular Carcinoma and of Identifying Therapeutic Targets
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority to U.S. Provisional Patent Application No. 60/370,895, filed April 5, 2002, the entire contents of which are hereby incorporated by reference.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] This invention is owned by the United States of America as represented by the Secretary of Health and Human Services.
BACKGROUND OF THE INVENTION
[0003] Hepatocellular carcinoma (HCC) is one of the most common and aggressive malignancies worldwide, with a cure rate of less than 5%. The high mortality is mainly due to the occurrence of intra-hepatic metastases. Little is known about the molecular basis of intra-hepatic metastasis or about specific therapeutic targets in these patients.
[0004] Within the past decade, several technologies have made it possible to monitor the expression level of a large number of transcripts at any one time (see, e.g., Schena et al., Science 270:467-470, 1995; Lockhart et al., Nature Biotechnology 14:1675-1680, 1996; Blanchard et al., Nature Biotechnology 14:1649, 1996; and U.S. Pat. No. 5,569,588). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms, such as human, for which there is an increasing knowledge of the genome, it is possible to simultaneously monitor large numbers of the genes within the cell. Such monitoring technologies have been applied to the identification of genes which are up regulated or down regulated in various diseased or physiological states, the analyses of members of signaling cellular states, and the identification of targets for various drugs.
[0005] The present inventors analyzed the expression of 9,180 genes in HCC tissues from 40 patients with or without accompanying intra-hepatic metastases. Using a supervised machine learning algorithm to classify patients based on their gene expression signatures, a molecular signature has been generated for the first time that correctly classifies patients with or without metastases, and genes have been identified that are most relevant to the prediction of outcome, including patient survival.
The gene expression signature of primary HCCs with accompanying metastasis is very similar to that of their corresponding metastases, suggesting that the genes favoring metastasis progression likely have been initiated in the primary tumors. Moreover, osteopontin (OPN) is overexpressed in primary HCC with intra-hepatic metastasis and a neutralizing antibody against osteopontin is shown to block invasion of highly metastatic HCC cells in an in vitro assay of invasion. These data identify osteopontin both as a diagnostic marker and a therapeutic target for metastatic HCC.
[0006] The expression of 9,180 genes has also been analyzed in tumor samples from 54 HCC patients and in 59 non-cancerous liver samples from patients with severe liver diseases and at high risk for developing HCC or at low risk for developing HCC. The high risk group includes patients diagnosed with hepatitis B, hepatitis C, hemochromatosis, and Wilson's disease. The low risk group includes patients diagnosed with alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis. A comparison of the gene expression levels between the high risk and low risk groups has identified a set of significant genes that would differentiate between the high risk and low risk groups. Filtering the set of significant genes using expression data from HCC samples has identified subsets of genes enriched with HCC-related molecular signatures and useful for classifying samples. In addition, EpCAM is among the most significant genes whose overexpression positively correlates to the risk of developing HCC in a patient with a severe liver disease and the inhibition of its expression has been shown to lead to growth suppression in HCC cells. Thus, EpCAM has been identified as a diagnostic marker for predicting the risk of developing HCC as well as a therapeutic target for preventing the onset of HCC in patients suffering from chronic liver diseases.
BRIEF SUMMARY OF THE INVENTION
[0007] One aspect of the present invention relates to a method for identifying potential therapeutic targets for inhibiting metastasis in a patient suffering from HCC or for preventing the development of HCC in a patient suffering from a chronic liver disease.
[0008] The method for identifying potential therapeutic targets for inhibiting metastasis in an HCC patient includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a metastatic HCC patient; b) capturing markers from the sample and generating a first signal; c) repeating steps a) and b) with a sample from a non-metastatic HCC patient and thereby generating a second signal; and d) comparing the first and second signals and thereby identifying a subset of cellular markers whose level is different in the first and second signals, wherein the subset of cellular markers are potential therapeutic targets for treating HCC metastasis in an HCC patient. In some embodiments, a signal generated from a normal non-cancerous sample on an array identical to the array of step a) is subtracted in steps b) and c) to generate the first and second signals.
[0009] The method for identifying potential therapeutic targets for preventing the onset of HCC in a patient with a chronic liver disease includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a patient with a chronic liver disease and a high risk of developing HCC; b) capturing markers from the sample and generating a first signal; c) repeating steps a) and b) with a sample from a patient with a chronic liver disease and a low risk of developing HCC and thereby generating a second signal; and d) comparing the first and second signals and thereby identifying a subset of cellular markers whose level is different in the first and second signals, wherein the subset of cellular markers are potential therapeutic targets for preventing HCC in a patient with a chronic liver disease. In some embodiments, a signal generated from a normal non-cancerous sample on an array identical to the array of step a) is subtracted in steps b) and c) to generate the first and second signals.
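By way of illustration only, the comparison of step d) might be sketched in Python as follows. The sketch assumes that each signal is a mapping from marker name to a numeric intensity, that the optional normal-sample signal is subtracted marker by marker, and that "different" means a fold change beyond a chosen cutoff. The function name, the cutoff, the marker labels GENE_A and GENE_B, and all intensities are hypothetical and are not taken from this application.

```python
from typing import Dict, Optional


def differential_markers(
    first: Dict[str, float],
    second: Dict[str, float],
    reference: Optional[Dict[str, float]] = None,
    min_fold: float = 2.0,      # hypothetical cutoff, not from the application
    floor: float = 1e-9,        # keeps subtracted intensities positive
) -> Dict[str, float]:
    """Return {marker: fold change} for markers whose level differs.

    The fold change is (first - reference) / (second - reference); a marker
    is kept when the ratio is >= min_fold or <= 1 / min_fold.
    """
    selected = {}
    for marker in sorted(first.keys() & second.keys()):
        base = reference.get(marker, 0.0) if reference else 0.0
        a = max(first[marker] - base, floor)
        b = max(second[marker] - base, floor)
        fold = a / b
        if fold >= min_fold or fold <= 1.0 / min_fold:
            selected[marker] = fold
    return selected


# Hypothetical intensities, for illustration only.
metastatic = {"OPN": 9.0, "GENE_A": 3.0, "GENE_B": 2.0}
non_metastatic = {"OPN": 3.0, "GENE_A": 3.0, "GENE_B": 5.0}
normal = {"OPN": 1.0, "GENE_A": 1.0, "GENE_B": 1.0}

candidates = differential_markers(metastatic, non_metastatic, normal)
```

A ratio is used here rather than a plain difference because subtracting the same normal-sample signal from both signals leaves their difference unchanged, whereas it does change their ratio.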
[0010] Another aspect of the present invention relates to a method for predicting the metastatic potential in an HCC patient or for predicting the risk of developing HCC in a patient with a chronic liver disease.
[0011] The method for predicting the metastatic potential in an HCC patient includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a metastatic HCC patient, the set of cellular markers comprising at least ten genes or proteins encoded by genes independently selected from the genes of Table 2; b) capturing markers from the sample; c) generating a first signal from the captured markers of step b); d) repeating steps a) to c) with a sample from a non-metastatic HCC patient and thereby generating a second signal; e) repeating steps a) to c) with a sample from an HCC patient with unknown metastatic potential and thereby generating a third signal; and f) comparing the third signal to the first and the second signals and thereby determining the metastatic potential of the HCC patient of step e). In some embodiments, the set of cellular markers includes at least 20, preferably 50, more preferably 100, and most preferably all genes or proteins encoded by genes independently selected from the genes of Table 2. In other embodiments, the set of cellular markers includes the genes or proteins encoded by genes of Table 4 or Unigene numbers Hs.313, Hs.69707, Hs.222, Hs.63984, Hs.75573, Hs.177687, Hs.69707, Hs.222, Hs.323712, and Hs.63984. Preferably, the sample of steps a) and b), the sample of step d), and the sample of step e) are liver tissue extracts. In a preferred embodiment, the array of step a) is a genomic array. In another preferred embodiment, the array of step a) is a proteomic array.
[0012] The method for predicting the risk of developing HCC in a patient suffering from a chronic liver disease includes the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a patient with a chronic liver disease and a high risk of HCC, the set of cellular markers comprising at least ten genes or proteins encoded by genes independently selected from the genes of Table 5; b) capturing markers from the sample; c) generating a first signal from the captured markers of step b); d) repeating steps a) to c) with a sample from a patient with a chronic liver disease and a low risk of HCC and thereby generating a second signal; e) repeating steps a) to c) with a sample from a patient with a chronic liver disease and an unknown risk of HCC and thereby generating a third signal; and f) comparing the third signal to the first and the second signals and thereby determining the risk of developing HCC in the patient of step e). In some embodiments, the set of cellular markers comprises at least 20, preferably 50, more preferably 100, and most preferably all genes or proteins encoded by genes independently selected from the genes of Table 5. In some other embodiments, the set of cellular markers comprises the genes or proteins encoded by genes of Table 6 or Table 7. Preferably, the sample of steps a) and b), the sample of step d), and the sample of step e) are liver tissue extracts. In one preferred embodiment, the array of step a) is a genomic array. In another preferred embodiment, the array of step a) is a proteomic array. In some embodiments, the patient with a high risk of developing HCC suffers from hepatitis B infection, hepatitis C, hemochromatosis, or Wilson's disease. In other embodiments, the patient with a low risk of HCC suffers from alcoholic liver disease, autoimmune hepatitis, or primary biliary cirrhosis.
In yet other embodiments, the patient whose risk of developing HCC is being assessed suffers from hepatitis B, hepatitis C, hemochromatosis, Wilson's disease, alcoholic liver disease, autoimmune hepatitis, or primary biliary cirrhosis.
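As a hedged illustration of step f) in the two prediction methods above, the third signal can be assigned to whichever of the first and second signals it lies closer to. The Python sketch below uses Euclidean distance over the shared markers. This nearest-profile rule is an assumption chosen for simplicity and is not stated to be the comparison used in this application; the function names, marker names, and values are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def euclidean(a: Dict[str, float], b: Dict[str, float], markers: Iterable[str]) -> float:
    """Euclidean distance between two signals over the given markers."""
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in markers))


def classify_third_signal(
    first: Dict[str, float],
    second: Dict[str, float],
    third: Dict[str, float],
    labels: Tuple[str, str] = ("first", "second"),
) -> str:
    """Return the label of the known signal that the third signal is closer to."""
    markers = sorted(first.keys() & second.keys() & third.keys())
    if not markers:
        raise ValueError("the three signals share no markers")
    d_first = euclidean(third, first, markers)
    d_second = euclidean(third, second, markers)
    return labels[0] if d_first <= d_second else labels[1]


# Hypothetical signals, for illustration only.
metastatic_signal = {"marker_1": 5.0, "marker_2": 1.0}
non_metastatic_signal = {"marker_1": 1.0, "marker_2": 4.0}
unknown_signal = {"marker_1": 4.5, "marker_2": 1.5}

call = classify_third_signal(
    metastatic_signal,
    non_metastatic_signal,
    unknown_signal,
    labels=("metastatic", "non-metastatic"),
)
```

The same function applies to the risk prediction method by passing the high risk and low risk signals as the first and second arguments.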
[0013] Yet another aspect of the invention relates to a method for inhibiting metastasis in an HCC patient as well as a method for inhibiting the development of HCC in a patient with a chronic liver disease. The method for inhibiting HCC metastasis in an HCC patient includes the step of suppressing OPN activity. In some embodiments, suppression of OPN activity is accomplished by inhibiting OPN expression, preferably using an antisense polynucleotide specific for OPN. In other embodiments, suppression of OPN activity is accomplished by inhibiting the specific binding between OPN and OPN receptor, preferably using an anti-OPN antibody. The method for preventing the onset of HCC in a patient with a chronic liver disease includes the step of suppressing EpCAM activity. In some embodiments, suppression of EpCAM activity is accomplished by inhibiting EpCAM expression, preferably using an antisense polynucleotide or a small inhibitory RNA molecule specific for EpCAM. In other embodiments, suppression of EpCAM activity is accomplished by inhibiting the specific binding between EpCAM and EpCAM receptor, preferably using an anti-EpCAM antibody.
[0014] A still further aspect of the present invention relates to a computer readable medium, a digital computer, and a system for assessing the metastatic potential in an HCC patient or the risk of developing HCC in a patient with a chronic liver disease.
[0015] The computer readable medium for assessing the metastatic potential in an HCC patient includes: a) code for a first data set, derived from a first signal from an array comprising capture reagents for a set of cellular markers after contact with a sample from a metastatic HCC patient, the set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 2; b) code for a second data set, derived from a second signal from an array identical to the array of a) after contact with a sample from a non-metastatic HCC patient; c) code for a third data set, derived from a third signal from an array identical to the array of a) after contact with a sample from a HCC patient with unknown metastatic potential; and d) code for comparing the third data set with the first and second data sets. A digital computer containing the claimed computer readable medium for assessing HCC metastatic potential in an HCC patient is also provided. Further provided is a system containing such a digital computer, a chip with an array comprising capture reagents for a set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 2, and a reader capable of registering a signal from the array after contact with a sample.
[0016] The computer readable medium for assessing the risk of developing HCC in a patient with a chronic liver disease includes: a) code for a first data set, derived from a first signal from an array comprising capture reagents for a set of cellular markers after contact with a sample from a patient with a chronic liver disease and a high risk of HCC, the set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 5; b) code for a second data set, derived from a second signal from an array identical to the array of a) after contact with a sample from a patient with a chronic liver disease and a low risk of HCC; c) code for a third data set, derived from a third signal from an array identical to the array of a) after contact with a sample from a patient with a chronic liver disease and an unknown risk of HCC; and d) code for comparing the third data set with the first and second data sets. A digital computer containing the claimed computer readable medium for assessing the risk of developing HCC in a patient with a chronic liver disease is also provided. Further provided is a system containing such a digital computer, a chip with an array comprising capture reagents for a set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 5, and a reader capable of registering a signal from the array after contact with a sample.
DEFINITIONS
[0017] Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
[0018] The term "hepatocellular carcinoma" or "HCC" as used herein refers to the major type of carcinoma of the liver that accounts for more than 90% of all primary liver cancers. Hepatocellular carcinomas range from well differentiated to highly anaplastic undifferentiated lesions. Hepatocellular carcinomas may exist as single intra-hepatic lesions (non-metastatic), multifocal intra-hepatic metastasis or as extra-hepatic metastasis.
[0019] "High risk precancerous diseases" refer to a group of epidemiologically defined diseases that are associated with a high probability of developing HCC. These diseases include chronic hepatitis B infection, hepatitis C infection, hemochromatosis, and Wilson's disease.
[0020] "Low risk precancerous diseases" refer to a group of epidemiologically defined diseases that are associated with a low risk of developing HCC. These diseases include alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis.
[0021] The term "metastasis" or "metastatic" refers to the ability of a cancer cell to invade surrounding tissues, to enter the circulatory system and to establish malignant growths at new sites.
[0022] "Non-Metastatic" refers to tumors that do not spread beyond their original site of development and specifically do not enter the circulatory system and establish malignant growths at new sites.
[0023] The term "non-cancerous" refers to a biological sample or tissue sample in which the cells in the sample exhibit a normal or non-pathological phenotype when analyzed visually, by microscope, immunohistologically, immunologically, or molecularly using antibody or nucleic acid probes designed to detect pathological conditions.
[0024] The term "normal" refers to a biological sample or tissue sample in which the sample is obtained from an individual who has not been diagnosed with HCC or with high risk or low risk precancerous diseases.
[0025] The term "capture reagent" refers to any type of moiety that binds to a specific nucleic acid or protein marker. Typically the binding of the marker to the capture reagent can be controlled by the conditions used during the binding process. For example, the binding of a nucleic acid marker to a cognate oligonucleotide is controlled by the hybridization conditions used. Stringent hybridization conditions will only allow a nucleic acid marker that has high homology, e.g., 95%-100% identity with the oligonucleotide, to bind to the oligonucleotide.
[0026] "Array" refers to a plurality of capture reagents bound to a substrate, e.g., a solid support, which will bind to their cognate markers. For example, the array may be composed of nucleic acid molecules, protein molecules or any other reagent that will specifically bind a nucleic acid, protein or polypeptide isolated from a biological sample. The capture reagents are preferentially bound in an addressable fashion such that when the cognate marker is bound to the capture reagent, the amount of binding may be quantified.
[0027] "DNA microarray" refers to an array in which the capture reagents are nucleic acid molecules. Typically, a DNA microarray is composed of DNA oligonucleotides of a defined length which can hybridize to DNA, cDNA or RNA molecules under defined conditions. DNA oligonucleotides may be short pieces of nucleic acid ranging in size from 15-50 bases or they may be longer pieces of nucleic acids ranging in size from 500-1000 bases or longer. DNA microarrays may be composed of hundreds or thousands of different nucleic acid molecules each of which is located on the array in a defined position. Binding of the marker to the DNA microarray is usually quantified when the marker is labeled with a detectable moiety. The term DNA microarray is used interchangeably with the term "genomic array".
[0028] "Protein array" refers to an array in which the capture reagents will bind protein markers. Typically these reagents may be polyclonal or monoclonal antibodies that bind specific proteins. Alternatively, any protein, peptide, nucleic acid or other molecule or surface which will specifically bind to a protein may be used in a protein array. These arrays usually contain hundreds or thousands of different capture reagents in addressable locations. Binding of the markers to the capture reagent on the protein array is usually quantified when the marker is labeled with a detectable moiety. The term protein array is used interchangeably with "proteomic array".
[0029] "Gene expression profile" refers to all of the genes that are expressed in a tissue sample compared to a reference sample. The level of gene expression of genes in a gene expression profile is determined by comparing the level of expression in a test sample, e.g., an HCC tumor sample or a sample obtained from a patient diagnosed with severe liver disease, to the level of expression in a reference sample. The reference sample used for determining the metastatic potential of an HCC tumor is non-cancerous liver tissue or liver tissue obtained from a patient who has not been diagnosed with HCC. The reference sample used for determining the potential for developing HCC in patients diagnosed with severe liver disease is liver tissue obtained from patients who have not been diagnosed with severe liver disease. Genes in the test sample may be over expressed or under expressed relative to the reference sample.
[0030] "Metastatic gene expression predictor" refers to the expression of a specific cluster of genes correlated with the diagnosis of metastatic HCC. The metastatic gene expression predictor is generated by comparing the gene expression profile of a test sample obtained from a non-metastatic HCC sample to the gene expression profile obtained from a metastatic HCC sample followed by a cluster and classification analysis using a defined algorithm or set of algorithms. The number of genes present may vary depending on the clustering algorithm used or depending on a parameter in the algorithm, e.g., p-level = 0.001 vs. 0.022.
[0031] "HCC gene expression predictor" refers to the expression of a specific cluster of genes correlated with the diagnosis of patients likely to develop HCC. The HCC gene expression predictor is generated by comparing the gene expression profile of a test sample obtained from a non-metastatic liver sample obtained from a patient with a high risk for developing HCC to the gene expression profile obtained from a non-metastatic liver sample obtained from a patient having a low risk of developing HCC followed by a cluster and classification analysis using a defined algorithm or set of algorithms. The number of genes present may vary depending on the clustering algorithm used or depending on a parameter in the algorithm e.g. p-level = 0.001 vs. 0.022.
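To illustrate how the number of genes in a predictor can depend on a p-level parameter, the following Python sketch selects genes by an exact two-sided permutation test on the difference in group means. The permutation test is an assumption made for illustration and is not stated to be the algorithm used in this application. The gene names, expression values, and p-levels below are hypothetical; with only four samples per group the smallest attainable p-value is 2/70, so the illustrative p-levels are larger than the 0.001 and 0.022 mentioned above.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a: float) -> float:
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    hits = 0
    splits = 0
    # Enumerate every way of relabelling n_a of the pooled samples as group A.
    for idx in combinations(range(len(pooled)), n_a):
        splits += 1
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / splits


def select_genes(
    expression: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    p_level: float,
) -> List[str]:
    """Return the genes whose permutation p-value is at or below p_level."""
    return sorted(
        gene
        for gene, (group_a, group_b) in expression.items()
        if permutation_p_value(group_a, group_b) <= p_level
    )


# Hypothetical expression values: (group A samples, group B samples) per gene.
expression = {
    "gene_1": ([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0]),
    "gene_2": ([1.0, 4.0, 5.0, 8.0], [2.0, 3.0, 6.0, 7.0]),
}

strict = select_genes(expression, p_level=0.05)
loose = select_genes(expression, p_level=1.0)
```

Tightening the p-level shrinks the gene set, which is the dependence on the parameter described above.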
[0032] "UG Cluster" used in Tables 2-7 refers to the UniGene data base compiled by the National Center for Biotechnology Information ("NCBI"). Each accession number in the UniGene data base is a compilation of all of the nucleotide and amino acid sequence data available for a specific nucleotide sequence. For example, each UG Cluster accession number may provide links to GenBank or other data base which in turn provide nucleotide sequences encoding a partial or full length cDNA for a gene. Alternatively the links may provide genomic or EST sequence data or amino acid sequence information. Each UG Cluster accession number provides unique sequence information for the specific gene, nucleic acid or amino acid sequence identified.
[0033] "Osteopontin" refers to a secreted phosphoprotein encoded by SEQ ID NO: 1 or a conservative variant thereof, which may also be found in GenBank accession number NM_000582. Nucleic acid and amino acid sequence information may also be found in the National Center for Biotechnology Information ("NCBI") UniGene data base under accession number Hs.313 at NCBI web site. This site lists 9 mRNA/genomic DNA sequences and over 900 expressed sequence tags. Osteopontin is an extracellular protein associated with the bone matrix and associated with atherosclerotic plaques. Full length osteopontin protein contains an RGD amino acid sequence that functions as an integrin binding site. Osteopontin is a major ligand for the vitronectin receptor. "OPN" is used interchangeably with osteopontin and refers either to the protein, the gene encoding the protein or fragments thereof.
[0034] "EpCAM" is a 40 kDa glycoprotein that functions as an Epithelial Cell Adhesion Molecule. It is also identified as tumor-associated calcium signal transducer or TACSTD1, with a Unigene Cluster number of Hs.692. EpCAM is encoded by the GA733-2 gene, which is located on human chromosome 4q. A transmembrane protein expressed in cells of epithelial origin, EpCAM mediates Ca2+-independent homotypic cell-cell adhesion and is specifically recognized by a number of well known monoclonal antibodies (mAb), such as 17-1A, 323/A3, KS1/4, GA733, MOC31, etc.
[0035] The term "Marker" in the context of the present invention refers to a nucleic acid sequence or a gene encoding a polypeptide (of a particular apparent molecular weight) which is differentially present in a sample taken from patients having metastatic HCC or a predisposition for HCC as compared to a comparable sample taken from control subjects (e.g., a person with non-metastatic HCC or a negative diagnosis or undetectable cancer, normal or healthy subject). Marker may also refer to a polypeptide or protein encoded by a nucleic acid sequence or gene which is differentially present in a sample taken from patients having metastatic HCC or a predisposition for HCC as compared to a comparable sample taken from control subjects (e.g., a person with non-metastatic HCC or a negative diagnosis or undetectable cancer, normal or healthy subject). Markers of the present invention include the genes and their encoded proteins identified by UG Cluster number in Tables 2-7 infra.
[0036] The term "sample" as used herein is a sample of biological tissue or fluid that will be used to determine a gene expression profile, a source of markers, or that contains a protein of interest (such as osteopontin or EpCAM) or a nucleic acid encoding such protein. Such samples include, but are not limited to, various types of tissue isolated from humans, and may also include sections of tissues such as frozen sections or paraffin sections taken for histological purposes. Tissues include liver samples and fluid samples include blood, serum, plasma, urine, and other bodily fluids. A preferred sample used for practicing the present invention is a lysate of cells extracted from a tissue of interest, e.g., liver. Such a cell lysate may be prepared using a variety of methods known to those skilled in the art, depending on the form in which a cellular marker is to be detected and examined, e.g., as a nucleic acid such as mRNA, as a protein, or as a molecule with other measurable biological characteristics such as an enzymatic activity.
[0037] The phrase "functional effects" in the context of assays for testing compounds that regulate the biological activity of a protein of interest, e.g., osteopontin or EpCAM, includes the determination of any parameter that is directly or indirectly related to or under the influence of OPN or EpCAM, such as the level of mRNA encoding the proteins, the level of the proteins, as well as their functional, physical, and chemical effects (e.g., their ability to specifically interact with their naturally binding partners, such as other proteins, nucleic acids, or any other molecules, their ability to mediate signal transduction that may affect cellular events such as cell proliferation, differentiation, apoptosis, secretion, adhesion, and the like).
[0038] "Nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs). The term encompasses nucleic acids isolated from biological samples and synthetic oligonucleotides.
[0039] Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081, 1991; Ohtsuka et al., J. Biol. Chem. 260:2605-2608, 1985; Rossolini et al., Mol. Cell. Probes 8:91-98, 1994). The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.
[0040] The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
[0041] The term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refer to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refer to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
[0042] Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
[0043] "Conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
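The degeneracy described above can be made concrete with a short Python sketch. It builds the standard genetic code in DNA letters (U is read as T), lists the synonymous codons for a given codon, and counts the silent variations of a short coding sequence. The function names and the example sequence are hypothetical and serve only as an illustration.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon: str) -> List[str]:
    """Return all codons (DNA letters) encoding the same amino acid as codon."""
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_silent_variants(coding_sequence: str) -> int:
    """Count sequences (including the input) that encode the same polypeptide."""
    seq = coding_sequence.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    count = 1
    for i in range(0, len(seq), 3):
        count *= len(synonymous_codons(seq[i:i + 3]))
    return count


alanine_codons = synonymous_codons("GCU")      # the four alanine codons
variants = count_silent_variants("ATGGCATGG")  # Met-Ala-Trp
```

For the Met-Ala-Trp example only the alanine codon can vary, which is consistent with the statement above that the methionine and tryptophan codons are ordinarily unique.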
[0044] As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.
[0045] The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M)
(see, e.g., Creighton, Proteins, 1984).
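The eight groups above can be encoded directly as a lookup. The sketch below is illustrative only: the function name `is_conservative` is hypothetical, and treating an unchanged residue as "not a substitution" is an assumption made here.

```python
# The eight conservative substitution groups listed in paragraph [0045],
# written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original, replacement):
    """True if the two residues differ and share at least one group.
    Identical residues return False (assumption: no substitution occurred)."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Methionine appears in groups 5 and 8, so it is a conservative replacement for I, L, V, and C under this table.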
[0046] Macromolecular structures such as polypeptide structures can be described in terms of various levels of organization. For a general discussion of this organization, see, e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel, Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980). "Primary structure" refers to the amino acid sequence of a particular peptide. "Secondary structure" refers to locally ordered, three dimensional structures within a polypeptide. These structures are commonly known as domains. Domains are portions of a polypeptide that form a compact unit of the polypeptide and are typically 50 to 350 amino acids long. Typical domains are made up of sections of lesser organization such as stretches of β-sheet and α-helices. "Tertiary structure" refers to the complete three dimensional structure of a polypeptide monomer. "Quaternary structure" refers to the three dimensional structure formed by the noncovalent association of independent tertiary units. Anisotropic terms are also known as energy terms.
[0047] "Antibody" refers to a polypeptide comprising a framework region from an immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.
[0048] An exemplary immunoglobulin (antibody) structural unit comprises a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one "light" (about 25 kDa) and one "heavy" chain (about 50-70 kDa). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively.
[0049] Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)'2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2 may be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)'2 dimer into an Fab' monomer. The Fab' monomer is essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed. 1993)). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments either produced by the modification of whole antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554, 1990).
[0050] For preparation of monoclonal or polyclonal antibodies, any technique known in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4:72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy (1985)). Techniques for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or other organisms such as other mammals, may be used to express humanized antibodies. Alternatively, phage display technology can be used to identify antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., supra; Marks et al., Biotechnology 10:779-783, 1992).
[0051] A "chimeric antibody" is an antibody molecule in which (a) the constant region, or a portion thereof, is altered, replaced or exchanged so that the antigen binding site (variable region) is linked to a constant region of a different or altered class, effector function and/or species, or an entirely different molecule which confers new properties to the chimeric antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b) the variable region, or a portion thereof, is altered, replaced or exchanged with a variable region having a different or altered antigen specificity.
[0052] An "anti-OPN antibody" is an antibody or antibody fragment that specifically binds a polypeptide encoded by the OPN gene, cDNA, or a subsequence thereof. An anti-EpCAM antibody is defined in a similar fashion.
[0053] A "receptor" as used herein encompasses any molecule that a particular protein, e.g., OPN or EpCAM, can specifically bind and may thus include proteins, nucleic acids, carbohydrates, or any other molecules.
[0054] The term "immunoassay" refers to an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
[0055] The phrase "specifically (or selectively) binds" to an antibody or "specifically (or selectively) immunoreactive with," when referring to a protein or peptide, refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times the background and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to OPN from specific species such as rat, murine, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with OPN and not with other proteins, except for polymorphic variants and alleles of OPN. This selection may be achieved by subtracting out antibodies that cross-react with OPN molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual, 1988, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
[0056] The phrase "differentially present" refers to differences in the quantity and/or the frequency of a marker present in a sample taken from a metastatic HCC tumor or liver samples of a patient at high risk for HCC as compared to a non-metastatic HCC sample or a liver sample from a patient at low risk for HCC respectively. For example, a marker can be a polypeptide or nucleic acid which is present at an elevated level or at a decreased level in samples of metastatic HCC tumors or liver samples of someone at high risk for HCC compared to non-metastatic HCC samples or a liver sample from a patient at low risk for HCC respectively. Alternatively, a marker can be a polypeptide which is detected at a higher frequency or at a lower frequency in metastatic HCC tumors or liver samples of someone at high risk for HCC compared to non-metastatic HCC samples or a liver sample from a patient at low risk for HCC respectively. A marker can be differentially present in terms of quantity, frequency or both.
[0057] A polypeptide or nucleic acid is differentially present between the two samples if the amount of the polypeptide in one sample is statistically significantly different from the amount of the polypeptide in the other sample. For example, a polypeptide is differentially present between the two samples if it is present at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% greater than it is present in the other sample, or if it is detectable in one sample and not detectable in the other.
[0058] Alternatively or additionally, a polypeptide is differentially present between the two sets of samples if the frequency of detecting the polypeptide in the metastatic HCC tumors or liver samples of someone at high risk for HCC is statistically significantly higher or lower than in non-metastatic HCC samples or a liver sample from a patient at low risk for HCC respectively. For example, a polypeptide is differentially present between the two sets of samples if it is detected at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% more frequently or less frequently in one set of samples than in the other set of samples.
[0059] "Diagnostic" means identifying the presence or nature of a pathologic condition or a predisposition for a pathologic condition such as HCC or HCC metastasis. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
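The definitions of sensitivity and specificity given above can be written out as a short calculation. This is a minimal sketch; the function names and the example counts in the usage note are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of diseased individuals who test positive."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Proportion of individuals without the disease who test positive."""
    return false_positives / (false_positives + true_negatives)


def specificity(false_positives, true_negatives):
    """1 minus the false positive rate, as defined in the text."""
    return 1.0 - false_positive_rate(false_positives, true_negatives)
```

For instance, with hypothetical counts of 45 true positives, 5 false negatives, 20 false positives, and 80 true negatives, `sensitivity(45, 5)` is 0.9 and `specificity(20, 80)` is approximately 0.8.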
[0060] A "test amount" of a marker refers to an amount of a marker present in a sample being tested. A test amount can be either in absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).
[0061] A "diagnostic amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of metastatic HCC tumors or tissue samples of someone at high risk for HCC. A diagnostic amount can be either in absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).
[0062] A "control amount" of a marker can be any amount or a range of amounts which is to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a person without metastatic HCC tumors or tissue samples of someone at low risk for HCC. A control amount can be either in absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).
[0063] "Spectrometer probe" refers to a device that is removably insertable into a gas phase ion spectrometer and comprises a substrate having a surface for presenting a marker for detection. A spectrometer probe can comprise a single substrate or a plurality of substrates. Terms such as ProteinChip®, ProteinChip® array, or chip are also used herein to refer to specific kinds of spectrometer probes.
[0064] "Substrate" or "probe substrate" refers to a solid phase onto which an adsorbent can be provided (e.g., by attachment, deposition, etc.).
[0065] "Adsorbent" refers to any material capable of adsorbing a marker. The term "adsorbent" is used herein to refer both to a single material ("monoplex adsorbent") (e.g., a compound or functional group) to which the marker is exposed, and to a plurality of different materials ("multiplex adsorbent") to which the marker is exposed. The adsorbent materials in a multiplex adsorbent are referred to as "adsorbent species." For example, an addressable location on a probe substrate can comprise a multiplex adsorbent characterized by many different adsorbent species (e.g., anion exchange materials, metal chelators, or antibodies), having different binding characteristics. Substrate material itself can also contribute to adsorbing a marker and may be considered part of an "adsorbent."
[0066] "Adsorption" or "retention" refers to the detectable binding between an adsorbent and a marker either before or after washing with an eluant (selectivity threshold modifier) or a washing solution.
[0067] "Eluant" or "washing solution" refers to an agent that can be used to mediate adsorption of a marker to an adsorbent. Eluants and washing solutions are also referred to as "selectivity threshold modifiers." Eluants and washing solutions can be used to wash and remove unbound materials from the probe substrate surface.
[0068] "Resolve," "resolution," or "resolution of marker" refers to the detection of at least one marker in a sample. Resolution includes the detection of a plurality of markers in a sample by separation and subsequent differential detection. Resolution does not require the complete separation of one or more markers from all other biomolecules in a mixture. Rather, any separation that allows the distinction between at least one marker and other biomolecules suffices.
[0069] "Gas phase ion spectrometer" refers to an apparatus that measures a parameter which can be translated into mass-to-charge ratios of ions formed when a sample is volatilized and ionized. Generally ions of interest bear a single charge, and mass-to-charge ratios are often simply referred to as mass. Gas phase ion spectrometers include, for example, mass spectrometers, ion mobility spectrometers, and total ion current measuring devices.
[0070] "Mass spectrometer" refers to a gas phase ion spectrometer that includes an inlet system, an ionization source, an ion optic assembly, a mass analyzer, and a detector.
[0071] "Laser desorption mass spectrometer" refers to a mass spectrometer which uses a laser as a means to desorb, volatilize, and ionize an analyte.
[0072] "Detect" refers to identifying the presence, absence, or amount of the object to be detected.
[0073] "Detectable moiety" or a "label" refers to a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (such as those commonly used in an ELISA, e.g., horseradish peroxidase), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target. The detectable moiety often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound detectable moiety in a sample. Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
[0074] The term "activity" as used in the application refers to the biological functions of a molecule, such as a protein encoded by a gene of interest, e.g., osteopontin or EpCAM. This term encompasses biological functions such as enzymatic activity, specific interaction with other molecules, regulatory effects on biological events at molecular or cellular level, and the like.
[0075] The term "inhibiting" or "inhibition" as used herein refers to a negative regulatory effect on the function or activity of an intended target molecule, such that the function or activity, e.g., enzymatic activity or specific interaction with other molecules, is detectably diminished or effectively abolished.
[0076] The term "antagonist" as used herein refers to a compound that is capable of negatively regulating the biological activity of a target molecule, e.g., osteopontin or EpCAM. An antagonist may effectuate the negative regulation by various means, such as by suppression of the expression of the target gene at transcriptional or translational level, or by interfering with the target molecule in its specific interaction with other molecules.
[0077] The term "antisense" as used in the context of describing a polynucleotide, refers to a single-stranded nucleic acid having a nucleotide sequence complementary to at least a portion of a target nucleic acid that encodes a protein of interest (e.g., osteopontin, or EpCAM), or the "sense" sequence. Complementarity between two single-stranded polynucleotides is based on the "A-T G-C" base-pairing rule. For example, the sequence "5'-AGAT-3'" is complementary to the sequence "5'-ATCT-3'". Complementarity between a target nucleic acid and its antisense polynucleotide is typically 100%, i.e., all bases of the antisense polynucleotide match the bases of the target nucleic acid, but may be of varying degrees, i.e., there may be some mismatched bases. The degree of complementarity between a target nucleic acid and its antisense polynucleotide has significant effects on the efficiency and strength of hybridization. An "antisense" polynucleotide sequence in the present application may correspond to a coding portion (i.e., exon) or a non-coding portion (i.e., intron) of the target nucleic acid.
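The "A-T G-C" base-pairing rule for antisense sequences can be illustrated with a short sketch. The function names are hypothetical, and restricting the comparison to equal-length DNA sequences is a simplifying assumption made here, not a limitation stated in the text.

```python
# Watson-Crick pairing for DNA, per the "A-T G-C" rule.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def percent_complementarity(target, antisense):
    """Percentage of antisense bases that pair with the target.
    Assumption of this sketch: both sequences are written 5' to 3'
    and have the same length."""
    if len(target) != len(antisense):
        raise ValueError("sequences must have equal length")
    perfect = reverse_complement(target)
    matches = sum(a == b for a, b in zip(perfect, antisense.upper()))
    return 100.0 * matches / len(target)
```

With the example from the text, `reverse_complement("AGAT")` gives `"ATCT"`, so the pair is 100% complementary; changing one base of the antisense strand lowers the figure to 75%.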
BRIEF DESCRIPTION OF THE DRAWINGS
[0078] Figure 1. Classification of hepatocellular carcinoma with or without metastasis by gene expression. A) Multidimensional scaling analysis of 50 primary and metastatic HCC samples using 143 significant genes (p<0.0005) from supervised class comparison analysis of all 5 clinical groups, i.e., P, P-M, PT, PT-M, PN. The axes represent the first three principal components of these genes. P, primary HCC with intra-hepatic spreads; P-M, metastatic lesion of P; PT, primary HCC with tumor thrombus in portal vein; PN, metastasis-free primary HCC samples. B) Hierarchical clustering of 30 primary HCC samples from P, PT, and PN groups using 383 significant genes (p<0.0005) derived from supervised class comparison.
[0079] Figure 2. Prediction of metastasis and survival with metastasis predictor model derived from "leave-one-out" cross-validated compound covariate predictor classification. A) Metastasis predictor model used in 40 training and testing HCC patients. The predictor was based on a training set (circle) including 10 PN and 10 PT primary HCC samples that were previously used in the compound covariate predictor classification and 20 primary blinded HCC samples that were not used in the training procedure. The predictor uses 153 significant genes that distinguish between these two groups. B) Multidimensional scaling analysis of 40 primary HCC samples using 153 significant genes from the predictor. Patient IDs are indicated. C) Kaplan-Meier survival curves for 40 PN, PT and P patients. Cross marks indicate time of censorship.
[0080] Figure 3. Candidate genes associated with metastatic HCC. A) Hierarchical clustering of top 30 candidate genes whose expressions were altered largely in PT and PT-M, but rarely in PN. Each row represents an individual gene and each column represents an individual tumor sample. Genes were ordered by centered correlation and complete linkage according to the ratio of its abundance to the median abundance of all genes among all tumor samples. Pseudo colors indicate differential expression: green squares, transcript levels below the median; black squares, transcript levels equal to the median; red squares, transcript levels greater than the median; gray squares, missing data. Dendrogram was based on 10 primary PN (green) and 10 primary PT (red) samples. B) Relative expression ratio of OPN by cDNA microarray analysis in 10 primary PN samples (green bars) and 10 primary PT samples (red bars) with accompanying metastasis (black bars). C and D) Semi-quantitative RT-PCR analysis of OPN mRNA level in primary HCC samples with or without metastasis.
[0081] Figure 4. Immunohistochemical analysis of osteopontin in normal liver and hepatocellular carcinoma. Primary tumor cells (tumor S30) show cytoplasmic osteopontin immunoreactivity, especially in the area with high density of vasculature (panels b and d), but fibrous septa region (panels b and d) or normal liver parenchyma cells show no reactivity (panels a and c; normal liver 914). Magnification, x50. (H&E, x50).
[0082] Figure 5. Role of osteopontin in promoting HCC metastasis. A) The level of osteopontin of CCL13, SK-Hep-1, and Hep3B cells was determined by Western blotting with a rat monoclonal anti-OPN antibody. A monoclonal β-actin antibody was used as internal control. Densitometry was used to quantify the amount of OPN, which was normalized to actin. OPN level is indicated as relative folds. B) CCL13, SK-Hep-1 or Hep3B cells were incubated with or without a murine recombinant osteopontin protein or a neutralizing antibody against osteopontin and their invasiveness was determined by the Matrigel Basement Membrane Cell Invasion Chamber. Data is an average of triplicate determinations for each condition and is expressed as the mean percent invasion (plus one standard deviation) through the Matrigel Matrix and membrane (matrigel chamber) relative to the migration through the control membrane (control chamber). C) The invasiveness of five additional HCC cell lines (SMMC7721, MHCC97, HuH1, HuH4 and HuH7) through matrigel matrix in response to osteopontin neutralizing antibody was determined as above. D) Representative lung tissue sections (H&E stain; magnification x100) from mice at 35 days following s.c. injection of HCCLM3 cells without (upper panel) or with (bottom panel) anti-OPN neutralizing antibody are shown. Arrows indicate the tumor grades. E) Primary tumor size was monitored at various weeks following s.c. injection of HCCLM3 cells into nude mice. Data are an average of 10 mice. F) The formation of pulmonary metastases in nude mice was determined at 35 days following s.c. injection of HCCLM3 cells with or without anti-OPN antibody. The number of metastatic foci was quantified based on their grades. Data are an average of 10 mice per group. The groups with significant p values (<0.05) are indicated by the asterisk.
[0083] Figure 6. Potential oncogenic role of EpCAM in HCC development. a) and b) The expression level of EpCAM in various chronic liver disease (CLD) liver samples as analyzed by microarray (a) or RT-PCR (b). c) EpCAM expression in cells from normal human fibroblasts (NHF-hTERT), normal liver (CCL13) and hepatoma (SK-Hep-1, Hep3B, Huh1, Huh4, Huh7, and HepG2) was analyzed by western blotting with a monoclonal antibody against EpCAM. A monoclonal antibody against beta-actin was used as an internal control. d) Cell proliferation of Hep3B, Huh1, and Huh4 cells was determined by MTT assay and data were an average of 3 independent experiments. e) Effective silencing of EpCAM expression by siRNA was determined by western blotting analysis. f) Growth inhibition of Hep3B cells by EpCAM siRNA as determined by MTT assay.
DETAILED DESCRIPTION OF THE INVENTION
[0084] Hepatocellular carcinoma (HCC) is one of the most common and aggressive malignant tumors in the world, with high prevalence especially in Asia and Africa, and relatively low prevalence in Europe and North America (Parkin et al., CA Cancer J. Clin. 49:33-64, 1999; Pisani et al., Int. J. Cancer 83:18-29, 1999). Recent studies indicate that the incidence of HCC in the U.S. and in the U.K. has significantly increased over the last two decades (Taylor-Robinson et al., Lancet 350:1142-1143, 1997; El-Serag and Mason, N. Eng. J. Med. 340:745-750, 1999). Most of the HCC patients are incurable due to their poor prognosis. Although routine screening of individuals who are at risk for developing HCC may provide an opportunity for some patients with an extended life, many patients are still diagnosed with advanced HCC with little improved survival (see, e.g., Yang et al., J. Cancer Res. Clin. Oncol. 123:357-360, 1997; Izzo et al., Ann. Surg. 227:513-518, 1998). While a small subset of HCC patients qualifies for surgical intervention, the improvement on long-term survival is only modest. The extremely poor prognosis of HCC is largely because of a high rate of recurrence after surgery, or intra-hepatic metastases that develop by invasion of the portal vein or spreading to other parts of the liver, whereas extrahepatic metastases are less common (see, e.g., Genda et al., Hepatology 30:1027-1036, 1999). These data indicate that the liver is the main target organ of HCC metastasis. It has been demonstrated in animal model systems as well as in patients that the portal vein is the main route for intrahepatic metastases of metastatic HCC cells (see, e.g., Mitsunobu et al., Clin. Exp. Metastasis 14:520-529, 1996). This specific feature of HCC underscores the need to develop an accurate molecular profiling model for better diagnosis and therapeutic targets for the treatment of HCC patients with intrahepatic metastases.
[0085] Current studies have largely been focused on individual candidate genes (see, e.g., Osada et al., Hepatology 24:1460-1467, 1996; Guo et al., Hepatology 28:1481-1488, 1998; Hui et al., Int. J. Cancer 84:604-608, 1999), which may be insufficient to reflect the precise biological nature of metastatic HCC. The microarray technology has offered an opportunity to probe disease-related gene expressions at a global genome scale (see, e.g., Schena et al., Science 270:467-470, 1995). This approach has allowed the successful molecular classification of several human malignant tumors with regard to their stage, prognostic outcome, or response to therapy (Alizadeh et al., Nature 403:503-511, 2000; Bittner et al., Nature 406:536-540, 2000; Perou et al., Nature 406:747-752, 2000; Khan et al., Nat. Med. 7:673-679, 2001; Pomeroy et al., Nature 415:436-442, 2002; Shipp et al., Nat. Med. 8:68-74, 2002). A few reports have dealt with the gene expression profiles of primary HCC samples (Okabe et al., Cancer Res. 61:2129-2137, 2001; Xu et al., Proc. Natl. Acad. Sci. U.S.A. 98:15089-15094, 2001). However, little is known about the molecular signatures associated with a poor prognostic feature of patients with metastatic HCC.
[0086] Using cDNA microarray-based gene expression profiling, the global changes associated with metastasis are investigated. The initial goal was to identify genes that can discriminate primary tumors from their matched intra-hepatic metastatic lesions. It is revealed that intrahepatic metastatic lesions are indistinguishable from their primary tumors, regardless of tumor size, encapsulation, and patient's age, whereas primary metastasis-free HCC is distinct from primary HCC with metastasis. These data indicate that changes favoring intrahepatic metastasis are initiated in the primary HCC. Moreover, an important gene, osteopontin, a secreted phosphoprotein, emerges in HCC metastasis. Osteopontin overexpression correlated with primary HCC with metastatic potential and invasiveness of liver tumor-derived cell lines in vitro, and an osteopontin-neutralizing antibody efficiently blocked in vitro invasion and in vivo pulmonary metastasis of HCC cells. These studies identify osteopontin both as a molecular marker for defining HCC patients with metastatic potential and as a potential therapeutic target for treating metastatic HCC.
[0087] A similar approach is used to develop a gene expression prediction model for the potential to develop HCC in patients with chronic liver diseases. By comparing the gene expression profiles of patients epidemiologically at high risk for developing HCC with the gene expression profile of patients epidemiologically at low risk for developing HCC, cellular markers are identified so as to allow the identification of individuals with chronic liver diseases at high risk for developing HCC. The patients with severe liver diseases include those diagnosed with chronic hepatitis B infection, hepatitis C infection, hemochromatosis, Wilson's disease, alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis. High risk precancerous diseases include chronic hepatitis B infection, hepatitis C infection, hemochromatosis, and Wilson's disease. Low risk precancerous diseases include alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis. One gene identified to be associated with elevated risk of developing HCC in patients with severe liver diseases is EpCAM. Growth suppression of liver cancer cells has been observed upon inhibition of EpCAM expression, identifying its important role in HCC development and as a therapeutic target for preventing HCC in patients with chronic liver diseases.
[0088] One particular aspect of the invention provides methods for clustering co-regulated genes in patients suspected of having metastatic HCC or the potential to develop HCC into gene expression profiles. This section provides a more detailed discussion of methods for clustering co-regulated genes.
I. DNA MICROARRAY ANALYSIS
A. Gene Expression Profile Classification by Cluster Analysis
[0089] For many applications of the present invention, it is desirable to find basis gene expression profiles that are co-regulated in the non-metastatic HCC samples, the metastatic HCC samples, the high risk for developing HCC samples and the low risk for developing HCC samples. A preferred embodiment for identifying such basis gene expression profiles involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990, Statistical Pattern Recognition, 2nd Ed., Academic Press, San Diego; Everitt, 1974, Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy, Freeman; Anderberg, 1973, Cluster Analysis for Applications, Academic Press: New York).
[0090] In some embodiments employing cluster analysis, the expression of a large number of genes is monitored in biological samples obtained from different sources. A table of data containing the gene expression measurements is used for cluster analysis. Cluster analysis operates on a table of data which has the dimension m x k wherein m is the total number of conditions or perturbations and k is the number of genes measured.
[0091] A number of clustering algorithms are useful for clustering analysis. Clustering algorithms use dissimilarities or distances between objects when forming clusters. In some embodiments, the distance used is Euclidean distance in multidimensional space. The Euclidean distance may be squared to place progressively greater weight on objects that are further apart. Alternatively, the distance measure may be the Manhattan distance. In other embodiments unsupervised hierarchical clustering of a table of data may be performed using the CLUSTER or TREEVIEW software (Eisen et al., Proc. Natl. Acad. Sci. U.S.A. 95:14863-14868, 1998) using median centered correlation and complete linkage.
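The Euclidean, squared Euclidean, and Manhattan distances named above can be sketched as follows. The function names are hypothetical, and each expression profile is assumed to be a plain list of numbers of equal length.

```python
from math import sqrt


def squared_euclidean(x, y):
    """Sum of squared differences; weights distant objects more heavily."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    """Straight-line distance in multidimensional space."""
    return sqrt(squared_euclidean(x, y))


def manhattan(x, y):
    """Sum of absolute differences along each dimension."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

For two hypothetical profiles `[0, 0]` and `[3, 4]`, these give 25, 5.0, and 7 respectively, which shows how squaring exaggerates larger separations.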
[0092] Various cluster linkage rules are useful for the methods of the invention. Single linkage, a nearest neighbor method, determines the distance between the two closest objects. By contrast, complete linkage methods determine distance by the greatest distance between any two objects in the different clusters. This method is particularly useful in cases when genes or other cellular constituents form naturally distinct "clumps." Alternatively, the unweighted pair-group average defines distance as the average distance between all pairs of objects in two different clusters. This method is also very useful for clustering genes or other cellular constituents to form naturally distinct "clumps." Finally, the weighted pair-group average method may also be used. This method is the same as the unweighted pair-group average method except that the size of the respective clusters is used as a weight. This method is particularly useful for embodiments where the cluster size is suspected to be greatly varied (Sneath and Sokal, 1973, Numerical taxonomy, San Francisco: W. H. Freeman & Co.). Other cluster linkage rules, such as the unweighted and weighted pair-group centroid and Ward's method, are also useful for some embodiments of the invention. See, e.g., Ward, 1963, J. Am. Stat. Assn. 58:236; Hartigan, 1975, Clustering algorithms, New York: Wiley.
[0093] In one particularly preferred embodiment, the cluster analysis used is the BRB-ArrayTools software, an integrated package for the visualization and statistical analysis of cDNA microarray gene expression data developed by the Biometric Research Branch of the National Cancer Institute, for both unsupervised and supervised analyses. The Class Comparison Tool based on univariate F-tests may be used to find genes differentially expressed between predefined clinical groups at a significance level of P < 0.001 or 0.002. The permutation distribution of the F-statistic, based on 2000 random permutations, may also be used to confirm statistical significance. The multivariate Compound Covariate Predictor (CCP) Tool with a "leave-one-out" cross-validation test using 2000 random permutations at a significance level of P < 0.001 may be used to classify predefined clinical groups based on their gene expression profiles. In each cross-validation step one sample is omitted and a multivariate CCP is created based on the genes that are univariately significant at the specified level in the training set consisting of the samples not omitted. This CCP is used to classify the omitted sample and it is then noted whether the classification is correct or incorrect. This is repeated with all samples excluded one at a time. The total cross-validated misclassification rate is thereby determined. The statistical significance of the cross-validated misclassification rate is determined by repeating the entire cross-validation procedure on data with the class membership labels randomly permuted 2000 times. The CCP is based on a weighted linear combination of gene expression variables that are univariately significant in the training set, with the weights being the corresponding t-statistics, as described in Radmacher et al., Journal of Computational Biology, in press, 2002. An example of a clustering "tree" output is shown in Figures 1 and 3 (see also Example 1, infra).
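The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration and not the BRB-ArrayTools implementation: all function names are hypothetical, gene selection uses an assumed cutoff on |t| (`t_cutoff`) as a stand-in for a univariate significance level, the class boundary is assumed to be the midpoint of the two class means of the compound covariate, and each class is assumed to contain at least two samples.

```python
import math
import random


def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic (class a minus class b)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled = ss / (na + nb - 2)
    if pooled == 0:
        return 0.0  # gene with no within-class variance is skipped
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))


def compound_score(weights, sample):
    """Weighted linear combination of the selected genes."""
    return sum(w * sample[g] for g, w in weights.items())


def train_ccp(samples, labels, t_cutoff):
    """Select genes on the training set only; weights are their t-statistics."""
    weights = {}
    for g in range(len(samples[0])):
        ones = [s[g] for s, y in zip(samples, labels) if y == 1]
        zeros = [s[g] for s, y in zip(samples, labels) if y == 0]
        t = t_statistic(ones, zeros)
        if abs(t) >= t_cutoff:
            weights[g] = t
    s1 = [compound_score(weights, s) for s, y in zip(samples, labels) if y == 1]
    s0 = [compound_score(weights, s) for s, y in zip(samples, labels) if y == 0]
    threshold = (sum(s1) / len(s1) + sum(s0) / len(s0)) / 2
    return weights, threshold


def predict(model, sample):
    weights, threshold = model
    return 1 if compound_score(weights, sample) > threshold else 0


def loocv_error(samples, labels, t_cutoff=2.0):
    """Omit each sample in turn, retrain, and classify the omitted sample."""
    errors = 0
    for i in range(len(samples)):
        model = train_ccp(samples[:i] + samples[i + 1:],
                          labels[:i] + labels[i + 1:], t_cutoff)
        if predict(model, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


def permutation_p_value(samples, labels, n_perm=2000, seed=0, t_cutoff=2.0):
    """Repeat the whole cross-validation on randomly permuted class labels."""
    rng = random.Random(seed)
    observed = loocv_error(samples, labels, t_cutoff)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if loocv_error(samples, shuffled, t_cutoff) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Gene selection is repeated inside every cross-validation fold, so the omitted sample never influences which genes enter the predictor that classifies it.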
[0094] Gene expression profiles may be defined based on the many smaller branches in the tree, or a small number of larger branches by cutting across the tree at different levels. The choice of cut level may be made to match the number of distinct clinical groups expected. If little or no prior information is available about the number of groups, then the tree should be divided into as many branches as are truly distinct. "Truly distinct" may be defined by a minimum distance value between the individual branches. This distance is the vertical coordinate of the horizontal connector joining two branches (see Figure 1B). Typical values are in the range 0.2 to 0.4, where 0 is perfect correlation and 1 is zero correlation, but may be larger for poorer quality data or fewer experiments in the training set, or smaller in the case of better data and more experiments in the training set.
[0095] Preferably, "truly distinct" may be defined with an objective test of statistical significance for each bifurcation in the tree. In one aspect of the invention, the Compound Covariate Predictor (CCP) tool with a "leave-one-out" cross-validation test using 2000 random permutations at a predefined significance level is used to define an objective test. The distribution of fractional improvements obtained from the CCP procedure is an estimate of the distribution under the null hypothesis that a particular classification is correct or incorrect.
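Cutting a tree at a minimum branch distance can be illustrated with a small complete-linkage routine that stops merging once the closest pair of clusters is farther apart than the cut level. The sketch assumes a distance of 1 minus the Pearson correlation, matching the 0-to-1 scale mentioned above; the function names and the example cut of 0.3 are hypothetical.

```python
import math


def correlation_distance(x, y):
    """Assumed distance: 1 - Pearson r (0 = perfect correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(sxx * syy)


def complete_linkage(c1, c2, profiles):
    """Greatest distance between any two members of the two clusters."""
    return max(correlation_distance(profiles[i], profiles[j])
               for i in c1 for j in c2)


def cut_tree(profiles, cut=0.3):
    """Merge the closest clusters until the next merge would exceed `cut`."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = complete_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut:
            break  # remaining branches are "truly distinct" at this cut
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)
```

Raising the cut yields fewer, larger branches; lowering it yields more, smaller ones.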
[0096] Another aspect of the cluster analysis method of this invention provides the definition of basis vectors for use in profile projection described in the following sections.
B. Profile Comparison and Classification
[0097] One aspect of the invention provides methods for drug discovery. In one embodiment, gene expression profiles are defined using cluster analysis. The genes within a gene expression profile are indicated as potentially co-regulated under the conditions of interest. Co-regulated genes are further explored as potentially being involved in a regulatory pathway. Identification of genes involved in a regulatory pathway provides useful information for designing and screening new drugs.
[0098] In some embodiments of the invention, drug candidates are screened for their therapeutic activity. In one embodiment, desired drug activity is to affect one particular genetic regulatory pathway. In this embodiment, drug candidates are screened for their ability to affect the gene expression profile corresponding to the regulatory pathway. In another embodiment, a new drug is desired to replace an existing drug. In this embodiment, the projected profiles of drug candidates are compared with that of the existing drug to determine which drug candidate has activities similar to the existing drug.
[0099] In some embodiments, the method of the invention is used to decipher pathway arborization and kinetics. When a receptor is triggered (or blocked) by a ligand, the excitation of the downstream pathways can be different depending on the exact temporal profile and molecular domains of the ligand interaction with the receptor. Simple examples of the differing effects of different ligands are the phenotypical differences that arise between responses to agonists, partial agonists, negative antagonists, and antagonists, and that are expected to occur in response to covalent vs. noncovalent binding and activation of different molecular domains on the receptor. See Ross, Pharmacodynamics: Mechanisms of Drug Action and the Relationship between Drug Concentration and Effect, in The Pharmacological Basis of Therapeutics (Gilman et al. ed., McGraw Hill, New York, 1996). FIG. 4A illustrates two different possible responses of a pathway cascade.
[0100] In some embodiments of the invention, receptors for ligands such as OPN may be investigated using the projection method of the invention to simplify the observed temporal responses to receptor/ligand interactions over the responding genes. In some particularly preferred embodiments, the gene expression profiles and temporal profiles involved are discovered. The profile of temporal responses of a large number of genes is projected onto the predefined gene expression profiles to obtain a projected profile of temporal responses. The projection process simplifies the observed responses so that different temporal responses may be detected and discriminated more accurately.
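One simple way to picture the projection step is to average the measured values of the genes belonging to each predefined gene expression profile, once per time point. This is only a sketch under that assumption; the averaging rule, the function names, and the gene and profile identifiers are hypothetical and are not taken from the specification.

```python
def project_profile(profile, gene_sets):
    """Average the measured values of the genes in each predefined set.

    profile:   dict mapping gene name -> measured value (e.g. log ratio)
    gene_sets: dict mapping set name -> list of member gene names
    """
    projected = {}
    for name, genes in gene_sets.items():
        values = [profile[g] for g in genes if g in profile]
        projected[name] = sum(values) / len(values) if values else 0.0
    return projected


def project_time_course(time_course, gene_sets):
    """Project each time point separately to get a projected temporal profile."""
    return [project_profile(p, gene_sets) for p in time_course]


# Hypothetical gene and set names, for illustration only.
gene_sets = {"set1": ["g1", "g2"], "set2": ["g3"]}
time_course = [
    {"g1": 0.0, "g2": 0.0, "g3": 0.0},   # before stimulation
    {"g1": 1.0, "g2": 3.0, "g3": -2.0},  # after stimulation
]
projected = project_time_course(time_course, gene_sets)
```

The projected time course has one value per gene set rather than one per gene, which is the simplification the paragraph above refers to.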
C. Illustrative Diagnostic Applications
[0101] One aspect of the invention provides methods for diagnosing diseases of humans, animals and plants. Those methods are also useful for monitoring the progression of diseases and the effectiveness of treatments.
[0102] In one embodiment of the invention, a patient cell sample, such as a biopsy from a patient's diseased tissue such as metastatic HCC, is assayed for the expression of a large number of genes. The gene expression profile is projected into a profile of gene expression profile expression values according to a definition of gene expression profiles. The projected profile is then compared with a reference database containing reference projected profiles. If the projected profile of the patient matches best with a cancer profile in the database, the patient's diseased tissue is diagnosed as being cancerous. Similarly, when the best match is to a profile of another disease or disorder, a diagnosis of such other disease or disorder is made.
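The comparison against a reference database can be sketched as a nearest-profile lookup. The specification does not state how a "best match" is scored, so the use of Euclidean distance here is an assumption, and the reference labels and numbers are invented purely for illustration.

```python
import math


def best_match(projected, reference_db):
    """Return the label of the reference profile closest to `projected`.

    projected:    list of projected expression values for the patient sample
    reference_db: dict mapping label -> reference projected profile
    Closeness is scored by Euclidean distance (an assumed criterion).
    """
    def distance(ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(projected, ref)))

    return min(reference_db, key=lambda label: distance(reference_db[label]))


# Invented reference values, for illustration only.
reference_db = {
    "metastatic HCC": [2.0, -1.0, 0.5],
    "non-metastatic HCC": [0.1, 0.2, 0.0],
}
diagnosis = best_match([1.8, -0.9, 0.4], reference_db)
```

A correlation-based score could be substituted for the distance function without changing the surrounding logic.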
[0103] In another embodiment, a tissue sample is obtained from a patient's tumor. The tissue sample is assayed for the expression of a large number of genes of interest. The gene expression profile is projected into a profile of gene expression profile expression values according to a definition of gene expression profiles. The projected profile is compared with projected profiles previously obtained from the same tumor to identify the change of expression in gene expression profiles. A reference library is used to determine whether the gene expression profile changes indicate tumor progression such as metastasis. A similar method is used to stage other diseases and disorders. Changes of gene expression profile expression values in a profile obtained from a patient under treatment can be used to monitor the effectiveness of the treatment, for example, by comparing the projected profile prior to treatment with that after treatment.
D. Analytic Kit Implementation
[0104] In a preferred embodiment, the methods of this invention can be implemented by use of kits for determining the responses or state of a biological sample. Such kits contain microarrays, such as those described in subsections below. The microarrays contained in such kits comprise a solid phase, e.g., a surface, to which probes are hybridized or bound at a known location of the solid phase. Preferably, these probes consist of nucleic acids of known, different sequence, with each nucleic acid being capable of hybridizing to an RNA species or to a cDNA species derived therefrom. In particular, the probes contained in the kits of this invention are nucleic acids capable of hybridizing specifically to nucleic acid sequences derived from RNA species which are known to increase or decrease in response to perturbations to the particular protein whose activity is determined by the kit. The probes contained in the kits of this invention preferably substantially exclude nucleic acids which hybridize to RNA species that are not increased in response to perturbations to the particular protein whose activity is determined by the kit, such as osteopontin.
[0105] In a preferred embodiment, a kit of the invention also contains a database of gene expression profile definitions such as the databases described above or an access authorization to use the database described above from a remote networked computer.
[0106] In another preferred embodiment, a kit of the invention further contains expression profile projection and analysis software capable of being loaded into the memory of a computer system such as the one described supra in the subsection, and illustrated in Example 1. The expression profile analysis software contained in the kit of this invention is essentially identical to the expression profile analysis software described above in Example 1.
[0107] Alternative kits for implementing the analytic methods of this invention will be apparent to one of skill in the art and are intended to be comprehended within the accompanying claims. In particular, the accompanying claims are intended to include the alternative program structures for implementing the methods of this invention that will be readily apparent to one of skill in the art.
E. Methods for Determining Biological Response Profiles
[0108] This invention utilizes the ability to measure the responses of a biological system to a large variety of perturbations. This section provides some exemplary methods for measuring biological responses. One of skill in the art would appreciate that this invention is not limited to the following specific methods for measuring the responses of a biological system.
1. Transcript Assay Using DNA Array
[0109] This invention is particularly useful for the analysis of gene expression profiles. One aspect of the invention provides methods for defining co-regulated gene expression profiles based upon the correlation of gene expression. Some embodiments of this invention are based on measuring the transcriptional rate of genes.
[0110] The transcriptional rate can be measured by techniques of hybridization to arrays of nucleic acid or nucleic acid mimic probes, described in the next section, or by other gene expression technologies, such as those described in the subsequent subsection. However measured, the result is either the absolute or relative amounts of transcripts, or response data including values representing RNA abundance ratios, which usually reflect DNA expression ratios (in the absence of differences in RNA degradation rates).
[0111] In various alternative embodiments of the present invention, aspects of the biological state other than the transcriptional state, such as the translational state, the activity state, or mixed aspects can be measured.
[0112] Preferably, measurement of the transcriptional state is made by hybridization to DNA microarrays, which are described in this section. Certain other methods of transcriptional state measurement are described later in this subsection.
[0113] In a preferred embodiment the present invention makes use of DNA microarrays. DNA microarrays can be employed for analyzing the transcriptional state in a biological sample and especially for measuring the transcriptional states of a biological sample exposed to graded levels of a drug of interest or to graded perturbations to a biological pathway of interest.
[0114] In one embodiment, DNA microarrays are produced by hybridizing detectably labeled polynucleotides representing the mRNA transcripts present in a cell (e.g., fluorescently labeled cDNA synthesized from total cell mRNA) to a microarray. A microarray is a surface with an ordered array of binding (e.g., hybridization) sites for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes. Microarrays can be made in a number of ways, of which several are described below. However produced, microarrays share certain preferred characteristics: the arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably the microarrays are small, usually smaller than 5 cm², and they are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions. A given binding site or unique set of binding sites in the microarray will specifically bind the product of a single gene in the cell. Although there may be more than one physical binding site (hereinafter "site") per specific mRNA, for the sake of clarity the discussion below will assume that there is a single site.
[0115] It will be appreciated that when cDNA complementary to the RNA of a cell is made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array corresponding to any particular gene will reflect the prevalence in the cell of mRNA transcribed from that gene. For example, when detectably labeled (e.g., with a fluorophore) cDNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to a gene (i.e., capable of specifically binding the product of the gene) that is not transcribed in the cell will have little or no signal (e.g., fluorescent signal), and a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
[0116] In preferred embodiments, cDNAs from two different cells are hybridized to the binding sites of the microarray. In the case of drug responses one biological sample is exposed to a drug and another biological sample of the same type is not exposed to the drug. In the case of pathway responses one cell is exposed to a pathway perturbation and another cell of the same type is not exposed to the pathway perturbation. The cDNA derived from each of the two cell types is differently labeled so that the two can be distinguished. In one embodiment, for example, cDNA from a cell treated with a drug (or exposed to a pathway perturbation) is synthesized using a fluorescein-labeled dNTP, and cDNA from a second cell, not drug-exposed, is synthesized using a rhodamine-labeled dNTP. When the two cDNAs are mixed and hybridized to the microarray, the relative intensity of signal from each cDNA set is determined for each site on the array, and any relative difference in abundance of a particular mRNA is detected.
[0117] In the example described above, the cDNA from the drug-treated (or pathway perturbed) cell will fluoresce green when the fluorophore is stimulated and the cDNA from the untreated cell will fluoresce red. As a result, when the drug treatment has no effect, either directly or indirectly, on the relative abundance of a particular mRNA in a cell, the mRNA will be equally prevalent in both cells and, upon reverse transcription, red-labeled and green-labeled cDNA will be equally prevalent. When hybridized to the microarray, the binding site(s) for that species of RNA will emit wavelengths characteristic of both fluorophores (and appear brown in combination). In contrast, when the drug-exposed cell is treated with a drug that, directly or indirectly, increases the prevalence of the mRNA in the cell, the ratio of green to red fluorescence will increase. When the drug decreases the mRNA prevalence, the ratio will decrease.
[0118] The use of a two-color fluorescence labeling and detection scheme to define alterations in gene expression has been described in, e.g., Schena et al., "Quantitative monitoring of gene expression patterns with a complementary DNA microarray," Science 270:467-470, 1995, which is incorporated by reference in its entirety for all purposes. An advantage of using cDNA labeled with two different fluorophores is that a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene in two cell states can be made, and variations due to minor differences in experimental conditions (e.g., hybridization conditions) will not affect subsequent analyses. However, it will be recognized that it is also possible to use cDNA from a single cell, and compare, for example, the absolute amount of a particular mRNA in, e.g., a drug-treated or pathway-perturbed cell and an untreated cell.
2. Preparation of Microarrays
[0119] Microarrays are known in the art and consist of a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, cRNAs, polypeptides, and fragments thereof) can be specifically hybridized or bound at a known position. In one embodiment, the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (e.g., a protein or RNA), and in which binding sites are present for products of most or almost all of the genes in the organism's genome. In a preferred embodiment, the "binding site" (hereinafter, "site") is a nucleic acid or nucleic acid analogue to which a particular cognate cDNA can specifically hybridize. The nucleic acid or analogue of the binding site can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than-full-length cDNA, or a gene fragment.
[0120] Although in a preferred embodiment the microarray contains binding sites for products of all or almost all genes in the target organism's genome, such comprehensiveness is not necessarily required. Usually the microarray will have binding sites corresponding to at least about 50% of the genes in the genome, often at least about 75%, more often at least about 85%, even more often more than about 90%, and most often at least about 99%. Preferably, the microarray has binding sites for genes relevant to the action of a drug of interest or in a biological pathway of interest. A "gene" is identified as an open reading frame (ORF) of preferably at least 50, 75, or 99 amino acids from which a messenger RNA is transcribed in the organism (e.g., if a single cell) or in some cell in a multicellular organism. The number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well-characterized portion of the genome. When the genome of the organism of interest has been sequenced, the number of ORFs can be determined and mRNA coding regions identified by analysis of the DNA sequence. For example, the Saccharomyces cerevisiae genome has been completely sequenced and is reported to have approximately 6275 open reading frames (ORFs) longer than 99 amino acids. Analysis of these ORFs indicates that there are 5885 ORFs that are likely to specify protein products (Goffeau et al., 1996, Life with 6000 genes, Science 274:546-567, which is incorporated by reference in its entirety for all purposes). In contrast, the human genome is estimated to contain approximately 5 × 10^4 genes.
3. Preparing Nucleic Acids for Microarrays
[0121] As noted above, the "binding site" to which a particular cognate cDNA specifically hybridizes is usually a nucleic acid or nucleic acid analogue attached at that binding site. In one embodiment, the binding sites of the microarray are DNA polynucleotides corresponding to at least a portion of each gene in an organism's genome. These DNAs can be obtained by, e.g., polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences. PCR primers are chosen, based on the known sequence of the genes or cDNA, that result in amplification of unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray). Computer programs are useful in the design of primers with the required specificity and optimal amplification properties. See, e.g., Oligo version 5.0 (National Biosciences). In the case of binding sites corresponding to very long genes, it will sometimes be desirable to amplify segments near the 3' end of the gene so that when oligo-dT primed cDNA probes are hybridized to the microarray, less-than-full-length probes will bind efficiently. Typically each gene fragment on the microarray will be between about 50 bp and about 2000 bp, more typically between about 100 bp and about 1000 bp, and usually between about 300 bp and about 800 bp in length. PCR methods are well known and are described, for example, in Innis et al. eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, Calif., which is incorporated by reference in its entirety for all purposes. It will be apparent that computer controlled robotic systems are useful for isolating and amplifying nucleic acids.
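The uniqueness criterion stated above (no two fragments sharing more than 10 contiguous identical bases) amounts to requiring that no 11-base substring occurs in two fragments. A minimal sketch of such a check follows; the function names are hypothetical, the sequences are invented, and only the given strand is compared (reverse complements are not examined).

```python
def kmers(seq, k):
    """All contiguous substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_long_run(frag_a, frag_b, max_shared=10):
    """True if the fragments share more than `max_shared` contiguous bases."""
    k = max_shared + 1
    return not kmers(frag_a.upper(), k).isdisjoint(kmers(frag_b.upper(), k))


def non_unique_pairs(fragments, max_shared=10):
    """List pairs of fragment names that violate the uniqueness criterion."""
    names = sorted(fragments)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if shares_long_run(fragments[a], fragments[b], max_shared)]


# Invented sequences, for illustration only.
shared = "ACGTTAGCCGA"  # 11 bases present in both "a" and "b"
fragments = {
    "a": "ACGT" + shared + "TAGC",
    "b": "TTTTT" + shared + "GGGGG",
    "c": "GGGGGCCCCCAAAAATTTTT",
}
violations = non_unique_pairs(fragments)
```

Comparing every pair this way scales quadratically with the number of fragments; a shared index of 11-mers would be the natural refinement for a full array design.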
[0122] An alternative means for generating the nucleic acid for the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acids Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:245-248). Synthetic sequences are between about 15 and about 500 bases in length, more typically between about 20 and about 50 bases. In some embodiments, synthetic nucleic acids include non-natural bases, e.g., inosine. As noted above, nucleic acid analogues may be used as binding sites for hybridization. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., 1993, PNA hybridizes to complementary oligonucleotides obeying the Watson-Crick hydrogen-bonding rules, Nature 365:566-568; see also U.S. Pat. No. 5,539,083).
[0123] In an alternative embodiment, the binding (hybridization) sites are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones, Genomics 29:207-209). In yet another embodiment, the polynucleotide of the binding sites is RNA.
4. Attaching Nucleic Acids to the Solid Surface
[0124] The nucleic acid or analogue is attached to a solid support, which may be made from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, or other materials. A preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA. See also DeRisi et al., 1996, Use of a cDNA microarray to analyze gene expression patterns in human cancer, Nature Genetics 14:457-460; Shalon et al., 1996, A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization, Genome Res. 6:639-645; and Schena et al., 1995, Parallel human genome analysis; microarray-based expression of 1000 genes, Proc. Natl. Acad. Sci. USA 93:10539-11286.
[0125] A second preferred method for making microarrays is by making high-density oligonucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see Fodor et al., 1991, Light-directed spatially addressable parallel chemical synthesis, Science 251:767-773; Pease et al., 1994, Light-directed oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci. USA 91:5022-5026; Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotech. 14:1675; U.S. Pat. Nos. 5,578,832; 5,556,752; and 5,510,270, each of which is incorporated by reference in its entirety for all purposes) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., 1996, High-Density Oligonucleotide arrays, Biosensors & Bioelectronics 11:687-90). When these methods are used, oligonucleotides (e.g., 20-mers) of known sequence are synthesized directly on a surface such as a derivatized glass slide. Usually, the array produced contains multiple probes against each target transcript. Oligonucleotide probes can be chosen to detect alternatively spliced mRNAs or to serve as various types of control.
[0126] Another preferred method of making microarrays is by use of an inkjet printing process to synthesize oligonucleotides directly on a solid phase.
[0127] Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nuc. Acids Res. 20:1679-1684), may also be used. In principle, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 2001), could be used, although, as will be recognized by those of skill in the art, very small arrays will be preferred because hybridization volumes will be smaller.
5. Generating Labeled Probes
[0128] Methods for preparing total and poly(A)+ RNA are well known and are described generally in Sambrook et al., supra. In one embodiment, RNA is extracted from biological samples of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299). Alternatively, total RNA may be extracted from samples using TRIzol reagent (Life Technologies) according to manufacturer's directions. Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook and Russell, supra). Biological samples of interest include normal liver samples, non-cancerous liver samples and samples from defined clinical specimens.
[0129] Labeled cDNA is prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art (see, e.g., Klug and Berger, 1987, Methods Enzymol. 152:316-325). Reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, most preferably a fluorescently labeled dNTP. Alternatively, isolated mRNA can be converted to labeled antisense RNA synthesized by in vitro transcription of double-stranded cDNA in the presence of labeled dNTPs (Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotech. 14:1675, which is incorporated by reference in its entirety for all purposes). In alternative embodiments, the cDNA or RNA probe can be synthesized in the absence of detectable label and may be labeled subsequently, e.g., by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
[0130] When fluorescently labeled probes are used, many suitable fluorophores are known, including fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.). It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished.
[0131] In another embodiment, a label other than a fluorescent label is used. For example, a radioactive label, or a pair of radioactive labels with distinct emission spectra, can be used (see Zhao et al., 1995, High density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression, Gene 156:207; Pietu et al., 1996, Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array, Genome Res. 6:492). However, because of scattering of radioactive particles, and the consequent requirement for widely spaced binding sites, use of radioisotopes is a less-preferred embodiment.
[0132] In one embodiment, labeled cDNA is synthesized by incubating a mixture containing 0.5 mM dGTP, dATP and dCTP plus 0.1 mM dTTP plus fluorescent deoxyribonucleotides (e.g., 0.1 mM Rhodamine 110 UTP (Perkin Elmer Cetus) or 0.1 mM Cy3 dUTP (Amersham)) with reverse transcriptase (e.g., SuperScript™ II, LTI Inc.) at 42°C for 60 minutes.
6. Hybridization to Microarrays
[0133] Nucleic acid hybridization and wash conditions are optimally chosen so that the probe "specifically binds" or "specifically hybridizes" to a specific array site, i.e., the probe hybridizes, duplexes or binds to a sequence array site with a complementary nucleic acid sequence but does not hybridize to a site with a non-complementary nucleic acid sequence. As used herein, one polynucleotide sequence is considered complementary to another when, if the shorter of the polynucleotides is less than or equal to 25 bases, there are no mismatches using standard base-pairing rules or, if the shorter of the polynucleotides is longer than 25 bases, there is no more than a 5% mismatch. Preferably, the polynucleotides are perfectly complementary (no mismatches). It can easily be demonstrated that specific hybridization conditions result in specific hybridization by carrying out a hybridization assay including negative controls (see, e.g., Shalon et al., supra, and Chee et al., supra).
[0134] Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al., supra, and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York. When the cDNA microarrays of Schena et al. are used, typical hybridization conditions are hybridization in 5×SSC plus 0.2% SDS at 65°C for 4 hours followed by washes at 25°C in low stringency wash buffer (1×SSC plus 0.2% SDS) followed by 10 minutes at 25°C in high stringency wash buffer (0.1×SSC plus 0.2% SDS) (Schena et al., 1996, Proc. Natl. Acad. Sci. USA 93:10614). Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V., and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.
7. Signal Detection and Data Analysis
[0135] When fluorescently labeled probes are used, the fluorescence emissions at each site of a transcript array can be detected by scanning confocal laser microscopy. Preferably the fluorescent intensities are measured by the Axon GenePix 4000 scanner. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser can be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization, Genome Research 6:639-645, which is incorporated by reference in its entirety for all purposes). In a preferred embodiment, the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes. Fluorescence laser scanning devices are described in Schena et al., 1996, Genome Res. 6:639-645 and in other references cited herein. Alternatively, the fiber-optic bundle described by Ferguson et al., 1996, Nature Biotech. 14:1681-1684, may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
[0136] Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit analog to digital board. In one embodiment the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluors may be made. In a preferred embodiment, the fluorescent intensities were analyzed by the GenePix Pro 3.0 software to subtract the background signals. The expression data were then filtered based on their channel intensities, spot size and flag (missing data), and the Cy5/Cy3 ratios were calculated and normalized by median-centering the log-ratio of all genes in each array. For any particular hybridization site on the transcript array, a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other tested event.
[0137] According to the method of the invention, the relative abundance of an mRNA in two biological samples is scored as a perturbation and its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested), or as not perturbed (i.e., the relative abundance is the same). In various embodiments, a difference between the two sources of RNA of at least about 25% (the RNA is 25% more abundant in one source than in the other), more usually about 50%, even more often by a factor of about 2 (twice as abundant), 3 (three times as abundant) or 5 (five times as abundant) is scored as a perturbation.
[0138] Preferably, in addition to identifying a perturbation as positive or negative, it is advantageous to determine the magnitude of the perturbation. This can be carried out, as noted above, by calculating the ratio of the emission of the two fluorophores used for differential labeling, or by analogous methods that will be readily apparent to those of skill in the art.
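The ratio calculation, median-centering, and perturbation scoring described in the preceding paragraphs can be illustrated with a minimal Python sketch. It is not the GenePix Pro procedure; the function names, the use of base-2 logarithms, and the default 2-fold cutoff are assumptions chosen for illustration, and the inputs are assumed to be positive, background-subtracted intensities that have already passed the intensity, spot size and flag filters.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """Return median-centered log2(Cy5/Cy3) ratios for the spots of one array.

    cy5 and cy3 are equal-length lists of positive, background-subtracted
    intensities (an assumption of this sketch).
    """
    logs = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    center = median(logs)  # median log-ratio of all spots on this array
    return [value - center for value in logs]


def score_perturbation(log_ratio, fold_threshold=2.0):
    """Label one normalized log2 ratio as 'up', 'down', or 'unperturbed'.

    fold_threshold is the minimum fold difference scored as a perturbation;
    2.0 is a hypothetical default, not a value fixed by the text.
    """
    cutoff = math.log2(fold_threshold)
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unperturbed"


if __name__ == "__main__":
    ratios = normalized_log_ratios([400, 200, 800, 200, 100], [100] * 5)
    print([(r, score_perturbation(r)) for r in ratios])
```

Working in log space makes up- and down-regulation symmetric around zero, so one cutoff serves both directions; a 25% or 50% criterion would correspond to fold_threshold values of 1.25 or 1.5.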
8. Pathway Response and Gene Expression Profiles
[0139] In one embodiment of the present invention, gene expression profiles are determined by observing the gene expression profile of a clinical sample of interest. In one embodiment of the invention, DNA microarrays reflecting the transcriptional state of a biological sample of interest are made by hybridizing a mixture of two differently labeled probes, each corresponding (i.e., complementary) to the mRNA of a clinical sample of interest or a reference sample, to the microarray. According to the present invention, the two samples are of the same type, i.e., of the same species and tissue type, but may differ in clinical diagnosis. Genes whose expression is highly correlated may belong to a gene expression profile.
[0140] Further, it is preferable in order to reduce experimental error to reverse the fluorescent labels in two-color differential hybridization experiments to reduce biases peculiar to individual genes or array spot locations. In other words, it is preferable to first measure gene expression with one labeling (e.g., labeling perturbed cells with a first fluorochrome and unperturbed cells with a second fluorochrome) of the mRNA from the two cells being measured, and then to measure gene expression from the two cells with reversed labeling (e.g., labeling perturbed cells with the second fluorochrome and unperturbed cells with the first fluorochrome). Multiple measurements over exposure levels and perturbation control parameter levels provide additional experimental error control. With adequate sampling a trade-off may be made when choosing the width of the spline function S used to interpolate response data between averaging of errors and loss of structure in the response functions.
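One common way to combine a measurement with its reversed-label replicate is sketched below in Python. The text does not prescribe a formula, so the averaging rule and the function name are assumptions: each input is a log2 ratio of the first fluorochrome's channel over the second's, so the biological signal changes sign between the two hybridizations while a gene-specific or spot-specific dye bias keeps its sign.

```python
def dye_swap_average(forward_log_ratio, reverse_log_ratio):
    """Combine a forward and a reversed-label log2 ratio for one gene.

    forward_log_ratio: log2(first dye / second dye), perturbed cells in first dye.
    reverse_log_ratio: log2(first dye / second dye), perturbed cells in second dye.
    A dye bias b appears as +b in both values, so half the difference
    cancels it and leaves the perturbed-versus-unperturbed log ratio.
    """
    return (forward_log_ratio - reverse_log_ratio) / 2.0


if __name__ == "__main__":
    true_signal, dye_bias = 1.0, 0.3  # illustrative values only
    forward = true_signal + dye_bias
    reverse = -true_signal + dye_bias
    print(dye_swap_average(forward, reverse))
```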
9. Other Methods of Transcriptional State Measurement
[0141] The transcriptional state of a cell may be measured by other gene expression technologies known in the art. Several such technologies produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods combining double restriction enzyme digestion with phasing primers (see, e.g., European Patent 0 534858 A1, filed Sep. 24, 1992, by Zabeau et al.), or methods selecting restriction fragments with sites closest to a defined mRNA end (see, e.g., Prashar et al., 1996, Proc. Natl. Acad. Sci. USA 93:659-663). Other methods statistically sample cDNA pools, such as by sequencing sufficient bases (e.g., 20-50 bases) in each of multiple cDNAs to identify each cDNA, or by sequencing short tags (e.g., 9-10 bases) which are generated at known positions relative to a defined mRNA end (see, e.g., Velculescu, 1995, Science 270:484-487).
10. Measurement of Other Aspects of Biological State
[0142] In various embodiments of the present invention, aspects of the biological state other than the transcriptional state, such as the translational state, the activity state, or mixed aspects can be measured in order to obtain drug and pathway responses. Details of these embodiments are described infra.
11. Embodiments Based on Translational State Measurements.
[0143] Measurement of the translational state may be performed according to several methods. For example, whole genome monitoring of protein (i.e., the "proteome," Goffeau et al., supra) can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome. Preferably, antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to the action of a drug of interest. Methods for making monoclonal antibodies are well known (see, e.g., Harlow and Lane, 1988, Antibodies: A Laboratory Manual, Cold Spring Harbor, N.Y., which is incorporated in its entirety for all purposes). In a preferred embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array and their binding is assayed with assays known in the art.
[0144] Alternatively, proteins can be separated by two-dimensional gel electrophoresis systems. Two-dimensional gel electrophoresis is well-known in the art and typically involves iso-electric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., 1990, Gel Electrophoresis of Proteins: A Practical Approach, IRL Press, New York; Shevchenko et al., 1996, Proc. Nat'l Acad. Sci. USA 93:1440-1445; Sagliocco et al., 1996, Yeast 12:1519-1533; Lander, 1996, Science 274:536-539. The resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies, and internal and N-terminal micro-sequencing. Using these techniques, it is possible to identify a substantial fraction of all the proteins produced under given physiological conditions, including in cells (e.g., in yeast) exposed to a drug, or in cells modified by, e.g., deletion or over-expression of a specific gene.
12. Embodiments Based on Other Aspects of the Biological State
[0145] Even though methods of this invention are illustrated by embodiments involving gene expression profiles, the methods of the invention are applicable to any cellular constituent that can be monitored.
[0146] In particular, where activities of proteins relevant to the characterization of a perturbation, such as drug action, can be measured, embodiments of this invention can be based on such measurements. Activity measurements can be performed by any functional, biochemical, or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with the natural substrate(s), and the rate of transformation measured. Where the activity involves association in multimeric units, for example association of an activated DNA binding complex with DNA, the amount of associated protein or secondary consequences of the association, such as amounts of mRNA transcribed, can be measured. Also, where only a functional activity is known, for example, as in cell cycle control, performance of the function can be observed. However known and measured, the changes in protein activities form the response data analyzed by the foregoing methods of this invention.
[0147] In alternative and non-limiting embodiments, response data may be formed of mixed aspects of the biological state of a cell. Response data can be constructed from, e.g., changes in certain mRNA abundances, changes in certain protein abundances, and changes in certain protein activities.
II. Proteomic Analysis
[0148] In another aspect, the invention provides methods for detecting markers which are differentially present in the samples of a metastatic HCC tumor or tissue samples of patients predisposed for HCC (e.g., patients at high risk for developing HCC but where the tumor is undetectable). The markers can be detected in a number of biological samples. The sample is preferably a biological tissue sample lysate.
[0149] Any suitable method can be used to detect one or more of the markers described herein. For example, gas phase ion spectrometry can be used. This technique includes, e.g., laser desorption/ionization mass spectrometry. Preferably, the sample is prepared prior to gas phase ion spectrometry, e.g., by pre-fractionation, two-dimensional gel chromatography, high performance liquid chromatography, etc., to assist detection of markers. Detection of markers can also be achieved using methods other than gas phase ion spectrometry. For example, immunoassays can be used to detect the markers in a sample. These detection methods are described in detail below.
A. Detection by Gas Phase Ion Spectrometry
[0150] Markers present in a biological sample can be detected using gas phase ion spectrometry, and preferably, mass spectrometry. In one embodiment, matrix-assisted laser desorption/ionization ("MALDI") mass spectrometry can be used. In another embodiment, surface-enhanced laser desorption/ionization mass spectrometry ("SELDI") can be used.
1. Preparation of a Sample Prior to Gas Phase Ion Spectrometry
[0151] One or a combination of standard techniques well known in the art can be used to prepare a sample to further assist detection and characterization of markers in a sample. For example, a sample can be pre-fractionated to provide a less complex biological sample prior to gas phase ion spectrometry analysis using one or more of the following methods: size exclusion chromatography, anion exchange chromatography, affinity chromatography, sequential extraction, gel electrophoresis, and high performance liquid chromatography (HPLC).
[0152] Optionally, a marker can be modified before analysis to improve its resolution or to determine its identity. For example, the markers may be subject to proteolytic digestion before analysis. Fragments from a digestion by a suitable protease, such as trypsin, may function as a fingerprint for the markers, thereby enabling their detection indirectly.
2. Contacting a Sample with a Substrate for Gas Phase Ion Spectrometry Analysis
[0153] A biological sample can be contacted with a substrate, such as a spectrometer probe adapted for use with a gas phase ion spectrometer. Alternatively, a substrate can be a separate material that can be placed onto a spectrometer probe that is adapted for use with a gas phase ion spectrometer.
[0154] A spectrometer probe can be in any suitable shape as long as it is adapted for use with a gas phase ion spectrometer (e.g., removably insertable into a gas phase ion spectrometer). The spectrometer probe substrate can be made of any suitable material, solid or porous. Spectrometer probes suitable for use in embodiments of the invention are described in, e.g., U.S. Patent No. 5,617,060 (Hutchens and Yip) and WO 98/59360 (Hutchens and Yip).
[0155] If the complexity of a sample has been substantially reduced as described above, the sample can be contacted with any suitable substrate for gas phase ion spectrometry. Prior to gas phase ion spectrometry analysis, an energy absorbing molecule ("EAM") or a matrix material is typically applied to markers on the substrate surface. The energy absorbing molecule and the sample containing markers can be contacted in any suitable manner.
[0156] Complexity of a sample can be further reduced using a substrate that comprises adsorbents capable of binding one or more markers. Adsorbents that bind the markers can be applied to the substrate in any suitable pattern (e.g., continuous or discontinuous), and a sample can be contacted with a substrate comprising an adsorbent in any suitable manner, e.g., bathing, soaking, dipping, spraying, washing over, or pipetting, etc. Following the contact, it is preferred that unbound materials on the substrate surface are washed out so that only the bound materials remain on the substrate surface.
3. Desorption/ionization and Detection
[0157] Markers on the substrate surface can be desorbed and ionized using gas phase ion spectrometry. Any suitable gas phase ion spectrometer can be used as long as it allows markers on the substrate to be resolved. Preferably, the gas phase ion spectrometer allows quantitation of markers. In one embodiment, the gas phase ion spectrometer is a mass spectrometer, preferably a laser desorption time-of-flight mass spectrometer. In another embodiment, an ion mobility spectrometer can be used to detect markers. In yet another embodiment, a total ion current measuring device can be used to detect and characterize markers.
4. Analysis of Data
[0158] Data generated by desorption and detection of markers can be analyzed using any suitable means. In one embodiment, data sets are analyzed with the use of a programmable digital computer. The computer program generally contains a readable medium that stores codes. Certain code can be devoted to memory that includes the location of each feature on a spectrometer probe, the identity of the adsorbent at that feature and the elution conditions used to wash the adsorbent. The computer also contains code that receives as input, data on the strength of the signal at various molecular masses received from a particular addressable location on the spectrometer probe. These data can indicate the number of markers detected, including the strength of the signal generated by each marker.
[0159] Data analysis can include the steps of determining signal strength (e.g., height of peaks) of a marker detected and removing "outliers" (data deviating from a predetermined statistical distribution). The observed peaks can be normalized, a process whereby the height of each peak relative to some reference is calculated. For example, a reference can be background noise generated by instrument and chemicals (e.g., energy absorbing molecule) which is set as zero in the scale. Then the signal strength detected for each marker or other biomolecules can be displayed in the form of relative intensities in the scale desired (e.g., 100). Alternatively, a standard (e.g., a serum protein) may be admixed with the sample so that a peak from the standard can be used as a reference to calculate relative intensities of the signals observed for each marker or other markers detected.
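The normalization step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the software actually used: peaks are represented as a mapping from molecular mass to raw height, the background level is a single number that is set to zero on the scale, and the function and parameter names (relative_intensities, reference_mass) are hypothetical.

```python
def relative_intensities(peak_heights, background=0.0, scale=100.0,
                         reference_mass=None):
    """Rescale raw peak heights to relative intensities.

    peak_heights: dict mapping molecular mass -> raw peak height.
    background: noise level treated as zero on the output scale.
    scale: value assigned to the reference peak (e.g., 100).
    reference_mass: mass of an admixed standard's peak; if None, the
        tallest background-corrected peak is used as the reference.
    """
    corrected = {mass: max(height - background, 0.0)
                 for mass, height in peak_heights.items()}
    if reference_mass is None:
        reference = max(corrected.values())
    else:
        reference = corrected[reference_mass]
    if reference <= 0:
        raise ValueError("reference peak does not rise above background")
    return {mass: scale * height / reference
            for mass, height in corrected.items()}


if __name__ == "__main__":
    peaks = {5000.0: 12.0, 7500.0: 22.0, 9000.0: 7.0}  # illustrative values
    print(relative_intensities(peaks, background=2.0))
    print(relative_intensities(peaks, background=2.0, reference_mass=5000.0))
```

Referencing every spectrum to the same admixed standard, rather than to its own tallest peak, keeps relative intensities comparable between samples.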
[0160] The computer can transform the resulting data into various formats for displaying. In one format, referred to as "spectrum view or retentate map," a standard spectral view can be displayed, wherein the view depicts the quantity of marker reaching the detector at each particular molecular weight. In another format, referred to as "peak map," only the peak height and mass information are retained from the spectrum view, yielding a cleaner image and enabling markers with nearly identical molecular weights to be more easily seen. In yet another format, referred to as "gel view," each mass from the peak view can be converted into a grayscale image based on the height of each peak, resulting in an appearance similar to bands on electrophoretic gels. In yet another format, referred to as "3-D overlays," several spectra can be overlaid to study subtle changes in relative peak heights. In yet another format, referred to as "difference map view," two or more spectra can be compared, conveniently highlighting unique markers and markers which are up- or down-regulated between samples. Marker profiles (spectra) from any two samples may be compared visually. In yet another format, a Spotfire Scatter Plot can be used, wherein each marker that is detected is plotted as a dot, one axis of the plot represents the apparent molecular weight of the markers detected and another axis represents the signal intensity of the markers detected. For each biological sample, the markers that are detected and the amount of each marker present in the biological sample can be saved in a computer readable medium. These data can then be compared to a control (e.g., a profile or quantity of markers detected in a control, e.g., patients in whom metastatic HCC is undetectable or tissue samples of someone not predisposed for HCC).
[0161] A method for predicting the potential of developing metastasis in an HCC patient or developing HCC in a patient with chronic liver disease can be embodied by code that is executed by a digital computer capable of processing data sets derived from signals from arrays after contact with patient samples. The code can be executed by the digital computer to create an analytical model. The code may be stored on any suitable computer readable media. Examples of computer readable media include magnetic, electronic, or optical disks, tapes, sticks, chips, etc. The code may also be written in any suitable computer programming language including Visual Basic, Fortran, C, C++, etc. The digital computer may be a micro, mini, or large frame computer using any standard or specialized operating system such as a Windows™ based operating system. A standard PC (personal computer) could be used to perform the analytical methods according to embodiments of the invention.
B. Detection by Immunoassay
[0162] An immunoassay can be used to detect and analyze markers in a sample. This method comprises: (a) providing an antibody that specifically binds to a marker; (b) contacting a sample with the antibody; and (c) detecting the presence of a complex of the antibody bound to the marker in the sample.
[0163] Methods for producing polyclonal and monoclonal antibodies that react specifically with a cellular marker are known to those of skill in the art. See, e.g., Coligan, Current Protocols in Immunology (1991); Harlow & Lane, Antibodies: A Laboratory Manual (1988); Goding, Monoclonal Antibodies: Principles and Practice (2d ed. 1986); and Kohler & Milstein, Nature 256:495-497 (1975). For example, to produce polyclonal antibodies, a purified target protein is mixed with an adjuvant and used to immunize animals. When high titers of antibody to the target protein are obtained, blood is collected from the animals and antisera are prepared for immunoassays. To produce monoclonal antibodies, spleen cells from an animal immunized with a target protein are immortalized, commonly by fusion with a myeloma cell (see, Kohler and Milstein, Eur. J. Immunol., 6:511-519, 1976). Colonies arising from single immortalized cells are screened for production of antibodies of the desired specificity and affinity for the target protein.
[0164] If the markers are not known proteins in the databases, nucleic acid and amino acid sequences can be determined with knowledge of even a portion of the amino acid sequence of the marker. For example, degenerate probes can be made based on the N-terminal amino acid sequence of the marker. These probes can then be used to screen a genomic or cDNA library created from a sample from which a marker was initially detected. The positive clones can be identified, amplified, and their recombinant DNA sequences can be subcloned using techniques which are well known. See, e.g., Ausubel et al., Current Protocols for Molecular Biology, 1994 and Sambrook and Russell, supra. Based on the polynucleotide sequence encoding a marker, antibodies against the marker can be prepared using any suitable methods known in the art. See, e.g., Huse et al., Science 246:1275-1281 (1989); Ward et al., Nature 341:544-546 (1989).
[0165] After the antibody is provided, a marker can be detected and/or quantified using any of the suitable immunological binding assays known in the art (see, e.g., U.S. Patent Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay. These methods are also described in, e.g., Methods in Cell Biology: Antibodies in Cell Biology, volume 37 (Asai, ed. 1993); Basic and Clinical Immunology (Stites & Terr, eds., 7th ed. 1991); and Harlow & Lane, supra.
C. Diagnosis of Metastatic HCC or the Predisposition to Develop HCC
[0166] In another aspect, the present invention provides methods for aiding a diagnosis of the probability of developing metastatic tumors in an HCC patient or a predisposition for developing HCC in a patient with a severe liver disease using one or more markers identified in Tables 2-7. Although valid diagnoses can be made based on as few as one marker selected from the markers in Tables 2-7, it is preferred that multiple markers are used to achieve more reliable results. Preferably, at least 10 cellular markers of Table 2 should be included in the set of markers used to predict an HCC patient's metastatic potential, more preferably at least 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100, and most preferably all 153 markers of Table 2 should be included in the markers used. Similarly, preferably at least 10, more preferably at least 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100, and most preferably all 273 genes of Table 5 should be included in the markers used for determining the risk of developing HCC in a patient with a chronic liver disease. The markers identified in Tables 2-7 can be used alone, in combination with other markers in any of the Tables, or with entirely different markers in aiding in the diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease. The markers in Tables 2-7 are differentially present in samples of a metastatic HCC or tissue samples of someone predisposed for HCC relative to a non-metastatic HCC or a subject not predisposed for HCC, respectively. For example, some of the markers are expressed at an elevated level and/or are present at a higher frequency in metastatic HCC or tissue samples of someone predisposed for HCC relative to patients with non-metastatic HCC or individuals at low risk for developing HCC. Therefore, detection of one or more of these markers in a person would provide useful information regarding the probability that the person may develop metastatic HCC or be predisposed to develop HCC.
[0167] Accordingly, embodiments of the invention include methods for aiding in diagnosing the probability of developing metastatic HCC or in diagnosing the probability of a patient with a severe liver disease developing HCC, wherein the method comprises: (a) detecting at least one marker in a sample, wherein the marker is selected from the markers identified in Tables 2-7; and (b) correlating the detection of the marker or markers with a diagnosis of metastatic HCC or the probability for a liver disease patient to develop HCC. The correlation may take into account the amount of the marker or markers in the sample compared to a control amount of the marker or markers (e.g., a non-metastatic HCC or a subject not predisposed for HCC). The correlation may take into account the presence or absence of the markers in a test sample and the frequency of detection of the same markers in a control. The correlation may take into account both of such factors to facilitate determination of whether a subject has a metastatic HCC or has a severe liver disease that will likely lead to HCC.
[0168] Any suitable samples can be obtained from a subject to detect markers. Preferably, a sample is a liver tissue sample from the subject. If desired, the sample can be prepared as described above to enhance detectability of the markers.
[0169] Any suitable method can be used to detect a marker or markers in a sample. For example, gas phase ion spectrometry or an immunoassay can be used as described above. Using these methods, one or more markers can be detected. Preferably, a sample is tested for the presence of a plurality of markers. Detecting the presence of a plurality of markers, rather than a single marker alone, would provide more information for the diagnostician. Specifically, the detection of a plurality of markers in a sample would increase the percentage of true positive and true negative diagnoses and would decrease the percentage of false positive or false negative diagnoses.
[0170] The detection of the marker or markers is then correlated with a probable diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease. In some embodiments, the detection of the mere presence or absence of a marker, without quantifying the amount of marker, is useful and can be correlated with a probable diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with a severe liver disease.
[0171] In other embodiments, the detection of markers can involve quantifying the markers to correlate the detection of markers with a probable diagnosis of developing metastatic HCC or a predisposition for developing HCC by a patient with severe liver disease. For example, increased levels of OPN are observed in patients with metastatic HCC. Thus, if the amount of the markers detected in a subject being tested is higher compared to a control amount, then the subject being tested has a higher probability of developing metastatic HCC or, for a patient with a severe liver disease, a predisposition for developing HCC.
[0172] When the markers are quantified, they can be compared to a control. A control can be, e.g., the average or median amount of marker present in comparable samples of normal subjects not predisposed to developing metastatic HCC or, for patients with severe liver disease, not predisposed to developing HCC. The control amount is measured under the same or substantially similar experimental conditions as in measuring the test amount. For example, if a test sample is obtained from a subject's blood serum sample and a marker is detected using a particular probe, then a control amount of the marker is preferably determined from a serum sample of a patient using the same probe. It is preferred that the control amount of marker is determined based upon a significant number of samples from normal subjects who do not have metastatic HCC or tissue samples of someone not predisposed for HCC so that it reflects variations of the marker amounts in that population.
[0173] Data generated by mass spectrometry can then be analyzed by computer software. The software can comprise code that converts signal from the mass spectrometer into computer readable form. The software also can include code that applies an algorithm to the analysis of the signal to determine whether the signal represents a "peak" in the signal corresponding to a marker of this invention, or other useful markers. The software also can include code that executes an algorithm that compares signal from a test sample to a typical signal characteristic of "normal" and of metastatic HCC, or of a predisposition for developing HCC by a patient with severe liver disease, and determines the closeness of fit between the two signals. The software also can include code indicating which typical signal the test sample is closest to, thereby providing a probable diagnosis.
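A closeness-of-fit comparison of this kind might look like the following Python sketch. The text does not name a fit statistic, so the use of the Pearson correlation coefficient is an assumption, as are the function names and the representation of each signal as a list of peak intensities measured at the same ordered set of masses.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def closest_profile(test_signal, reference_signals):
    """Return the label of the best-fitting reference and all fit scores.

    reference_signals: dict mapping a label (e.g., 'normal', 'metastatic')
    to a typical signal sampled at the same masses as test_signal.
    """
    scores = {label: pearson(test_signal, reference)
              for label, reference in reference_signals.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


if __name__ == "__main__":
    references = {  # illustrative values, not measured data
        "normal": [10, 50, 20, 5],
        "metastatic": [40, 15, 60, 30],
    }
    label, scores = closest_profile([38, 18, 55, 33], references)
    print(label, scores)
```

Returning every score alongside the best label lets a reviewer see how decisively one reference fits better than the other.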
III. Regulation of the Biological Activity of Therapeutic Targets
[0174] Osteopontin (OPN) and EpCAM have been positively correlated to metastasis in an HCC patient and onset of HCC in a patient with a chronic liver disease, respectively. Therefore, it is one objective of this invention to identify compounds that regulate, particularly inhibit, the activity of OPN or EpCAM.
A. Assays for Biological Functions
[0175] OPN and its alleles and polymorphic variants are secreted phosphoproteins encoded by SEQ ID NO:1 and whose amino acid sequence is disclosed in SEQ ID NO:2. The activity of OPN polypeptides can be assessed using a variety of in vitro and in vivo assays to determine its functional, chemical, and physical effects, e.g., measuring receptor binding (e.g., radioactive receptor binding), and the like. Further downstream events, such as altered cellular events including cell proliferation, differentiation, etc. may also be used as indirect indicators of modified OPN activity. In addition, such assays can be used to test and screen for antagonists of OPN activity. Antagonists can also be genetically altered versions of OPN, e.g., a dominant negative version of the protein. Such antagonists of OPN activity are useful for treating metastatic HCC.
[0176] The OPN of the assay will be selected from a polypeptide having a sequence of SEQ ID NO:2 or a conservatively modified variant or fragment thereof. Generally, the amino acid sequence identity will be at least 70%, optionally at least 85%, optionally at least 90-95%. Optionally, the polypeptide of the assays will comprise a domain of OPN, such as a receptor binding domain, an extracellular matrix binding domain, and the like. Either OPN or a domain thereof can be covalently linked to a heterologous protein to create a chimeric protein used in the assays described herein.
[0177] Modulators of OPN activity are tested using OPN polypeptides as described above, either recombinant or naturally occurring. The protein can be isolated, expressed in a cell, secreted from a cell, expressed in tissue or in an animal, either recombinant or naturally occurring. For example, liver slices, dissociated liver cells, or transformed cells can be used. OPN antagonism is tested using one of the in vitro or in vivo assays described herein. Furthermore, receptor-binding domains of the OPN protein can be used in vitro in soluble or solid state reactions to assay for receptor binding.
[0178] Receptor binding to OPN, a domain, or chimeric protein can be tested in solution, in a bilayer membrane, attached to a solid phase, in a lipid monolayer, or in vesicles. Binding of an antagonist can be tested using, e.g., changes in spectroscopic characteristics (e.g., fluorescence, absorbance, refractive index) hydrodynamic (e.g., shape), chromatographic, or solubility properties.
[0179] Samples or assays that are treated with a potential OPN inhibitor are compared to control samples without the test compound, to examine the extent of antagonism. Control samples (untreated with inhibitors) are assigned a relative OPN activity value of 100. Antagonism of OPN is achieved when the OPN activity value relative to the control is about 90%, optionally 50%, optionally 25-0%.
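The comparison against an untreated control can be written out as a short Python sketch. The function names are hypothetical, and reading "about 90%" as "at or below 90" is an assumption of this sketch; stricter criteria of 50 or 25 can be supplied through the cutoff argument.

```python
def relative_activity(treated_signal, control_signal):
    """Express treated-sample activity on a scale where the control is 100."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * treated_signal / control_signal


def is_antagonized(relative_value, cutoff=90.0):
    """Return True when relative activity is at or below the cutoff.

    cutoff=90.0 is an assumed reading of "about 90%"; pass 50.0 or 25.0
    for the more stringent optional criteria.
    """
    return relative_value <= cutoff


if __name__ == "__main__":
    value = relative_activity(treated_signal=45.0, control_signal=90.0)
    print(value, is_antagonized(value), is_antagonized(value, cutoff=25.0))
```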
[0180] Changes in OPN receptor binding may be assessed by determining changes in the ability of the vitronectin receptor to bind OPN in the presence of the antagonist. Generally, the compounds to be tested are present in the range from 1 pM to 100 mM.
[0181] The effects of the test compounds upon the function of the polypeptides can be measured by examining any of the parameters described above. Any suitable physiological change that affects OPN activity can be used to assess the influence of a test compound on the polypeptides of this invention. When the functional consequences are determined using intact cells or animals, one can also measure a variety of effects such as transcriptional changes to both known and uncharacterized genetic markers (e.g., northern blots), changes in cell metabolism such as cell growth or pH changes.
[0182] Similarly, the biological functions of EpCAM may be monitored based on the same general principles and methodologies as described above. For instance, EpCAM is known to play a role in epithelial cell homotypic adhesion, relying on both its extracellular and intracellular domains for proper functioning. Thus, EpCAM's functions can be examined based on, e.g., cell aggregation, specific interactions with its known binding partners (e.g., with actin via its intracellular domain), and disruption of signal transduction it is known to mediate. Various cellular events may serve as indicators of EpCAM activity and to facilitate screening test compounds for EpCAM antagonists.
B. Antagonists
[0183] The compounds tested as antagonists of OPN or EpCAM can be any small chemical compound, or a biological entity, such as a protein, sugar, nucleic acid or lipid. Various antibodies against the proteins are likely candidates for antagonists. For example, many monoclonal antibodies, such as 17-1A and GA733, are known to specifically bind EpCAM and can thus be tested in appropriate assays for their ability to interfere with EpCAM's biological functions.
[0184] Alternatively, antagonists can be genetically altered versions of OPN or EpCAM, such as a so-called "dominant negative" version, a biologically inactive version that suppresses the normal function of its wild type counterpart by competing for limited binding partners. Typically, test compounds will be small chemical molecules and peptides. Essentially any chemical compound can be used as a potential antagonist in the assays of the invention, although most often compounds that can be dissolved in aqueous or organic (especially DMSO-based) solutions are used. The assays are designed to screen large chemical libraries by automating the assay steps and providing compounds from any convenient source to assays, which are typically run in parallel (e.g., in microtiter formats on microtiter plates in robotic assays). It will be appreciated that there are many suppliers of chemical compounds, including Sigma (St. Louis, MO), Aldrich (St. Louis, MO), Sigma-Aldrich (St. Louis, MO), Fluka Chemika-Biochemica Analytika (Buchs, Switzerland) and the like.
[0185] In one preferred embodiment, high throughput screening methods involve providing a combinatorial chemical or peptide library containing a large number of potential therapeutic compounds (potential modulator or ligand compounds). Such "combinatorial chemical libraries" or "ligand libraries" are then screened in one or more assays, as described herein, to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity. The compounds thus identified can serve as conventional "lead compounds" or can themselves be used as potential or actual therapeutics.
[0186] A combinatorial chemical library is a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis, by combining a number of chemical "building blocks" such as reagents. For example, a linear combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks (amino acids) in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical compounds can be synthesized through such combinatorial mixing of chemical building blocks.
[0187] Preparation and screening of combinatorial chemical libraries is well known to those of skill in the art. Such combinatorial chemical libraries include, but are not limited to, peptide libraries (see, e.g., U.S. Patent 5,010,175; Furka, Int. J. Pept. Prot. Res. 37:487-493, 1991; and Houghton et al, Nature 354:84-88, 1991). Other chemistries for generating chemical diversity libraries can also be used. Such chemistries include, but are not limited to: peptoids (e.g., PCT Publication No. WO 91/19735), encoded peptides (e.g., PCT Publication WO 93/20242), random bio-oligomers (e.g., PCT Publication No. WO 92/00091), benzodiazepines (e.g., U.S. Pat. No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs et al, Proc. Nat. Acad. Sci. USA 90:6909-6913, 1993), vinylogous polypeptides (Hagihara et al, J. Amer. Chem. Soc. 114:6568, 1992), nonpeptidal peptidomimetics with glucose scaffolding (Hirschmann et al., J. Amer. Chem. Soc. 114:9217-9218, 1992), analogous organic syntheses of small compound libraries (Chen et al, J. Amer. Chem. Soc. 116:2661, 1994), oligocarbamates (Cho et al, Science 261:1303, 1993), and/or peptidyl phosphonates (Campbell et al, J. Org. Chem. 59:658, 1994), nucleic acid libraries (see Ausubel, Berger and Sambrook, all supra), peptide nucleic acid libraries (see, e.g., U.S. Patent 5,539,083), antibody libraries (see, e.g., Vaughn et al, Nature Biotechnology, 14(3):309-314, 1996 and PCT/US96/10287), carbohydrate libraries (see, e.g., Liang et al, Science 274:1520-1522, 1996 and U.S. Patent 5,593,853), small organic molecule libraries (see, e.g., benzodiazepines, Baum C&EN, Jan 18, page 33, 1993; isoprenoids, U.S. Patent 5,569,588; thiazolidinones and metathiazanones, U.S. Patent 5,549,974; pyrrolidines, U.S. Patents 5,525,735 and 5,519,134; morpholino compounds, U.S. Patent 5,506,337; benzodiazepines, U.S. Patent 5,288,514, and the like).
[0188] Devices for the preparation of combinatorial libraries are commercially available (see, e.g., 357 MPS, 390 MPS, Advanced Chem Tech, Louisville KY, Symphony, Rainin, Woburn, MA, 433A Applied Biosystems, Foster City, CA, 9050 Plus, Millipore, Bedford, MA). In addition, numerous combinatorial libraries are themselves commercially available (see, e.g., ComGenex, Princeton, N.J., Tripos, Inc., St. Louis, MO, 3D Pharmaceuticals, Exton, PA, Martek Biosciences, Columbia, MD, etc.).
C. Solid State and soluble high throughput assays
[0189] In one embodiment the invention provides soluble assays using molecules such as a domain such as a receptor binding domain, an extracellular matrix binding domain, etc.; a domain that is covalently linked to a heterologous protein to create a chimeric molecule; OPN or EpCAM; or a cell or tissue expressing OPN or EpCAM, either naturally occurring or recombinant. In another embodiment, the invention provides solid phase based in vitro assays in a high throughput format, where the domain, chimeric molecule, OPN or EpCAM, or cell or tissue expressing OPN or EpCAM is attached to a solid phase substrate.
[0190] In the high throughput assays of the invention, it is possible to screen up to several thousand different antagonists or ligands in a single day. In particular, each well of a microtiter plate can be used to run a separate assay against a selected potential modulator, or, if concentration or incubation time effects are to be observed, every 5-10 wells can test a single modulator. Thus, a single standard microtiter plate can assay about 100 (e.g., 96) modulators. If 1536 well plates are used, then a single plate can easily assay from about 100 to about 1500 different compounds. It is possible to assay several different plates per day; assay screens for up to about 6,000-20,000 different compounds are possible using the integrated systems of the invention. More recently, microfluidic approaches to reagent manipulation have been developed, e.g., by Caliper Technologies (Palo Alto, CA).
[0191] The molecule of interest can be bound to the solid state component, directly or indirectly, via covalent or non-covalent linkage, e.g., via a tag. The tag can be any of a variety of components. In general, a molecule which binds the tag (a tag binder) is fixed to a solid support, and the tagged molecule of interest (e.g., the signal transduction molecule of interest) is attached to the solid support by interaction of the tag and the tag binder.
[0192] A number of tags and tag binders can be used, based upon known molecular interactions well described in the literature. For example, where a tag has a natural binder, for example, biotin, protein A, or protein G, it can be used in conjunction with appropriate tag binders (avidin, streptavidin, neutravidin, the Fc region of an immunoglobulin, etc.). Antibodies to molecules with natural binders such as biotin are also widely available and appropriate tag binders (see SIGMA Immunochemicals 1998 catalogue, SIGMA, St. Louis, MO).
[0193] Similarly, any haptenic or antigenic compound can be used in combination with an appropriate antibody to form a tag/tag binder pair. Thousands of specific antibodies are commercially available and many additional antibodies are described in the literature. For example, in one common configuration, the tag is a first antibody and the tag binder is a second antibody which recognizes the first antibody. In addition to antibody-antigen interactions, receptor-ligand interactions are also appropriate as tag and tag-binder pairs. For example, agonists and antagonists of cell membrane receptors (e.g., cell receptor-ligand interactions such as transferrin, c-kit, viral receptor ligands, cytokine receptors, chemokine receptors, interleukin receptors, immunoglobulin receptors and antibodies, the cadherin family, the integrin family, the selectin family, and the like; see, e.g., Pigott & Power, The Adhesion Molecule Facts Book I (1993)). Similarly, toxins and venoms, viral epitopes, hormones (e.g., opiates, steroids, etc.), intracellular receptors (e.g., those which mediate the effects of various small ligands, including steroids, thyroid hormone, retinoids and vitamin D; peptides), drugs, lectins, sugars, nucleic acids (linear or cyclic polymer configurations), oligosaccharides, proteins, phospholipids, and antibodies can all interact with various cell receptors.
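The plate arithmetic in paragraph [0190] can be made concrete with a small calculation. This Python sketch is illustrative only; the function names and the plates-per-day figure are assumptions not taken from the source. It shows how the number of modulators per plate follows from the well count and the number of wells devoted to each modulator.

```python
def modulators_per_plate(wells_per_plate, wells_per_modulator=1):
    """Number of distinct modulators one plate can test (whole modulators only)."""
    if wells_per_plate <= 0 or wells_per_modulator <= 0:
        raise ValueError("well counts must be positive")
    return wells_per_plate // wells_per_modulator


def modulators_per_day(wells_per_plate, wells_per_modulator, plates_per_day):
    """Daily throughput, given a hypothetical number of plates run per day."""
    return modulators_per_plate(wells_per_plate, wells_per_modulator) * plates_per_day


# One well per modulator on a standard 96-well plate.
print(modulators_per_plate(96))          # 96
# A 1536-well plate with 10 wells per modulator (e.g., a concentration series).
print(modulators_per_plate(1536, 10))    # 153
# A 1536-well plate with one well per modulator.
print(modulators_per_plate(1536))        # 1536
```

The two 1536-well cases bracket the "about 100 to about 1500" range stated in the paragraph.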
[0194] Synthetic polymers, such as polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, and polyacetates can also form an appropriate tag or tag binder. Many other tag/tag binder pairs are also useful in assay systems described herein, as would be apparent to one of skill upon review of this disclosure.
[0195] Common linkers such as peptides, polyethers, and the like can also serve as tags, and include polypeptide sequences, such as poly gly sequences of between about 5 and 200 amino acids. Such flexible linkers are known to persons of skill in the art. For example, poly(ethylene glycol) linkers are available from Shearwater Polymers, Inc., Huntsville, Alabama. These linkers optionally have amide linkages, sulfhydryl linkages, or heterofunctional linkages.
[0196] Tag binders are fixed to solid substrates using any of a variety of methods currently available. Solid substrates are commonly derivatized or functionalized by exposing all or a portion of the substrate to a chemical reagent which fixes a chemical group to the surface which is reactive with a portion of the tag binder. For example, groups which are suitable for attachment to a longer chain portion would include amines, hydroxyl, thiol, and carboxyl groups. Aminoalkylsilanes and hydroxyalkylsilanes can be used to functionalize a variety of surfaces, such as glass surfaces. The construction of such solid phase biopolymer arrays is well described in the literature. See, e.g., Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963) (describing solid phase synthesis of, e.g., peptides); Geysen et al, J. Immun. Meth. 102:259-274 (1987) (describing synthesis of solid phase components on pins); Frank & Döring, Tetrahedron 44:6031-6040 (1988) (describing synthesis of various peptide sequences on cellulose disks); Fodor et al, Science, 251:767-777 (1991); Sheldon et al, Clinical Chemistry 39(4):718-719 (1993); and Kozal et al, Nature Medicine 2(7):753-759 (1996) (all describing arrays of biopolymers fixed to solid substrates). Non-chemical approaches for fixing tag binders to substrates include other common methods, such as heat, cross-linking by UV radiation, and the like.
D. Computer-based assays
[0197] Yet another approach to screen for compounds that modulate OPN or EpCAM activity involves computer assisted drug design, in which a computer system is used to generate a three-dimensional structure of OPN or EpCAM based on the structural information encoded by the amino acid sequence. The input amino acid sequence interacts directly and actively with a pre-established algorithm in a computer program to yield secondary, tertiary, and quaternary structural models of the protein. The models of the protein structure are then examined to identify regions of the structure that have the ability to bind, e.g., ligands. These regions are then used to identify ligands that bind to the protein.
[0198] The three-dimensional structural model of the protein is generated by entering protein amino acid sequences of at least 10 amino acid residues or corresponding nucleic acid sequences encoding an OPN or EpCAM polypeptide into the computer system. For example, the amino acid sequence of an OPN polypeptide or the nucleic acid encoding the polypeptide is selected from the group consisting of SEQ ID NOS:1 or 2, and conservatively modified versions thereof. The amino acid sequence represents the primary sequence or subsequence of the protein, which encodes the structural information of the protein. At least 10 residues of the amino acid sequence (or a nucleotide sequence encoding 10 amino acids) are entered into the computer system from computer keyboards, computer readable substrates that include, but are not limited to, electronic storage media (e.g., magnetic diskettes, tapes, cartridges, and chips), optical media (e.g., CD ROM), information distributed by internet sites, and by RAM. The three-dimensional structural model of the protein is then generated by the interaction of the amino acid sequence and the computer system, using software known to those of skill in the art.
[0199] The amino acid sequence represents a primary structure that encodes the information necessary to form the secondary, tertiary and quaternary structure of the protein of interest. The software looks at certain parameters encoded by the primary sequence to generate the structural model. These parameters are referred to as "energy terms," and primarily include electrostatic potentials, hydrophobic potentials, solvent accessible surfaces, and hydrogen bonding. Secondary energy terms include van der Waals potentials. Biological molecules form the structures that minimize the energy terms in a cumulative fashion. The computer program is therefore using these terms encoded by the primary structure or amino acid sequence to create the secondary structural model.
[0200] The tertiary structure of the protein encoded by the secondary structure is then formed on the basis of the energy terms of the secondary structure. The user at this point can enter additional variables such as whether the protein is membrane bound or soluble, its location in the body, and its cellular location, e.g., cytoplasmic, surface, or nuclear. These variables along with the energy terms of the secondary structure are used to form the model of the tertiary structure. In modeling the tertiary structure, the computer program matches hydrophobic faces of secondary structure with like, and hydrophilic faces of secondary structure with like.
[0201] Once the structure has been generated, potential ligand binding regions are identified by the computer system. Three-dimensional structures for potential ligands are generated by entering amino acid or nucleotide sequences or chemical formulas of compounds, as described above. The three-dimensional structure of the potential ligand is then compared to that of the OPN or EpCAM protein to identify ligands that bind to OPN or EpCAM. Binding affinity between the protein and ligands is determined using energy terms to determine which ligands have an enhanced probability of binding to the protein.
[0202] Computer systems are also used to screen for mutations, polymorphic variants, alleles and interspecies homologs of OPN genes or EpCAM genes. Such mutations can be associated with disease states or genetic traits. As described above, GENECHIP® and related technology can also be used to screen for mutations, polymorphic variants, alleles, and interspecies homologs. Once the variants are identified, diagnostic assays can be used to identify patients having such mutated genes. Identification of the mutated OPN genes, for example, involves receiving input of a first amino acid or nucleic acid sequence encoding OPN, selected from the group consisting of SEQ ID NOS:1 and 2, and conservatively modified versions thereof.
The sequence is entered into the computer system as described above. The first nucleic acid or amino acid sequence is then compared to a second nucleic acid or amino acid sequence that has substantial identity to the first sequence. The second sequence is entered into the computer system in the manner described above. Once the first and second sequences are compared, nucleotide or amino acid differences between the sequences are identified. Such sequences can represent allelic differences in OPN genes, and mutations associated with disease states and genetic traits. The same general strategy is also applicable for detecting EpCAM variants and mutants.
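The comparison step in paragraph [0202] can be sketched in a few lines of Python. This is a minimal illustration rather than the software contemplated by the disclosure: it assumes the two sequences are already aligned and of equal length (no insertions or deletions), the function name is hypothetical, and the example sequences are invented placeholders, not SEQ ID NOS:1 or 2.

```python
def sequence_differences(first, second):
    """List (position, first_residue, second_residue) for every mismatch.

    Positions are 1-based. Both inputs must be pre-aligned and equally long;
    handling gaps would require a real alignment step, which is omitted here.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(first.upper(), second.upper()))
        if a != b
    ]


# Invented nucleotide strings standing in for a reference and a patient sequence.
reference = "ATGCGTACC"
patient = "ATGAGTACC"
for position, ref_base, alt_base in sequence_differences(reference, patient):
    print(f"position {position}: {ref_base} -> {alt_base}")
```

Each reported difference is only a candidate allelic variant or mutation; whether it is associated with a disease state has to be established separately.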
E. Kits
[0203] A protein of interest and its homologs are useful tools for identifying its antagonists. For instance, OPN-specific reagents that specifically hybridize to OPN nucleic acid, such as OPN probes and primers, and OPN-specific reagents that specifically bind to the OPN protein, e.g., OPN antibodies, are used to examine liver cell expression and signal transduction regulation and to diagnose metastatic HCC. The same general methods are applicable to EpCAM as well.
[0204] Nucleic acid assays for the presence and the quantity of OPN or EpCAM polynucleotides in a sample include numerous techniques well known to those skilled in the art, such as Southern blot analysis, northern blot analysis, dot blots, RNase protection, S1 analysis, amplification techniques such as PCR (including RT-PCR) and LCR, and in situ hybridization. In in situ hybridization, for example, the target nucleic acid, e.g., nucleic acid encoding OPN, is liberated from its cellular surroundings in such a way as to be available for hybridization within the cell while preserving the cellular morphology for subsequent interpretation and analysis (see Example 1). The following articles provide an overview of the art of in situ hybridization: Singer et al, Biotechniques 4:230-250 (1986); Haase et al, Methods in Virology, vol. VII, pp. 189-226 (1984); and Nucleic Acid Hybridization: A Practical Approach (Hames et al, eds. 1987). In addition, OPN or EpCAM protein can be detected with the various immunoassay techniques described above. The test sample is typically compared to both a positive control (e.g., a sample containing recombinant OPN or EpCAM) and a negative control.
[0205] The present invention also provides for kits for screening for modulators of OPN or EpCAM. Such kits can be prepared from readily available materials and reagents. For example, such kits can comprise any one or more of the following materials: OPN (or EpCAM), reaction tubes, and instructions for testing OPN (or EpCAM) activity. Optionally, the kit contains biologically active OPN (or EpCAM). A wide variety of kits and components can be prepared according to the present invention, depending upon the intended user of the kit and the particular needs of the user.
IV. Inhibition of the Expression of Therapeutic Targets
[0206] Another means of inhibiting OPN activity and thereby inhibiting HCC metastasis in an HCC patient is to inhibit OPN expression. Similarly, a reduced risk of developing HCC in a patient with a chronic liver disease may be achieved by inhibiting EpCAM expression. A variety of methods well known to those skilled in the art are available for specifically suppressing the expression of a particular gene.
A. Antisense polynucleotides
[0207] Antisense technology has been the most commonly described approach in protocols to achieve gene-specific inactivation and is a useful tool in research and diagnostics. For instance, antisense oligonucleotides capable of inhibiting gene expression with a high level of specificity are often used by those of ordinary skill in biological sciences to elucidate the function of particular genes.
[0208] The specificity and sensitivity of antisense polynucleotides also make them suitable for therapeutic uses. A large number of U.S. patents and scientific publications relate to the use of antisense oligonucleotides as therapeutic agents in the treatment of diseases in animals and humans. See, e.g., U.S. Patent Nos. 6,080,580; 6,180,403; 6,255,111; 6,306,655; 6,440,739; and 6,524,854. An antisense oligonucleotide contains a sequence complementary to the coding strand of a gene targeted for inactivation (e.g., SEQ ID NO:1 or SEQ ID NO:5), may be of varying lengths, e.g., from less than 10 nucleotides to more than 100 nucleotides, and can be safely and effectively administered to a subject, e.g., a human. An antisense polynucleotide may be an oligomer or a polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof. It may be composed of naturally-occurring nucleobases, sugars and covalent internucleoside (backbone) linkages as well as oligonucleotides having non-naturally-occurring portions that function similarly. Such modified or substituted antisense oligonucleotides are often preferred over native forms because of desirable properties such as, e.g., enhanced cellular uptake, enhanced affinity for nucleic acid target, and increased stability in the presence of nucleases. Antisense oligonucleotides suitable for the present invention may also include oligonucleotides containing modified backbones or non-natural internucleoside linkages.
Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms are also included.
[0209] Furthermore, antisense oligonucleotides suitable for the present invention may correspond to either the coding region or the non-coding region of a target nucleic acid, e.g., OPN or EpCAM.
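The complementarity described in paragraph [0208] can be illustrated by deriving an antisense DNA sequence from a stretch of coding-strand sequence. The Python sketch below is a simplified illustration: the function name is hypothetical, the input is an invented placeholder rather than any portion of SEQ ID NO:1 or SEQ ID NO:5, and it ignores practical design considerations such as target accessibility and backbone chemistry.

```python
# Watson-Crick pairing for unmodified DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(coding_strand):
    """Return the reverse complement of a coding-strand DNA sequence, 5' to 3'."""
    sequence = coding_strand.upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return sequence.translate(_COMPLEMENT)[::-1]


# Invented six-base coding-strand fragment.
print(antisense_dna("ATGGCC"))  # GGCCAT
```

Because the mRNA has the same sequence as the coding strand (with U in place of T), the reverse complement is the sequence that base-pairs with the transcript.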
B. Ribozymes
[0210] The level of mRNA encoded by a gene of interest, e.g., OPN or EpCAM, can also be reduced using ribozymes. Ribozymes are RNA molecules having an enzymatic activity that is capable of cleaving or splicing other separate RNA molecules in a nucleotide sequence specific manner. A ribozyme useful for practicing the present invention is a catalytic or enzymatic RNA molecule with complementarity in a substrate binding region to a specific RNA target, e.g., OPN or EpCAM mRNA, and also has enzymatic activity that is active to cleave and/or splice RNA in that target, thereby inhibiting the expression of the target gene. Methods for designing and using ribozymes to target a particular gene are known to those of skill in the art and described in numerous publications, including U.S. Patent Nos. 6,069,007; 6,107,027; 6,225,291; 6,307,041; 6,482,803; and 6,489,163.
C. Small inhibitory RNA (siRNA)
[0211] Another useful tool to reduce the level of a target mRNA and thus the level of a target protein is small inhibitory RNA (siRNA). siRNA molecules are small double-stranded RNA molecules that elicit a process known as RNA interference, a form of sequence-specific gene inactivation. A proposed mechanism for RNA interference hypothesizes an ATP-dependent cleavage of mRNA molecules activated by a short double-stranded RNA, which is formed between the mRNA and the antisense strand of siRNA. Zamore et al, Cell 101:25-33, 2000. RNA interference has been shown in mammalian cell lines, oocytes, early embryos, and some cell types. See, e.g., Elbashir, Sayda M., et al, Nature 411:494-497, 2001. siRNA coding sequences can be designed based on the sequence of a target gene (e.g., OPN or EpCAM) and inserted into various suitable vectors, such as a plasmid or a viral vector, with properly placed transcription initiation and termination elements. When used in an intended recipient of eukaryotic origin, eukaryotic transcription control elements should be used. The vectors containing siRNA coding sequences can then be delivered to a desired target in accordance with the general methodologies for gene transfer known to those of skill in the art. RNA interference thus provides an alternative means to specifically inhibit the expression of a gene based on its sequence, by causing the rapid degradation of the mRNA of the gene, e.g., OPN or EpCAM.
D. Detection of Reduced Target Gene Expression
[0212] Following the administration of a therapeutic compound containing an agent capable of inhibiting the expression of a target gene, e.g., OPN or EpCAM, the effectiveness of the therapeutic compound can be assessed by comparing the in vivo level of the target gene before and after the administration. The general methods for administering a pharmaceutical compound are described in detail in a later section.
[0213] When the inhibition of gene expression is achieved at transcriptional level, i.e., by reduction of the amount of mRNA encoding a target gene, the diminished expression of the target gene may be confirmed using various detection techniques such as Northern blot assays, dot blot, RT-PCR and the like by comparing the mRNA level of the target gene (e.g., OPN or EpCAM) before and after the administration of a therapeutic compound. The general methodologies for performing such analysis are well known to those of ordinary skill in the art and described in various literature (see, e.g., Sambrook and Russell, supra and Ausubel et al., supra).
[0214] When the inhibition of gene expression is achieved at translational level, i.e., by reduction of the amount of protein encoded by a target gene, the diminished expression of the target gene may be confirmed by comparing the protein level of the target gene (e.g., OPN or EpCAM) before and after the administration of a therapeutic compound. Various means of measuring protein levels in tissue samples are well known to ordinarily skilled artisans. As mentioned above, various immunoassays are routinely used to detect the presence and quantity of a protein of interest, e.g., OPN or EpCAM. A general overview of the applicable technology can be found in Harlow and Lane, Antibodies, A Laboratory Manual, 1988.
[0215] Appropriate antibodies for target proteins, e.g., OPN and EpCAM, will be necessary for immunoassays. The general methods for preparing antibodies specific for a target protein are well known in the art and described in an earlier section. Further, some antibodies with desirable specificity may already be available for immunoassays (e.g., various mAb for EpCAM).
[0216] Once antibodies specific for a target protein, e.g., OPN or EpCAM, are available, the level of the target protein in a patient can be measured by a variety of immunoassay methods with qualitative and quantitative results available to the clinician. Various samples from the patient, such as blood or liver tissue, can be used in the immunoassays to detect the in vivo target protein level according to the general methods described in an earlier section. For a review of immunological and immunoassay procedures in general see, e.g., Stites, supra; U.S. Patent Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168.
V. Administration of Agents Inhibiting Target Protein Activity and Pharmaceutical Compositions
[0217] Agents that inhibit the activity of a target protein, e.g., OPN or EpCAM, can be administered directly to the human patient for modulation of the target protein activity in vivo. Administration is by any of the routes normally used for introducing an antagonist or inhibitor compound into ultimate contact with the tissue to be treated, optionally using the tongue or mouth. The antagonists or inhibitors are administered in any suitable manner, optionally with pharmaceutically acceptable carriers. Suitable methods of administering such antagonists or inhibitors are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.
[0218] Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions of the present invention (see, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1985).
[0219] The antagonists or inhibitors, alone or in combination with other suitable components, can be made into aerosol formulations (i.e., they can be "nebulized") to be administered via inhalation. Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like.
[0220] Formulations suitable for administration include aqueous and non-aqueous solutions, isotonic sterile solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. In the practice of this invention, compositions can be administered, for example, orally, topically, intravenously, intraperitoneally, intravesically or intrathecally. Optionally, the compositions are administered orally or nasally. The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described. The modulators can also be administered as part of a prepared food or drug.
[0221] The dose administered to a patient, in the context of the present invention, should be sufficient to effect a beneficial response in the subject over time. The dose will be determined by the efficacy of the particular signal modulators employed and the condition of the subject, as well as the body weight or surface area of the area to be treated. The size of the dose also will be determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound or vector in a particular subject.
[0222] In determining the effective amount of an antagonist or inhibitor to be administered, a physician may evaluate circulating plasma levels of the agent, its toxicities, and the production of antibodies against the agent. In general, the dose equivalent of an antagonist or inhibitor is from about 1 ng/kg to 10 mg/kg for a typical subject.
[0223] For administration, antagonists or inhibitors of the present invention can be administered at a rate determined by the LD-50 of the antagonist, and the side-effects of the inhibitor at various concentrations, as applied to the mass and overall health of the subject. Administration can be accomplished via single or divided doses.
VI. Examples
[0224] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes without limitation.
A. Example 1: Predicting a predisposition for Hepatocellular Carcinoma metastasis
1. MATERIALS AND METHODS
a) Patients and tissue samples.
[0225] All of the HCC samples were obtained with informed consent from patients who underwent curative resection at the Liver Cancer Institute, Zhongshan Hospital of Fudan University in China. A total of 107 paired primary HCC, metastatic HCC, and adjacent non-tumor normal liver tissue samples were obtained from 40 patients who were pathologically diagnosed with HCC and underwent hepatectomy at the Liver Cancer Institute, Zhongshan Hospital of Fudan University (formerly Shanghai Medical University) in China. Prior to surgery, each patient was examined by computed tomography of the abdomen and chest X-ray, and some patients were also examined by isotope bone scanning when necessary. Among the 107 paired samples, 81 were from 27 patients who had primary HCC, corresponding adjacent non-tumor liver tissue and metastatic HCC [15 with intra-hepatic spreads (group P) and 12 with tumor thrombus in a branch of the portal vein (group PT)], and 26 were from 13 patients who had only a single primary HCC and corresponding non-tumor liver tissue (without detectable metastasis at the time of surgery). Tumors and non-tumor tissues were grossly dissected, snap-frozen in liquid nitrogen immediately after removal, and stored at -70°C until use. We confirmed microscopically that tumor tissue samples and their metastases consisted mostly of carcinoma cells and that non-tumor adjacent liver samples did not exhibit any tumor cell invasion. Of the 40 patients, 39 were male, and one was female. Patients' ages ranged from 36 years to 74 years, with a median age of 50 years. The size of the primary HCC ranged from 1.3 cm to 17.5 cm in diameter with a median diameter of 7.2 cm, of which 65% (26/40) were > 5 cm in diameter and the remaining were < 5 cm in diameter. Thirty-two cases (80%) had co-existing liver cirrhosis. Serologically, all but one of the 40 patients were HBV-positive, and none was HCV-positive.
Twenty-seven patients (68%) had an elevated serum concentration of alpha-fetoprotein (AFP) (>20 ng/ml).
b) RNA preparation, cDNA Microarrays and Hybridization.
[0226] Total RNA was extracted from each sample using TRIzol Reagent (Life Technologies, Inc.) according to the manufacturer's specifications. The cDNA microarrays were fabricated at the Advanced Technology Center, NCI. Each array contains 9180 cDNA clones with 7102 "named" genes, 1179 EST clones, and 122 Incyte clones. Preparation of fluorescent cDNA targets by a direct labeling approach and the cDNA microarray hybridization were essentially as described by Wu et al., Oncogene 20:3674-3682, 2001. Briefly, the fluorescent targets were prepared as follows: 100 μg of total RNA from non-cancerous liver tissue was labeled with Cy3-conjugated deoxynucleotides, and 200 μg of total RNA from primary HCC or metastasis was labeled with Cy5-conjugated deoxynucleotides (Amersham), by oligo dT-primed polymerization using Superscript II reverse transcriptase (Life Technologies). The targets were then mixed together, added to the microarrays, and incubated overnight (12-16 hours) at 42°C. Prior to hybridization, each microarray was pre-hybridized at 42°C for at least one hour in pre-hybridization buffer containing 5x SSC, 0.1% SDS and 1% BSA. After hybridization, the slides were washed at room temperature in 2x SSC with 0.1% SDS, in 1x SSC, and in 0.2x SSC for 2 min each, and then in 0.05x SSC for 1 min. Most samples, where indicated, were run in duplicate. The Cy3 and Cy5 fluorescent intensities for each clone were determined with the Axon GenePix 4000 scanner and analyzed with the GenePix Pro 3.0 software to subtract the background signals. The expression data were then filtered based on channel intensity, spot size, and flag, and the Cy5/Cy3 ratios were calculated and normalized by median-centering the log-ratios of all genes in each array.
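The median-centering step described at the end of this paragraph can be illustrated with a minimal sketch. It is not the GenePix or BRB-ArrayTools implementation. The base-2 logarithm, the function name, and the intensity values are assumptions made for illustration, and the skipping of non-positive intensities merely stands in for the intensity, spot size, and flag filters mentioned in the text.

```python
import math
from statistics import median

def median_center_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot, shifted so the array median is 0."""
    # Spots with a non-positive intensity in either channel are skipped (None);
    # this is a stand-in for the filtering step, not the actual filter.
    raw = [math.log2(r / g) if r > 0 and g > 0 else None
           for r, g in zip(cy5, cy3)]
    center = median(v for v in raw if v is not None)
    return [None if v is None else v - center for v in raw]

# Hypothetical background-subtracted intensities for five spots on one array.
cy5 = [800.0, 400.0, 200.0, 0.0, 1600.0]
cy3 = [200.0, 200.0, 200.0, 150.0, 200.0]
normalized = median_center_log_ratios(cy5, cy3)
print(normalized)
```

After centering, a value of 0 means a spot has the typical tumor-to-reference ratio for that array, so arrays with different overall labeling efficiency become comparable.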
c) Data Analysis and Statistical Analysis.
[0227] Unsupervised hierarchical clustering analysis was done with the CLUSTER and TREEVIEW software using median-centered correlation and complete linkage (Eisen et al., supra). We also used the BRB-ArrayTools software, an integrated package for the visualization and statistical analysis of cDNA microarray gene expression data developed by the Biometric Research Branch of the National Cancer Institute, for both unsupervised and supervised analyses. The Class Comparison Tool based on univariate F-tests was used to find genes differentially expressed between predefined clinical groups at a significance level of P<0.001 or 0.002. The permutation distribution of the F-statistic, based on 2000 random permutations, was also used to confirm statistical significance. In comparing primary to metastatic tumors of the same patient, a paired-value t-statistic was used in the same manner. The multivariate Compound Covariate Predictor (CCP) Tool with a "leave-one-out" cross-validation test using 2000 random permutations at a significance level of P<0.001 was used to classify predefined clinical groups based on their gene expression profiles. In each cross-validation step, one sample is omitted and a multivariate CCP is created based on the genes that are univariately significant at the specified level in the training set consisting of the samples not omitted. This CCP is used to classify the omitted sample, and it is then noted whether the classification is correct or incorrect. This is repeated with all samples excluded one at a time. The total cross-validated misclassification rate is thereby determined. The statistical significance of the cross-validated misclassification rate is determined by repeating the entire cross-validation procedure on data with the class membership labels randomly permuted 2000 times. The CCP is based on a weighted linear combination of gene expression variables that are univariately significant in the training set, with the weights being the corresponding t-statistics as described in Radmacher et al., supra.
When the CCP was used to classify paired primary and metastatic tissue, the cross-validation was performed with one pair omitted at a time and the classification was based on the paired differences in expression for each gene. Averaged gene expression data from duplicated samples were included in the analysis.
[0228] To generate a prediction model to classify HCC with metastatic potential, we randomly selected 10 PN samples and 10 PT samples as a training set. A total of 20 blinded new HCC samples were included as a testing set. The classification of new samples was based on the following linear combination: $L = \sum_i t_i (x_i - m_i)$, where $t_i$ = t-value for gene i in the classifier, $x_i$ = log-ratio of gene i in the new sample to be classified, and $m_i$ = midpoint between the PN and PT groups for gene i (see Table 2). Additional details are available in the BRB-ArrayTools User's Guide. Kaplan-Meier survival analysis was used to compare patient survival, using the Excel-based WinSTAT software. The statistical P value was generated by the Cox-Mantel log-rank test when PN was compared to P or PT.
d) Semi-quantitative RT-PCR and Western blotting.
[0229] Total RNA was reverse-transcribed with SUPERSCRIPT™ II RNase H- Reverse Transcriptase and random hexamers (Invitrogen Inc.). PCR was done with 26 cycles (94°C, 30 sec; 53°C, 30 sec; 72°C, 1 min) followed by a final extension at 72°C for 10 min, using HotStarTaq Master Mix (QIAGEN) and the following primers: OPN sense 5'-GACTCGAACGACTCTGATGATGTA-3' (SEQ ID NO:3) and OPN antisense 5'-CTGGGCAACGGGGATGG-3' (SEQ ID NO:4). QuantumRNA™ 18S (Ambion) was used as an internal standard. Densitometry was used to quantify the amount of OPN, which was normalized to the 18S product. Western blot analysis was done essentially as described by Wu et al., supra. Briefly, protein lysates from CCL13, SK-Hep-1 and Hep3B cells were prepared in RIPA buffer (50 mM Tris-HCl, pH 7.4/150 mM NaCl/1% Triton X-100/1% deoxycholate/1.0% SDS/1% aprotinin), separated on 10% SDS-PAGE, transferred to an Immobilon-P membrane (Millipore, Bedford, MA), probed with a rat monoclonal anti-OPN antibody (Chemicon International), and visualized by the ECL-based assay (Amersham).
e) Cell lines and In vitro invasion assay.
[0230] Two human hepatoma-derived cell lines with different metastatic potential, SK-Hep-1 and Hep3B, and one non-transformed liver cell line, CCL13 (Chang liver cells), were used to determine the functional association of OPN with metastatic potential using the BD BioCoat Matrigel Invasion Chamber (BD Biosciences) according to the manufacturer's instructions. These cells were obtained from the American Type Culture Collection. Cells were routinely maintained at 37°C in a humidified atmosphere of 5% CO2 in EMEM (GIBCO) medium supplemented with 10% fetal bovine serum, 1x nonessential amino acids, 1x sodium pyruvate, 2 mM glutamine and penicillin/streptomycin. For invasion analysis, cells were plated in the upper chamber in serum-free EMEM and incubated in the absence or presence of either recombinant murine OPN (2 μg/ml) (R&D Systems) or a well-documented neutralizing antibody against OPN (3 μg/ml) (R&D Systems) for 20 hours. EMEM medium containing 5% FBS was added to the bottom chamber to serve as a chemoattractant. The number of cells invading through the Matrigel™ membrane was determined before and after adding OPN or anti-OPN antibody for each cell line.
f) Tissue histology analysis.
[0231] Paraffin-embedded tissue blocks were prepared and cut into serial sections 5 μm thick, which were mounted on electrically charged glass slides. Slides were subjected to hematoxylin and eosin (H&E) staining. Two pathologists read these slides independently for the histological diagnosis. For immunohistochemistry analysis, slides were deparaffinized and processed for immunostaining as described by Forgues et al., J Biol. Chem. 276:22797-22803, 2001. Briefly, slides were incubated in a microwave oven for 15 min in 1X citrate buffer for antigen retrieval and then quenched with 3% hydrogen peroxide for 10 min to block endogenous peroxidase activity. Following incubation with 10% donkey serum to block non-specific binding, the sections were incubated overnight at 4°C with a rat monoclonal anti-OPN antibody (Chemicon International). Biotinylated secondary antibodies and streptavidin peroxidase complex (ABC Elite kit, Vector Labs) were used. Chromogenic development was obtained by immersing the sections in 3,3'-diaminobenzidine (DAB) solution (0.25 mg per ml with 3% hydrogen peroxide). The slides were counter-stained with Harris' hematoxylin, dehydrated with alcohol to xylene, and mounted with Permount (Sigma).
2. RESULTS
a) Metastatic lesions are indistinguishable from their corresponding primary HCC.
[0232] To define the specific changes associated with the metastatic process in HCC, we compared the gene expression profiles of primary HCC samples from individuals with either intra-hepatic spread (group P) or tumor thrombi in the portal vein (group PT), together with their matched metastatic lesions (P-M or PT-M, respectively), with their corresponding non-cancerous liver tissues. Initially, we compared the gene expression profiles of 50 primary and metastatic tumor samples from 30 randomly selected individuals [i.e., 10 patients with metastasis-free HCC (group PN), 10 PT patients and 10 P patients]. We attempted to classify them into clinical groups with an unsupervised hierarchical clustering algorithm based on overall expression similarity, using either the entire set of 9180 genes or approximately 2487 genes derived from a gene screen filter that excluded genes not significantly more variable than the median at P<0.01. However, these clustering approaches did not yield any meaningful classification that corresponded to the predefined clinical groups. Similarly, we could not obtain a meaningful classification using 107 genes obtained by filtering for genes with an average of 2-fold greater variation in the gene expression ratio when compared with their median. The results of this analysis imply that primary and metastatic HCC differ only by a relatively small subset of genes, whereas the clustering algorithm may be dominated by variation among many other genes, thereby hindering classification.
[0233] To search for such small differences, we applied a supervised class comparison analysis with univariate F-tests and a global permutation test to define genes that were differentially expressed among predefined clinical groups. A comparison of five clinical groups (i.e., P, P-M, PT, PT-M, and PN) yielded a total of 143 significant genes (P<0.0005). Multidimensional scaling analysis based on the first three principal components of these 143 significant genes revealed that the PN samples are distinct from the remaining samples, while the P, P-M, PT, and PT-M samples are inseparable (Fig. 1a). Unexpectedly, the gene expression profiles of primary and matched metastatic HCC tumors were not significantly distinguishable.
b) PN is distinct from PT and P.
[0234] To confirm and extend the above findings, we performed a class comparison analysis of 30 primary HCC samples from PN, PT, and P patients. This analysis yielded a total of 383 significant genes (P<0.0005). A hierarchical clustering algorithm was then used to sort these 30 PN, P, and PT samples based on the expression profile of these 383 genes (Fig 1b). Two major branches were observed in the hierarchical tree, one associated with PN samples and the other with P and PT samples. Again, P and PT samples were not fully discriminated (Fig 1b). Thus, primary metastasis-free HCC has a gene expression profile markedly different from that of primary HCC with metastatic lesions in the portal vein or elsewhere in the liver parenchyma.
[0235] To further define a gene set that could accurately discriminate between two predefined classes and to identify metastasis-associated genes, we used a supervised machine learning classification algorithm known as the compound covariate predictor (CCP), which includes a "leave-one-out" cross-validation test to avoid the statistical problem of over-estimating prediction accuracy that occurs when a model is trained and evaluated with the same samples. This analysis also creates a multivariate predictor for determining which of the two classes a given sample belongs to, and a list of genes that are univariately significant at a given significance level. We divided 50 HCC samples from 30 patients into various pairs based on different clinical criteria and applied the CCP to each pair (Table 1), using the entire gene set with a P value < 0.001. At this specified significance level, the expected number of false-positive genes in the classifier is less than 10. The misclassification rate was determined by leave-one-out cross-validation. For each step of the cross-validation in which one sample was left out, the selection of informative genes and the creation of the multi-gene classifier were repeated from scratch. The probability of obtaining as small a cross-validated misclassification rate by chance was obtained by repeating the entire cross-validation procedure using 2000 random permutations of the class labels for the clinical criteria being evaluated. That gave rise to a classifier P value (Table 1). Using this supervised machine learning classification algorithm, we again found no significant difference between paired PT and PT-M samples (Table 1). Gene expression profiles in P and PT samples were almost identical to those of their paired metastatic P-M and PT-M samples (Table 1). The number of genes in these classifiers was at the background (false-positive) level. These data are in agreement with the clustering and multidimensional scaling analyses described above.
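The cross-validation and permutation procedure described in the paragraph above can be sketched as follows. This is a simplified illustration, not the BRB-ArrayTools implementation. The gene selection uses a cutoff on the absolute t-statistic (`t_cut`) instead of the P<0.001 threshold, the number of permutations is reduced from 2000 to 200, and all function names and expression values are hypothetical.

```python
import random
from statistics import mean, variance

def t_stat(a, b):
    """Pooled-variance two-sample t statistic (class a minus class b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def fit_ccp(X, y, t_cut):
    """Select genes with |t| >= t_cut; keep (gene, t, midpoint) for each."""
    model = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        t = t_stat(a, b)
        if abs(t) >= t_cut:
            model.append((g, t, (mean(a) + mean(b)) / 2))
    return model

def predict(model, x):
    """Class 0 when the weighted sum L is positive, otherwise class 1."""
    L = sum(t * (x[g] - m) for g, t, m in model)
    return 0 if L > 0 else 1

def loocv_errors(X, y, t_cut):
    """Redo gene selection and weighting with each sample held out in turn."""
    errors = 0
    for i in range(len(X)):
        model = fit_ccp(X[:i] + X[i + 1:], y[:i] + y[i + 1:], t_cut)
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors

def permutation_p(X, y, t_cut, n_perm, seed=0):
    """Share of label shuffles whose LOOCV error is as small as observed."""
    rng = random.Random(seed)
    observed = loocv_errors(X, y, t_cut)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if loocv_errors(X, shuffled, t_cut) <= observed:
            hits += 1
    return observed, hits / n_perm

# Hypothetical log-ratios: 8 samples x 3 genes; only gene 0 separates classes.
X = [[2.0, 0.1, 0.5], [2.2, -0.2, 0.4], [1.8, 0.3, 0.6], [2.1, 0.0, 0.5],
     [-2.0, -0.1, 0.4], [-1.9, 0.2, 0.6], [-2.2, -0.3, 0.5], [-2.1, 0.1, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
observed_errors, p_value = permutation_p(X, y, t_cut=4.0, n_perm=200)
print(observed_errors, p_value)
```

The key design point, which the text also stresses, is that gene selection happens inside every cross-validation fold and inside every permutation, so the held-out sample never influences the classifier that is used to predict it.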
[0236] In contrast, we accurately classified (100%) primary tumors from PN and PT samples with a total of 153 significant genes in the classifier (Table 2). The cross-validated misclassification rates were significantly lower than expected by chance (p<0.0005) (Table 1). Similarly, we accurately classified PN and P samples as well as PN and P/PT samples with significant numbers of genes in the classifiers (Table 1). However, the CCP yielded no statistically significant classification among P, PT, PT-M, and P-M, and the number of genes in these classifiers also was insignificant. Moreover, we found no statistically significant classification when tumor size, age, tumor encapsulation, or cirrhosis was used as the clinical category. These data are consistent with the findings of the class comparison analysis, including the multidimensional scaling and hierarchical clustering analyses. We conclude that primary and metastatic tumors have a very similar gene expression signature and that primary metastasis-free HCC tumors are distinct from primary HCC tumors with either tumor thrombus in the portal vein or intra-hepatic spread.
Table 1. Performance of classifier during "leave-one-out" cross-validation *

Classifier category **   Clinical groups   Total cases   Cases misclassified   Classifier P value   Genes in classifier
PN vs. PT                PN                10            0                     <0.0005              153
                         PT                10            0
PN vs. P                 PN                10            1                     <0.0005              157
                         P                 10            0
PN vs. P/PT              PN                10            2                     <0.001               256
                         P and PT          20            0
P vs. PT                 P                 10            3                     0.216                20
                         PT                10            4
PT vs. PT-M              paired samples    10            3                     0.296                1
P/PT vs. P-M/PT-M        paired samples    20            5                     0.132                7
P vs. PT-M               P                 10            4                     0.248                14
                         PT-M              10            3
PT vs. P-M               PT                10            2                     0.163                -
                         P-M               10            4
Tumor sizes              > 5 cm            16            7                     0.234                -
                         < 5 cm            14            4
Ages                     > 45 yr.          17            5                     0.334                -
                         < 45 yr.          13            7
Tumor encapsulated       presence          9             2                     0.037                13
                         absence           21            4
Cirrhosis                presence          14            7                     0.798                -
                         absence           6             6

* Compound covariate predictor was used to classify various clinical groups with a total of 9180 gene expression data at a significance level of P=0.001. The classifier was based on 2000 random permutations. The expected number of false-positive genes in the classifier is 10.
** PN, single primary HCC; PT, primary HCC with tumor thrombi in portal vein; PT-M, tumor thrombi from paired PT; P, primary HCC with intra-hepatic metastasis; P-M, intra-hepatic metastasis from paired P; P/PT, both P and PT; P-M/PT-M, both P-M and PT-M; tumor sizes, diameter in length.
c) A gene expression-based model from a supervised machine learning algorithm can predict HCC patients with metastatic potential.
[0237] The success in distinguishing PN from PT with CCP allowed us to develop a gene-expression-based model to predict HCC patients who had the potential to develop metastasis. We randomly selected primary HCC samples from 10 PN patients and 10 PT patients as a training set to generate a prediction model by "leave-one-out" cross-validated classification. The classification of training samples created a 153-gene list, which provided the basis for predicting testing samples by generating a multi-factorial L value (see Materials and Methods), referred to as the "weighted voting" exercise. We included all of the remaining 20 primary HCC samples as a test set (15 P patients, 3 additional PN patients, and 2 additional PT patients). Fig 2 shows the calculated "weighted voting" L value, with metastatic samples yielding negative values and non-metastatic samples yielding positive values. All of the test samples with the exception of one "P" sample (S29) were classified to the metastatic group (Fig 2a). Patient follow-up data indicated that one PN patient (S56) developed lung metastases 8 months following surgery, the second PN patient (S57) was cancer-free 9 months after surgery, and the third patient (S55) did not respond to the follow-up request. We also analyzed these samples by multidimensional scaling based on the 153-gene set obtained from the PN/PT comparison. It appears that S29 has a gene expression profile more similar to that of the P and PT groups than to that of the PN group (Fig 2b), suggesting that S29 should belong to the P and PT groups. Thus, we accurately classified at least 18 of 20 blinded HCC patients (90%) with metastatic potential.
Table 2. 153 significant genes for predicting metastasis and their values necessary for computing the multi-factorial L value in the prediction model
Figure imgf000073_0001
Figure imgf000074_0001
Figure imgf000075_0001
Figure imgf000076_0001
[0238] The above outcome predictor separated 40 patients into two groups, one being metastatic and the other being non-metastatic. Kaplan-Meier survival data indicate that patients who were predicted to be metastatic had significantly shortened survival when compared with patients without detectable metastasis (Fig 2c). Because the mortality of HCC patients depends largely on whether they develop intra-hepatic metastasis, our results indicate that the gene set used in the classifier provides an accurate gene expression signature reflecting liver cancer metastasis and survival.
d) Osteopontin promotes HCC metastasis.
[0239] The above study indicates that the genes necessary for intra-hepatic metastasis should be included in the prediction model. However, the list of 153 genes from the prediction model was based on a stringent criterion (P value at 0.001) to minimize the number of false-positive genes in the classifier, as needed for an accurate classification. Such a stringent criterion may exclude many genes that could be significant for metastasis progression. To broaden our search, we performed univariate F-tests with a total of 2000 random permutations at a P value of < 0.002 on 10 PN and 10 PT primary HCC samples. This analysis yielded a total of 224 significant genes with fewer than 20 expected false positives (see Table 3). To identify genes that may contribute to liver cancer metastasis, we inspected the 224-gene list and sorted the top 30 genes whose expression was altered largely in PT and PT-M, but rarely in PN (see Table 4). These genes were median-centered and visualized by a hierarchical clustering algorithm using centered correlation and complete linkage (Fig. 3a).
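The expected false-positive counts quoted for the two thresholds follow from simple arithmetic, sketched below. The calculation assumes that every one of the 9180 clones on the array is tested and that none is truly differentially expressed, which makes the result an upper bound. The function name is hypothetical.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of null genes passing a per-gene threshold alpha."""
    return n_genes * alpha

N_CLONES = 9180  # clones on the array, per the Materials and Methods

fp_strict = expected_false_positives(N_CLONES, 0.001)   # 153-gene classifier
fp_relaxed = expected_false_positives(N_CLONES, 0.002)  # 224-gene list

# Rough upper bound on the share of each list that could be false positives.
share_strict = fp_strict / 153
share_relaxed = fp_relaxed / 224
print(fp_strict, fp_relaxed, share_strict, share_relaxed)
```

Under these assumptions, about 9 false positives are expected at P<0.001 and about 18 at P<0.002, consistent with the "less than 10" and "fewer than 20" figures in the text, or roughly 6% and 8% of the respective gene lists.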
[0240] A gene with an average of over 3-fold overexpression in PT, but not in PN, was identified as osteopontin (OPN) (SEQ ID NO:1), a secreted phosphoprotein that has recently been found to be highly expressed in metastatic breast tumors as well as malignant lung, colon, and prostate cancers. Comparison of microarray expression data indicated that OPN expression is elevated in most PT samples and their corresponding PT-M samples, but to a much lesser degree in the PN samples (Fig 3b). OPN overexpression in PT samples, but not in PN samples, was confirmed by semi-quantitative RT-PCR analysis (Fig 3c and d). Immunohistochemical analysis (IHC) of OPN was also performed on 29 primary HCC (including 16 new HCC cases) and 8 normal livers from healthy organ donors. The immunoreactivity of OPN in these samples was evaluated in a blinded fashion. Only metastatic tumors were positive for cytoplasmic OPN staining, especially in areas with a high density of vasculature (Fig. 4). The IHC results mostly agreed with the microarray and RT-PCR data (61% positive cases; 11 of 18 metastatic HCC) (data not shown). Taken together, these studies demonstrate a good diagnostic value of OPN for metastatic HCC patients.
[0241] To determine the role of OPN in metastasis, we compared the level of OPN in human HCC cell lines by Western blot and their in vitro invasiveness by Matrigel assay. The level of OPN was high in SK-Hep-1, intermediate in Hep3B and low in CCL13 (Fig 5a), which coincided with their invasiveness (Fig 5b). An OPN neutralizing antibody significantly blocked invasion of SK-Hep-1 (p<0.001) and Hep3B cells (p<0.04). However, recombinant murine OPN did not show any statistically significant stimulation (p>0.05) of Hep3B and SK-Hep-1 cells, implying either that OPN produced by tumor cells is sufficient for maintaining an invasive phenotype, or that the lesser effect is due to the species difference. Similar results were obtained with 5 additional HCC cell lines (Fig 5c). However, the neutralizing antibody had little effect on cell viability and migration (Fig 5c, right panel).
[0242] To extend the above finding, we examined the role of OPN in pulmonary metastasis of HCC cells in nude mice. The HCCLM3 cell line is a clone derived from MHCC97 cells with a high degree of pulmonary metastasis following subcutaneous (s.c.) injection (Li et al., J. Cancer Res. Clin. Oncology, 2002). Consistent with our recent data, 100% tumorigenicity was achieved 1 week after s.c. injection. There was no significant difference in the size of primary tumors between the control and anti-OPN groups (Figure 5E), which is consistent with our in vitro results that anti-OPN does not affect HCC cell growth. At the 5th week, pulmonary metastatic lesions were detected in every mouse in the control group, mostly as grade I-II tumor clusters with some grade III-IV tumor clusters (Figure 5E, F). The control mice had an average of 11.1 ± 2.9 tumor clusters per lung. In contrast, only about half of the mice in the anti-OPN group had developed lung metastasis, and the remaining mice developed mostly grade I tumor clusters, with a combined average of 2.6 ± 1.0 tumor clusters per lung; this effect was statistically significant (p<0.01). Therefore, anti-OPN antibody shows a significant inhibitory effect on the lung metastasis of HCCLM3 cells.
Table 3. 224 Significant genes for predicting metastasis and their values necessary for computing multifactorial L value in the prediction model.
Figure imgf000078_0001
Figure imgf000079_0001
Figure imgf000079_0002
Figure imgf000080_0001
Figure imgf000080_0002
Figure imgf000081_0001
Figure imgf000081_0002
Figure imgf000082_0001
Figure imgf000083_0001
Figure imgf000084_0001
Figure imgf000085_0001
Figure imgf000086_0001
Figure imgf000087_0001
Table 4. 30 Significant genes for predicting metastasis and their values necessary for computing multifactorial L value in the prediction model.
Figure imgf000088_0001
Figure imgf000089_0001
B. Example 2: Predicting a predisposition for Hepatocellular Carcinoma
1. Material and methods
a) Patients and tissue samples
[0243] Surgical specimens were collected with prior informed consent and under protocols approved by the Institutional Review Board of the University of Minnesota. Liver samples were obtained from 59 end-stage chronic liver disease patients who received liver transplantation between 1995 and 2001. Disease-free liver samples from 8 liver donors were used as controls. The collection of these samples was mainly managed through the Liver Tissue Procurement and Distribution System (LTPADS) at the University of Minnesota, USA. Tumor and matched non-tumor liver samples from 64 patients were obtained through either the LTPADS program or the Liver Cancer Institute at Fudan University, China. Frozen samples, once received, were stored immediately at -80°C and recorded in a tissue repository database.
b) cDNA microarray
[0244] Total RNA was extracted from frozen tissues using Trizol reagent (Invitrogen, Gaithersburg, MD) according to the manufacturer's protocol. The quality of the extracted RNA was determined by spectrophotometry and by the appearance of the characteristic 28S and 18S rRNA bands on a 1% agarose gel. Each RNA sample was divided into several tubes in equal amounts and stored at -80°C. For the common reference of the cDNA microarray, total RNA samples from 8 normal livers were combined and aliquoted into tubes.
[0245] cDNA microarrays were purchased from the NCI microarray facility, Advanced Technology Center, NCI, NIH (Gaithersburg, MD). These human UniGem v2.0 arrays contained 9180 cDNA clones that map to 8281 unique UniGene clusters (based on Hs UniGene Build #131, released on Feb. 28, 2001) and 122 Incyte EST clones (Incyte Genomics, Palo Alto, CA). The hybridization was performed according to an optimized protocol established by the NCI (Wu et al., Oncogene 20:3674-3682, 2001; Ye et al., Nature Med. 9:416-423, 2003). Fluorescent images of hybridized microarrays were obtained using a GenePix 4000 scanner and GenePix Pro software (Axon Instruments, Foster City, CA). Detailed information, collected according to the proposed Minimum Information About a Microarray Experiment standards (Brazma A et al., Nat Genet 2001), will be made available through the NCBI's Gene Expression Omnibus public database.
c) Statistical analysis
[0246] A hierarchical clustering analysis was performed using the relative gene expression ratio (Cy5/Cy3) to examine the relatedness among expression patterns of several gene lists and those in two risk groups. Cluster analysis was performed using Cluster software and visualized using TreeView software (Eisen et al., supra). Hierarchical clustering was performed following median-centering normalization.
[0247] Analyses were performed using BRB ArrayTools developed by Dr. Richard Simon and Amy Peng of the Biometrics Research Branch at the National Cancer Institute. The data from each array were scaled in order to normalize data for inter-array comparisons. The class comparison tool was used for comparing two pre-defined risk groups. The F-test is a generalization of the two-sample t-test for comparing values among groups. The class comparison tool computed an F-test separately for each gene using the normalized log-ratios for cDNA. Several other important statistics were also computed. The tool performed random permutations of the group labels. Based on these random permutations, the tool computed the permutation p value associated with each gene in the list.
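The per-gene F-test with a permutation p value described above can be sketched as follows. This is an illustrative sketch, not the BRB ArrayTools code. The function names and data are hypothetical, and the (hits + 1) / (n_perm + 1) estimate is one common convention that the source does not specify.

```python
import random
from statistics import mean

def f_stat(values, labels):
    """One-way ANOVA F statistic for one gene across labelled groups."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))

def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Estimate how often shuffled group labels give an F at least as large."""
    rng = random.Random(seed)
    observed = f_stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_stat(values, shuffled) >= observed:
            hits += 1
    # Assumed convention: count the observed labelling as one permutation.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical normalized log-ratios for one gene in two risk groups.
demo_values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
demo_labels = ["low", "low", "low", "high", "high", "high"]
f_demo, p_demo = permutation_p_value(demo_values, demo_labels)
print(f_demo, p_demo)
```

With two groups, the F statistic equals the square of the pooled two-sample t statistic, which is the sense in which the F-test generalizes the t-test.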
[0248] Classification of samples into one of two pre-determined classes based on gene expression data was performed using several algorithms, including the compound covariate predictor, the k-nearest neighbor predictor, and the support vector machine predictor. The predictor was built in two steps. First, a standard two-sample t-test was performed to identify genes with significant differences (at level 0.001) in log-expression ratios between the two classes. Second, the log-expression ratios of differentially expressed genes were combined into a single compound covariate for each sample; the compound covariate was used as the basis for class prediction. The compound covariate for sample i was defined as
$$c_i = \sum_j t_j x_{ij}$$
where $t_j$ was the t-statistic for the two-group comparison of classes with respect to gene j, $x_{ij}$ was the log-ratio measured in specimen i for gene j, and the sum is over all differentially expressed genes.
[0249] We predicted the classification of a new sample by computing the following linear combination: $L = \sum_i t_i (x_i - m_i)$, where $t_i$ was the t-value for gene i, $x_i$ was the log-ratio of gene i in the new sample to be classified, and $m_i$ was the midpoint between the two classes for gene i. The index i runs over all the genes that are significant in the original analysis. When L was positive, the new sample was classified as the first phenotype label; when L was negative, the new sample was classified as the second phenotype label.
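The compound covariate and the L value defined above can be worked through with a short sketch. The t-values, midpoints, and log-ratios are hypothetical and are not taken from the tables of this document. The source does not say how a sample with L exactly equal to zero is handled, so assigning it to the second class here is an assumption.

```python
def compound_covariate(t_stats, log_ratios):
    """Sum of t_j * x_ij over the selected genes for one specimen."""
    return sum(t * x for t, x in zip(t_stats, log_ratios))

def l_value(t_stats, log_ratios, midpoints):
    """L = sum of t_i * (x_i - m_i) over the selected genes."""
    return sum(t * (x - m) for t, x, m in zip(t_stats, log_ratios, midpoints))

def classify(l):
    """Positive L gives the first label; otherwise the second (assumed at 0)."""
    return "first" if l > 0 else "second"

# Hypothetical three-gene classifier and one new sample.
t_stats = [4.0, -5.0, 6.0]       # t-values from the training comparison
midpoints = [0.5, -0.25, 0.0]    # midpoints between the two class means
new_sample = [1.0, -1.0, 0.5]    # log-ratios measured in the new sample

c_new = compound_covariate(t_stats, new_sample)
l_new = l_value(t_stats, new_sample, midpoints)
label = classify(l_new)
print(c_new, l_new, label)
```

The two quantities differ only by a constant: L is the compound covariate of the sample minus the compound covariate of the midpoint profile, so thresholding L at zero is the same as comparing the compound covariate with the value halfway between the two classes.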
d) EpCAM expression and its in vitro inhibition
[0250] The expression of EpCAM was assessed by semi-quantitative PCR. Total RNA was reverse-transcribed to produce single-stranded cDNA using random primers (Promega) with Superscript II reverse transcriptase (Invitrogen) according to the manufacturer's protocol. PCR amplification was performed with QuantumRNA 18S Internal Standards (Ambion) using HotStarTaq DNA polymerase (Qiagen) according to the manufacturer's protocol. The primer sequences are as follows: forward, 5'-TGC CGC AGC TCA GGA AGA ATG TGT-3' (SEQ ID NO:6); reverse, 5'-CAT CAT TCT GAG TTT TTT GAG AAG-3' (SEQ ID NO:7).
[0251] siRNA was used to inhibit EpCAM expression. The siRNAs were synthesized by Qiagen. The sense and antisense strands of the EpCAM siRNA are: sense, 5'-GUU UGC GGA CUG CAC UUC AdTdT-3' (SEQ ID NO:8); antisense, 5'-UGA AGU GCA GUC CGC AAA CdTdT-3' (SEQ ID NO:9). Non-silencing RNA was purchased from Qiagen and used as control siRNA. The sequences of the control siRNA were: sense, 5'-UUC UCC GAA CGU GUC ACG UdTdT-3' (SEQ ID NO: 10); antisense, 5'-ACG UGA CAC GUU CGG AGA AdTdT-3' (SEQ ID NO: 11). Transfection of siRNAs was carried out using TransIT-TKO transfection reagent (Mirus) according to the manufacturer's protocol, with 200 nM siRNA duplex per experiment. Cell growth was determined using Cell Counting Kit-8 (Dojindo Molecular Tech.) as described by the manufacturer. The experiments were performed in triplicate.
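As a consistency check on the listed sequences, the following sketch verifies that each antisense strand is the reverse complement of its sense strand once the 3' dTdT overhang is set aside. The helper names are hypothetical; the sequences are copied from the paragraph above.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def core(strand):
    """Strip spaces and the 3' dTdT overhang, leaving the RNA core."""
    seq = strand.replace(" ", "")
    return seq[:-4] if seq.endswith("dTdT") else seq

def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def is_duplex(sense, antisense):
    """True when the antisense core pairs fully with the sense core."""
    return reverse_complement(core(sense)) == core(antisense)

# Sequences as listed for SEQ ID NO:8/9 (EpCAM) and NO:10/11 (control).
epcam = ("GUU UGC GGA CUG CAC UUC AdTdT", "UGA AGU GCA GUC CGC AAA CdTdT")
control = ("UUC UCC GAA CGU GUC ACG UdTdT", "ACG UGA CAC GUU CGG AGA AdTdT")

print(is_duplex(*epcam), is_duplex(*control))
```

Both pairs form fully complementary 19-nucleotide duplexes with two-nucleotide 3' overhangs.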
2. Results
[0252] Gene expression profiles of liver samples from 59 chronic liver disease (CLD) patients and of 14 HCC samples were compared to that of a pool of 8 disease-free normal liver samples using a microarray containing 9128 human cDNA clones (Ye et al., supra). The CLD samples included 7 hepatitis B (HBV), 11 hepatitis C (HCV), 3 hemochromatosis (HHC), 5 Wilson's disease (WD), 10 alcoholic liver disease (ALD), 16 primary biliary cirrhosis (PBC) and 7 autoimmune hepatitis (AIH). A supervised univariate F-test algorithm with 2000 random permutations of the class labels was used to search for genes that can discriminate these 7 CLD groups. This analysis yielded a total of 489 significant genes (p<0.0005). Hierarchical clustering analysis (as described by Eisen et al., supra) of the 489 genes revealed that these 7 liver disease groups were separated into two major branches, one consisting mostly of HBV, HCV, HHC, and WD samples and the other containing mainly PBC, ALD, and AIH samples. These results indicate that HBV, HCV, HHC, and WD are more closely related to each other than they are as a group to PBC, ALD, or AIH. The segregation of these samples by a molecular signature specifically reflecting their etiologies correlated coincidentally with their risk of developing HCC, with the exception of the WD samples (data not shown). To further determine the degree of difference among these groups, a t-test-based compound covariate predictor analysis was performed among these 7 groups with "leave-one-out" cross-validation and 2000 random permutation tests. A total of 21 simulations were performed, which yielded 500 composite genes. The result of the hierarchical clustering of these genes is consistent with that of the F-test (data not shown). Consistently, PBC, ALD, or AIH was more significantly different from HBV, HCV, HHC, or WD, while the differences among the etiologies were less significant (data not shown). It appears that the WD samples, at least in this set, belong to the high-risk group. These results are interpreted to mean that the molecular signature is dominated by genes segregating the high-risk group from the low-risk group according to their ability to develop HCC, while the contribution of genes reflecting individual etiologies is minuscule.
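The univariate F-test with random permutations of the class labels described above can be sketched, for a single gene, roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the software actually used: the function names, the fixed random seed, and the (hits + 1) / (permutations + 1) p-value convention are all choices made for this example.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across the classes in `labels`."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Estimate a p-value by shuffling class labels n_perm times (assumed scheme)."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    # Add-one convention so the estimate is never exactly zero (an assumption).
    return (hits + 1) / (n_perm + 1)
```

In a genome-wide screen, a test of this kind would be repeated for every clone on the array, and only genes below the chosen significance threshold would be retained.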
[0253] The genes that were commonly disregulated in HBV/HCV/HHC/WD samples but not in ALD/PBC/AIH samples were hypothesized to be more closely related to the molecular signature of HCC. To search globally for such a gene set, the k-nearest neighbors (k=3) (3NN) and support vector machine (SVM) algorithms were applied, with a "leave-one-out" cross-validation test and a test of 2000 random permutations of the class labels, to the high-risk (HBV/HCV/HHC/WD) and low-risk (ALD/PBC/AIH) groups at a P value <0.001, a computational strategy similar to our recent study (Ye et al., supra). This analysis yielded a composite classifier containing 556 significant genes, which separated these two groups very well. It provided a significant class prediction among these groups, with an overall accuracy of 78% by 3NN and 86% by SVM, and the cross-validated misclassification rates were significantly lower than expected by chance (p<0.0005) (data not shown). In contrast, random grouping of these samples yielded statistically insignificant classification (data not shown).
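The combination of a 3-nearest-neighbor classifier with "leave-one-out" cross-validation can be illustrated by the short sketch below. It assumes Euclidean distance between expression vectors, which the text does not specify, and all names are hypothetical. In a full analysis the gene-selection step would also be repeated inside every cross-validation round; that step is omitted here for brevity.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    # Euclidean distance is an assumption; the source does not name the metric.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(samples, labels, k=3):
    """Hold out each sample once, train on the rest, and report the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) == labels[i]:
            correct += 1
    return correct / len(samples)
```

With two classes and k=3 the vote can never tie, which is one practical reason to choose an odd k for a high-risk versus low-risk comparison.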
[0254] It was noted that many genes in the 556-gene set can be found in the 14 HCC samples analyzed (data not shown). To identify genes that were commonly disregulated in the high-risk group and in HCC, the 14 HCC samples were pooled together with the high-risk group and then compared with the low-risk group using the 3NN algorithm at a P value <0.001, with 2000 random permutations. This analysis yielded 416 genes, of which 273 genes were found in the 556-gene set (49% overlap). These results indicate that about half of the signature genes that can discriminate between the high-risk and the low-risk groups are present in HCC samples. To determine whether the 273-gene set (Table 5) was a common signature for tumors, we applied this set to two independent HCC gene expression profiles using the 3NN and SVM predictors. One set included 24 HCC samples derived from a comparison with the same normal liver control used above, and the other set included 50 HCC samples that were compared to their matched non-cancerous liver tissues (Ye et al., supra). The 273-gene signature provided an improved fit by SVM in their classification, with an overall accuracy of 92% for the 24 HCC samples and 94% for the 50 HCC samples (data not shown), an improvement in overall performance compared to the 556-gene set. Consistently, the non-overlapping 283-gene set did not provide satisfactory performance. Because most of the HCC-associated genes were eliminated from the non-overlapping gene set, most of the 283 genes may belong to the signatures separating the etiologies. Moreover, the 383 overlapping genes selected from a comparison of HBV/HCV/HHC/WD and ALD/PBC/AIH/HCC did not yield a meaningful classification of the two independent HCC sets, with an overall predictive rate below 50% (a random event). The 273 genes were examined in multiple liver samples taken from two HBV patients, from different parts of the liver spread over a region at least 5 cm in diameter. The profiles of these 273 genes in different parts of the livers from these two patients were almost identical (data not shown). Furthermore, the top 25 genes with the lowest parametric p-values (p<0.000001) were selected from the 273-gene set. This set gave a result comparable to that of the 273-gene set (data not shown). Taken together, these results indicate that the 273-gene set contains most of the HCC-associated genes relevant to HCC development and that these genes are widely spread in the parenchyma of the affected livers rather than retained locally.
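The gene-set arithmetic in this paragraph (273 shared genes out of 556, leaving 283 non-overlapping genes) can be reproduced with ordinary set operations. In the sketch below the integer identifiers are hypothetical placeholders chosen only so that the set sizes match the text; they are not real clone or UniGene identifiers.

```python
def overlap_report(set_a, set_b):
    """Summarize how much of set_a is shared with set_b."""
    shared = set_a & set_b
    return {
        "shared": len(shared),
        "only_in_a": len(set_a - set_b),
        "percent_of_a": round(100 * len(shared) / len(set_a)),
    }


# Placeholder IDs (assumption): only the sizes 556 and 416 come from the text.
classifier_556 = set(range(556))
pooled_416 = set(range(283, 283 + 416))

report = overlap_report(classifier_556, pooled_416)
print(report)  # {'shared': 273, 'only_in_a': 283, 'percent_of_a': 49}
```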
[0255] To examine whether the 273-gene set is a common signature in other human tumors, the gene parameters in this signature were applied using SVM to 98 HCCs, 53 lung cancers, 89 gastric adenocarcinomas, 37 soft tissue tumors, 39 breast tumors and 23 diffuse large B-cell lymphomas (DLBCL) from several publicly available microarray datasets (Alizadeh et al., supra; Perou et al., supra; Garber et al., Proc. Natl. Acad. Sci. U.S.A. 98:13784-13789, 2001). The 273-gene set consistently performed well with the additional 98 HCC samples (80% of the samples fit the signature), and 97% of the breast cancers (39 cases) and 78% of the DLBCL cases shared similar signatures. In contrast, most of the tumor samples from lung, soft tissues, and stomach showed a very poor fit to this signature (between 6% and 30% of the cases) (data not shown). As a control, the 283-gene set (non-HCC-related genes) did not provide a satisfactory prediction for these samples. Thus, the HCC-associated genes in the classifier appear to be commonly disregulated in breast cancer and DLBCL, but not in lung adenocarcinoma, soft tissue tumors, or gastric adenocarcinoma.
[0256] The above studies suggested that genes responsible for the genesis of HCC may be present in the 273-gene set. For example, a gene whose expression is significantly elevated in the high-risk group but not in the low-risk group may act as an oncogene to promote cell growth. To test this "proof-of-principle" hypothesis, a lead gene at the top of the 273-gene list was selected. This gene was identified as EpCAM, or tumor-associated calcium signal transducer 1 (TACSTD1, Hs.692), with an average 3.6-fold increase in expression in the high-risk group but only a 1.7-fold increase in the low-risk group (Fig 6a) as well as in HCC (data not shown). Elevated expression of EpCAM in the high-risk CLD samples was verified by quantitative RT-PCR analysis (Fig 6b). The expression of EpCAM in various HCC cell lines was examined by Western blot analysis. EpCAM is highly expressed in Hep3B cells, but its expression level is relatively low in Huh1 and Huh4 cells (Fig 6c), generally correlating with their growth rates (Fig 6d). Furthermore, inhibition of EpCAM expression by two different siRNA oligos specific to EpCAM resulted in significant growth inhibition of Hep3B cells (Fig 6f). In contrast, a control siRNA oligo had no such effect (Fig 6e and data not shown). These results indicate that EpCAM may confer an oncogenic property by promoting neoplastic cell proliferation.
[0257] The 273 significant genes, their gene symbols, their map positions, and their UG Cluster identifiers are presented in Table 5.
Table 5. 273 significant genes for predicting the potential for developing HCC in a patient with a chronic liver disease and their values necessary for computing multifactorial L value in the prediction model.
Figure imgf000096_0001
Figure imgf000096_0002
Figure imgf000097_0001
Figure imgf000097_0002
Figure imgf000098_0001
Figure imgf000098_0002
Figure imgf000099_0001
Figure imgf000099_0002
Figure imgf000100_0001
Figure imgf000100_0002
Figure imgf000101_0001
Figure imgf000101_0002
Figure imgf000102_0001
Figure imgf000102_0002
Figure imgf000103_0001
Figure imgf000103_0002
Figure imgf000104_0001
Figure imgf000104_0002
Figure imgf000105_0001
Figure imgf000105_0002
Figure imgf000106_0001
Figure imgf000106_0002
Figure imgf000107_0001
Figure imgf000107_0002
Figure imgf000108_0001
Figure imgf000108_0002
Figure imgf000109_0001
Figure imgf000109_0002
Figure imgf000110_0001
Figure imgf000110_0002
Figure imgf000111_0001
Figure imgf000111_0002
Figure imgf000112_0001
Figure imgf000112_0002
Figure imgf000113_0001
Figure imgf000113_0002
Figure imgf000114_0001
Figure imgf000114_0002
Figure imgf000115_0001
Figure imgf000115_0002
Figure imgf000116_0001
Figure imgf000116_0002
Figure imgf000117_0001
Figure imgf000117_0002
Figure imgf000118_0001
Figure imgf000118_0002
Figure imgf000119_0001
Figure imgf000119_0002
Figure imgf000120_0001
Figure imgf000120_0002
[0258] The top 25 genes with the lowest parametric p-values (p<0.000001) were selected from the 273-gene set, and this set gave a result comparable to that of the 273-gene set. These 25 genes, which are significant for indicating a liver disease patient's risk of developing HCC, are presented in Table 6 together with their gene symbols, map positions, and UG Cluster identifiers. A further set of 10 significant genes for predicting the risk of developing HCC in a patient suffering from a severe liver disease was determined in a similar manner and is presented in Table 7.
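For orientation, the general form of a compound covariate predictor can be sketched as follows: each selected gene is weighted by its two-sample t statistic, a sample's score is the weighted sum of its expression values, and the classification threshold is the midpoint between the mean scores of the two training classes. This is a generic sketch under those assumptions; the names are hypothetical, and it is not asserted to reproduce the multifactorial L value computation referred to in the table captions.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic (each class needs >= 2 samples)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def score(weights, sample):
    """Compound covariate: t-weighted sum of one sample's expression values."""
    return sum(w * x for w, x in zip(weights, sample))


def fit_ccp(high, low):
    """Return per-gene weights and the midpoint threshold from training samples."""
    n_genes = len(high[0])
    weights = [
        t_statistic([s[g] for s in high], [s[g] for s in low])
        for g in range(n_genes)
    ]
    threshold = (
        mean(score(weights, s) for s in high) + mean(score(weights, s) for s in low)
    ) / 2
    return weights, threshold


def predict_ccp(weights, threshold, sample):
    """Assign 'high' when the sample's score lies above the midpoint threshold."""
    return "high" if score(weights, sample) > threshold else "low"
```

Because the weights are computed as high-risk minus low-risk, the mean score of the high-risk training class is never below that of the low-risk class, so a score above the midpoint points toward the high-risk group.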
Table 6. 25 significant genes for identifying patients likely to develop HCC by the compound covariate predictor analysis and their values necessary for computing multifactorial L value in the prediction model.
Figure imgf000122_0001
Figure imgf000122_0002
Figure imgf000123_0001
Figure imgf000123_0002
Figure imgf000124_0001
Figure imgf000124_0002
These 25 genes were selected by the 10 smallest parametric p values (p<0.000001).
Table 7. 10 Significant genes for predicting HCC development and their values necessary for computing multifactorial L value in the prediction model.
Figure imgf000125_0001
Figure imgf000125_0002
Figure imgf000126_0001
Figure imgf000126_0002

Claims

WHAT IS CLAIMED IS:
1. A method for identifying potential therapeutic targets for inhibiting metastasis in a patient suffering from hepatocellular carcinoma (HCC), comprising the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a metastatic HCC patient; b) capturing markers from the sample and generating a first signal; c) repeating steps a) and b) with a sample from a non-metastatic HCC patient and thereby generating a second signal; and d) comparing the first and second signals and thereby identifying a subset of cellular markers whose level is different in the first and second signals, wherein the subset of cellular markers are potential therapeutic targets for treating HCC metastasis in an HCC patient.
2. The method of claim 1, wherein a signal generated from a normal non-cancerous sample on an array identical to the array of step a) is subtracted in steps b) and c) to generate the first and second signals.
3. A method for predicting the metastatic potential in a patient suffering from hepatocellular carcinoma (HCC), comprising the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a metastatic HCC patient, the set of cellular markers comprising at least ten genes or proteins encoded by genes independently selected from the genes of Table 2; b) capturing markers from the sample; c) generating a first signal from the captured markers of step b); d) repeating steps a) to c) with a sample from a non-metastatic HCC patient and thereby generating a second signal; e) repeating steps a) to c) with a sample from an HCC patient with unknown metastatic potential and thereby generating a third signal; and f) comparing the third signal to the first and the second signals and thereby determining the metastatic potential of the HCC patient of step e).
4. The method of claim 3, wherein the set of cellular markers comprises at least 20 genes or proteins encoded by genes independently selected from the genes of Table 2.
5. The method of claim 4, wherein the set of cellular markers comprises at least 50 genes or proteins encoded by genes independently selected from the genes of Table 2.
6. The method of claim 5, wherein the set of cellular markers comprises at least 100 genes or proteins encoded by genes independently selected from the genes of Table 2.
7. The method of claim 6, wherein the set of cellular markers comprises the genes or proteins encoded by genes of Table 2.
8. The method of claim 3, wherein the set of cellular markers comprises the genes or proteins encoded by genes of Table 4.
9. The method of claim 3, wherein the set of cellular markers comprises the genes or proteins encoded by genes of Unigene numbers Hs.313, Hs.69707, Hs.222, Hs.63984, Hs.75573, Hs.177687, Hs.69707, Hs.222, Hs.323712, and Hs.63984.
10. The method of claim 3, wherein the sample of steps a) and b), the sample of step d), and the sample of step e) are liver tissue extracts.
11. The method of claim 3, wherein the array of step a) is a genomic array.
12. The method of claim 3, wherein the array of step a) is a proteomic array.
13. A method for identifying potential therapeutic targets for preventing hepatocellular carcinoma (HCC) in a patient suffering from a chronic liver disease, comprising the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a patient with a chronic liver disease and a high risk of developing HCC; b) capturing markers from the sample and generating a first signal; c) repeating steps a) and b) with a sample from a patient with a chronic liver disease and a low risk of developing HCC and thereby generating a second signal; and d) comparing the first and second signals and thereby identifying a subset of cellular markers whose level is different in the first and second signals, wherein the subset of cellular markers are potential therapeutic targets for preventing HCC in a patient with a chronic liver disease.
14. The method of claim 13, wherein a signal generated from a normal non-cancerous sample on an array identical to the array of step a) is subtracted in steps b) and c) to generate the first and second signals.
15. A method for predicting the risk of developing hepatocellular carcinoma (HCC) in a patient suffering from a chronic liver disease, comprising the steps of: a) contacting an array comprising capture reagents for a set of cellular markers with a sample from a patient with a chronic liver disease and a high risk of HCC, the set of cellular markers comprising at least ten genes or proteins encoded by genes independently selected from the genes of Table 5; b) capturing markers from the sample; c) generating a first signal from the captured markers of step b); d) repeating steps a) to c) with a sample from a patient with a chronic liver disease and a low risk of HCC and thereby generating a second signal; e) repeating steps a) to c) with a sample from a patient with a chronic liver disease and an unknown risk of HCC and thereby generating a third signal; and f) comparing the third signal to the first and the second signals and thereby determining the risk of developing HCC in the patient of step e).
16. The method of claim 15, wherein the set of cellular markers comprises at least 20 genes or proteins encoded by genes independently selected from the genes of Table 5.
17. The method of claim 16, wherein the set of cellular markers comprises at least 50 genes or proteins encoded by genes independently selected from the genes of Table 5.
18. The method of claim 17, wherein the set of cellular markers comprises at least 100 genes or proteins encoded by genes independently selected from the genes of Table 5.
19. The method of claim 18, wherein the set of cellular markers comprises the genes or proteins encoded by genes of Table 5.
20. The method of claim 15, wherein the set of cellular markers comprises the genes or proteins encoded by genes of Table 6.
21. The method of claim 15, wherein the set of cellular markers comprises the genes or proteins encoded by genes of Table 7.
22. The method of claim 15, wherein the sample of steps a) and b), the sample of step d), and the sample of step e) are liver tissue extracts.
23. The method of claim 15, wherein the array of step a) is a genomic array.
24. The method of claim 15, wherein the array of step a) is a proteomic array.
25. The method of claim 15, wherein the patient of step a) suffers from a disease selected from the group consisting of hepatitis B, hepatitis C, hemochromatosis, and Wilson's disease.
26. The method of claim 15, wherein the patient of step d) suffers from alcoholic liver disease, autoimmune hepatitis, or primary biliary cirrhosis.
27. The method of claim 15, wherein the patient of step e) suffers from a disease selected from the group consisting of hepatitis B, hepatitis C, hemochromatosis, Wilson's disease, alcoholic liver disease, autoimmune hepatitis, and primary biliary cirrhosis.
28. A computer readable medium comprising: a) code for a first data set, derived from a first signal from an array comprising capture reagents for a set of cellular markers after contact with a sample from a metastatic HCC patient, the set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 2; b) code for a second data set, derived from a second signal from an array identical to the array of a) after contact with a sample from a non-metastatic HCC patient; c) code for a third data set, derived from a third signal from an array identical to the array of a) after contact with a sample from a HCC patient with unknown metastatic potential; and d) code for comparing the third data set with the first and second data sets.
29. A digital computer comprising the computer readable medium of claim 28.
30. A system comprising: a) a digital computer of claim 29; b) a chip with an array comprising capture reagents for a set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 2; and c) a reader capable of registering a signal from the array after contact with a sample.
31. A computer readable medium comprising: a) code for a first data set, derived from a first signal from an array comprising capture reagents for a set of cellular markers after contact with a sample from a patient with a chronic liver disease and a high risk of HCC, the set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 5; b) code for a second data set, derived from a second signal from an array identical to the array of a) after contact with a sample from a patient with a chronic liver disease and a low risk of HCC; c) code for a third data set, derived from a third signal from an array identical to the array of a) after contact with a sample from a patient with a chronic liver disease and an unknown risk of HCC; and d) code for comparing the third data set with the first and second data sets.
32. A digital computer comprising the computer readable medium of claim 31.
33. A system comprising: a) a digital computer of claim 32; b) a chip with an array comprising capture reagents for a set of cellular markers comprising at least 10 genes or proteins encoded by genes independently selected from the genes of Table 5; and c) a reader capable of registering a signal from the array after contact with a sample.
34. A method for inhibiting hepatocellular carcinoma (HCC) metastasis in a patient suffering from HCC, the method comprising the step of suppressing osteopontin (OPN) activity.
35. The method of claim 34, wherein the step of suppressing osteopontin (OPN) activity is accomplished by inhibiting OPN expression.
36. The method of claim 35, wherein an antisense polynucleotide is used to inhibit OPN expression.
37. The method of claim 34, wherein the step of suppressing osteopontin (OPN) activity is accomplished by inhibiting the specific binding between OPN and OPN receptor.
38. The method of claim 37, wherein an OPN antagonist is used to inhibit the specific binding between OPN and OPN receptor.
39. The method of claim 37, wherein an anti-OPN antibody is used to inhibit the specific binding between OPN and OPN receptor.
40. A method for inhibiting the development of hepatocellular carcinoma (HCC) in a patient suffering from a chronic liver disease, comprising the step of suppressing EpCAM activity.
41. The method of claim 40, wherein the step of suppressing EpCAM activity is accomplished by inhibiting EpCAM expression.
42. The method of claim 41, wherein an antisense polynucleotide is used to inhibit EpCAM expression.
43. The method of claim 41, wherein a small inhibitory RNA is used to inhibit EpCAM expression.
44. The method of claim 40, wherein the step of suppressing EpCAM activity is accomplished by inhibiting the specific binding between EpCAM and EpCAM receptor.
45. The method of claim 44, wherein an anti-EpCAM antibody is used to inhibit the specific binding between EpCAM and EpCAM receptor.
PCT/US2003/010783 2002-04-05 2003-04-04 Methods of diagnosing potential for metastasis or developing hepatocellular carcinoma and of identifying therapeutic targets WO2003087766A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003230838A AU2003230838A1 (en) 2002-04-05 2003-04-04 Methods of diagnosing potential for metastasis or developing hepatocellular carcinoma and of identifying therapeutic targets

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US37089502P 2002-04-05 2002-04-05
US60/370,895 2002-04-05

Publications (2)

Publication Number Publication Date
WO2003087766A2 true WO2003087766A2 (en) 2003-10-23
WO2003087766A3 WO2003087766A3 (en) 2004-07-29

Family

ID=29250601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/010783 WO2003087766A2 (en) 2002-04-05 2003-04-04 Methods of diagnosing potential for metastasis or developing hepatocellular carcinoma and of identifying therapeutic targets

Country Status (3)

Country Link
CN (1) CN1659287A (en)
AU (1) AU2003230838A1 (en)
WO (1) WO2003087766A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5175084A (en) * 1987-10-30 1992-12-29 Fuji Yakuhin Kogyo Kabushiki Kaisha Method for the diagnosis of hepatic carcinoma
US6524787B1 (en) * 1999-08-30 2003-02-25 Mary J. C. Hendrix Diagnostics and therapy based on vascular mimicry
US20030211466A1 (en) * 1999-12-28 2003-11-13 Ribonomics, Inc. Methods for identifying functionally related genes and drug targets

Also Published As

Publication number Publication date
AU2003230838A1 (en) 2003-10-27
AU2003230838A8 (en) 2003-10-27
CN1659287A (en) 2005-08-24
WO2003087766A3 (en) 2004-07-29

Similar Documents

Publication Publication Date Title
WO2003087766A2 (en) Methods of diagnosing potential for metastasis or developing hepatocellular carcinoma and of identifying therapeutic targets
JP6140202B2 (en) Gene expression profiles to predict breast cancer prognosis
Ye et al. Predicting hepatitis B virus–positive metastatic hepatocellular carcinomas using gene expression profiling and supervised machine learning
Castro et al. Evidence that molecular changes in cells occur before morphological alterations during the progression of breast ductal carcinoma
Bonfiglio et al. Genetic and phenotypic attributes of splenic marginal zone lymphoma
Belbin et al. Molecular classification of head and neck squamous cell carcinoma using cDNA microarrays
Kikuchi et al. Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs
Jayawardana et al. Determination of prognosis in metastatic melanoma through integration of clinico‐pathologic, mutation, mRNA, microRNA, and protein information
CN103733065B (en) Molecular diagnostic assay for cancer
US7666595B2 (en) Biomarkers for predicting prostate cancer progression
Carinci et al. Potential markers of tongue tumor progression selected by cDNA micro array
US7998674B2 (en) Gene expression profiling for identification of prognostic subclasses in nasopharyngeal carcinomas
US20090170715A1 (en) Prognostic and diagnostic method for cancer therapy
WO2004105573A2 (en) Method of diagnosis of cancer based on gene expression profiles in cells
CN106164296A (en) Molecular diagnostic test for predicting response to anti-angiogenic drugs and prognosis of cancer
CA2660857A1 (en) Prognostic and diagnostic method for disease therapy
EP1756309A2 (en) Methods for predicting and monitoring response to cancer therapy
WO2012154935A1 (en) Biomarkers that are predictive of responsiveness or non-responsiveness to treatment with lenvatinib or a pharmaceutically acceptable salt thereof
KR20070084488A (en) Methods and systems for prognosis and treatment of solid tumors
US20210363593A1 (en) CXCL13 Marker For Predicting Immunotherapeutic Responsiveness In Patient With Lung Cancer And Use Thereof
CN113462776A (en) Application of m6A modification-related combined genome in prediction of immunotherapy efficacy of renal clear cell carcinoma patient
Lin et al. Evolutionary route of nasopharyngeal carcinoma metastasis and its clinical significance
US20230047712A1 (en) Methods of Treatments Based Upon Molecular Response to Treatment
Schaner et al. Variation in gene expression patterns in effusions and primary tumors from serous ovarian cancer patients
Mamelak et al. Downregulation of NDUFA1 and other oxidative phosphorylation‐related genes is a consistent feature of basal cell carcinoma

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 20038129825

Country of ref document: CN

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP