WO2008112154A2 - Methods of using genomic biomarkers to predict tumor formation - Google Patents

Methods of using genomic biomarkers to predict tumor formation

Publication number
WO2008112154A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
nucleic acid
cancer
tumor formation
expression
Prior art date
Application number
PCT/US2008/003063
Other languages
French (fr)
Other versions
WO2008112154A3 (en)
Inventor
Russell Scott Thomas
Original Assignee
The Hamner Institutes For Health Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Hamner Institutes For Health Sciences
Publication of WO2008112154A2
Publication of WO2008112154A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to biomarkers and methods of using the same in predicting tumor formation and identifying carcinogenic substances.
  • the two-year rodent bioassay is widely used to assess the carcinogenic potential of chemical, biological and physical agents.
  • Current regulatory standards require select agents to be tested for carcinogenic activity prior to commercial release, including pharmaceuticals, food additives, and pesticides.
  • NTP National Toxicology Program
  • each bioassay requires hundreds of animals and about $2 to $4 million per chemical (NTP 1996).
  • the bioassays are performed late in the developmental pipeline after commitment of substantial resources in product development. A positive result can delay release of the product until the potential carcinogenic risks can be addressed through further study, or may even result in discontinuation of the product.
  • identifying potential carcinogens earlier in the development pipeline could provide substantial monetary savings.
  • from a risk assessment perspective, there are approximately 80,000 chemicals registered for commercial use in the United States and 2,000 more added each year (NTP 2001). Since most have not been tested for carcinogenic activity, a more economical method to identify potential carcinogens would allow more chemicals to be tested for long-term health effects prior to human exposure.
  • transcriptomic and metabonomic technologies to identify biomarkers associated with toxicological endpoints has been the subject of considerable research.
  • most toxicology studies employing these technologies have focused on identifying biomarkers associated with relatively acute endpoints, such as hepatotoxicity and nephrotoxicity (Amin et al. 2004;
  • the present invention provides an alternative to the standard rodent cancer bioassay to identify substances such as chemical, biological and physical agents, for the potential to cause adverse effects to humans and animals. Further, the present invention provides methods of using biomarkers to predict the carcinogenic activity of a substance. The biomarkers of the present invention can discriminate between carcinogenic and non-carcinogenic treatments. Embodiments of the present invention provide methods of predicting tumor formation including determining a nucleic acid expression pattern of genomic biomarkers and correlating regulation of the genomic biomarkers to the likelihood of tumor formation.
  • the methods further include predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising: (a) determining the nucleic acid expression pattern associated with at least one biomarker from a sample comprising at least one biomarker isolated from a biological sample taken from a subject wherein the biomarker comprises a nucleic acid sequence or polypeptide and fragments, variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression associated with at least one biomarker to a likelihood of tumor formation.
  • Embodiments of the present invention provide methods of predicting lung or liver tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising: (a) determining the nucleic acid expression pattern of sixteen nucleic acid sequences comprising SEQ ID NO:1 (Ugt1a1, Accession No. NM_201645.2), SEQ ID NO:2 (Ces1, Accession No. NM_021456.3), SEQ ID NO:3 (Fgfr2, Accession No. BB220625.2), SEQ ID NO:4 (Ephx1, Accession No. NM_010145.2), SEQ ID NO:5 (Ugt1a2, Accession No. NM_013701.3), SEQ ID NO:6 (AU018778, Accession No. BC013479.1), SEQ ID NO:7 (Gstm1, Accession No. NM_010358.4), SEQ ID NO:8 (Ddit4l, Accession No. NM_030143.3), SEQ ID NO:9 (Ikbkg transcript variant 1, Accession No. NM_010547.1), SEQ ID NO:10 (Ikbkg transcript variant 2, Accession No. NM_178590.3), SEQ ID NO:11 (Ugt1a5, Accession No. NM_201643.2), SEQ ID NO:12 (Ugt1a6a, Accession No. NM_145079.3), SEQ ID NO:13 (Ugt1a6b, Accession No. NM_201410.1), SEQ ID NO:14 (Ugt1a7c, Accession No. NM_201642.4), SEQ ID NO:15 (Ugt1a9, Accession No. NM_201644.2), SEQ ID NO:16 (Ugt1a10, Accession No. NM_201641.2) and variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression of the nucleic acid sequences to a likelihood of tumor formation to predict lung or liver tumor formation with at least about 94% accuracy.
  • Embodiments of the present invention further provide methods of assessing a substance for carcinogenic potential, comprising: (a) determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising at least one nucleic acid isolated from a biological sample taken from a subject exposed to a substance to be tested for carcinogenicity wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778) and/or a glutathione S-transferase and variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression of the at least one nucleic acid sequence to an increased likelihood of tumor formation, wherein an increased likelihood of tumor formation indicates that the substance has carcinogenic potential.
  • Embodiments of the present invention provide methods of using a nucleic acid biomarker to predict tumor formation, comprising determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising a nucleic acid isolated from a biological sample taken from a subject wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID
  • Embodiments of the present invention also provide methods of identifying a biomarker for predicting tumor formation resulting from exposure to a substance, comprising: (a) comparing regulation of a suspected biomarker from a biological sample of (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound, (b) identifying at least one biomarker that is differentially regulated after exposure to the cytotoxic compound compared to the regulation after exposure to the non-cytotoxic compound; and (c) statistically correlating the differential regulation to a likelihood of tumor formation thereby indicating that the at least one suspected biomarker is a biomarker for predicting tumor formation resulting from exposure to a substance.
  • Embodiments of the present invention provide methods of determining a nucleic acid expression profile to predict tumor formation, comprising: (a) performing a microarray analysis on at least one nucleic acid sequence isolated from a biological sample taken from (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; and (b) statistically analyzing the ability of the expression of at least one nucleic acid sequence to be differentially regulated during cytotoxic and non-cytotoxic treatments, wherein the differential regulation of the at least one nucleic acid sequence establishes a nucleic acid expression profile to predict tumor formation.
  • Embodiments of the present invention further provide kits comprising a probe that hybridizes with a nucleic acid sequence comprising SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and
  • Embodiments of the invention described herein allow identification of carcinogens without performing a standard two-year rodent bioassay thereby forming the basis for a more efficient and economical approach for evaluating the carcinogenic activity of chemicals.
  • Figure 3 Results from the statistical classification analysis for predicting chemically-induced increases in lung tumor incidence using subchronic gene expression biomarkers. Accuracy was estimated based on 10-fold cross-validation and calculated by dividing the number of correct predictions by the total number of predictions.
  • FIG. 4 Real-time RT-PCR confirmation of four genes that showed significant differential expression between lung carcinogenic and noncarcinogenic chemicals and controls in the microarray analysis.
  • A Ces1 (Mm00491334_m1).
  • B Ephx1 (Mm00468756_m1).
  • C Acsm1 (Mm00519091_m1).
  • D Nqo1 (Mm01253562_m1).
  • RPL32, Mm02528467_g1 ribosomal protein L32
  • Mm02528467_g1 Nqo1
  • Figure 5 Flow chart outlining the statistical classification and cross-validation process used for data analysis and estimating the predictive accuracy of the gene expression and metabolic biomarkers.
  • Figures 6A through 6D Summary of the alterations in gene expression and metabolites following a 90-day exposure to treatments positive (NAPD, BFUR) and negative (NEDD, PCNB, CCON, and FCON) for tumors in a two-year rodent cancer bioassay. Chemical details and abbreviations are provided in Table 6. Genes and metabolites in the heat maps were hierarchically clustered to group those showing common changes.
  • Figure 6A Heat map of genes differentially expressed in the lung. Red represents high gene expression and blue is low expression.
  • Figure 6B Heat map of genes differentially expressed in the liver. Red represents high gene expression and blue is low expression.
  • Figure 6C Expression of two potential gene expression biomarkers that showed discriminating expression between carcinogenic chemicals and noncarcinogenic chemicals and controls. Expression of Ces1 was measured in the lung and E130013N09Rik was measured in the liver. Each dot represents an individual animal and the line is the mean expression for that treatment.
  • Figure 6D Heat map of the NMR spectral bins from the serum measurements. Red represents high metabolite concentration and blue is low concentration.
  • Figures 7A and 7B Results from the statistical classification analysis for the gene expression and metabolite biomarkers.
  • Figure 7A Accuracy of the support vector machine statistical classification model with increasing number of genes or NMR spectral bins. Accuracy was estimated based on six-fold cross-validation and calculated by dividing the number of correct predictions by the total number of predictions.
  • Figure 7B Listing of the top 5 gene expression biomarkers in the lung and liver. The listing was based on the Golub score (Golub et al. 1999) ranking.
  • FIG. 8 Real-time RT-PCR confirmation of potential lung gene expression biomarkers that showed discriminating expression between carcinogenic chemicals (NAPD, BFUR) and noncarcinogenic chemicals (NEDD, PCNB) and controls (CCON, FCON).
  • A Ces1 (Mm00491334_m1).
  • B Ikbkg (Mm00494927_m1).
  • C Nqo1 (Mm01253562_m1).
  • D 1110032A04Rik (Mm00504963_m1).
  • RPL32, Mm02528467_g1 ribosomal protein L32
  • E130013N09Rik (forward primer, 5'-TCCAGGCAAAAAGAAGAGTATCCAA-3' (SEQ ID NO:17); reverse primer, 5'-CATTTGAACGACTCAGTTAGTCTAACCA-3' (SEQ ID NO:18); and probe, 5'-CTGCCACCCATTCATG-3' (SEQ ID NO:19)).
  • B Ugdh (Mm00447645_m1).
  • C 4922503N01Rik (Mm00462815_m1).
  • D Gsta1/Gsta2 (Mm00833353_mH).
  • Figure 10 Flow chart outlining the statistical classification and cross-validation process used for data analysis and estimating the predictive accuracy of the gene expression for identifying important biomarkers.
  • Figure 11. Results for predictive accuracy of lung gene expression biomarkers identified for tumor formation.
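The figure descriptions above mention two calculations: ranking genes by the Golub score (Golub et al. 1999) and estimating accuracy as correct predictions divided by total predictions across cross-validation folds. The following is a minimal sketch of both, not the applicant's implementation. It assumes the commonly cited signal-to-noise form of the Golub score, (mean1 - mean2) / (sd1 + sd2), uses the sample standard deviation as an assumption, and all function names and data layouts are hypothetical.

```python
from statistics import mean, stdev


def golub_score(pos, neg):
    """Signal-to-noise score: difference of class means over the sum of
    class standard deviations (sample standard deviation is an assumption)."""
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def rank_genes(expr, labels):
    """Return gene names ordered by absolute Golub score, largest first.

    expr   -- dict mapping a gene name to one expression value per animal
    labels -- 1 for a carcinogenic treatment, 0 otherwise (hypothetical coding)
    """
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = golub_score(pos, neg)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


def pooled_accuracy(folds):
    """Accuracy over all cross-validation folds: number of correct
    predictions divided by the total number of predictions.

    folds -- list of (predicted, actual) pairs, one pair per held-out fold
    """
    correct = sum(p == a for pred, act in folds for p, a in zip(pred, act))
    total = sum(len(act) for _, act in folds)
    return correct / total
```

In such a scheme the ranking would normally be recomputed inside each training fold so that held-out animals do not influence gene selection; the classifier itself (a support vector machine in Figure 7A) is outside this sketch.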
  • “a” or “an” or “the” can mean one or more than one.
  • “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).
  • the term “about,” as used herein when referring to a measurable value such as an amount of a compound or agent of this invention, dose, time, temperature, and the like, is meant to encompass variations of 20%, 10%, 5%, 1%, 0.5%, or even 0.1% of the specified amount.
  • Nucleic acid or “nucleic acid sequence” as used herein encompasses both RNA and DNA, including cDNA, genomic DNA, synthetic (e.g., chemically synthesized) DNA and chimeras of RNA and DNA.
  • the nucleic acid may be double-stranded or single-stranded. Where single-stranded, the nucleic acid may be a sense strand or an antisense strand.
  • the nucleic acid may be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.
  • isolated nucleic acid refers to a nucleic acid separated or substantially free from at least some of the other components of the naturally occurring organism or virus, such as for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the nucleic acid.
  • an isolated polypeptide means a polypeptide that is separated or substantially free from at least some of the other components of the naturally occurring organism or virus, for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the polypeptide.
  • the “isolated” polypeptide is at least about 25%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or more pure (w/w).
  • Polypeptide as used herein is used interchangeably with “protein,” and refers to a polymer of amino acids (dipeptide or greater) linked through peptide bonds.
  • polypeptide includes proteins, oligopeptides, protein fragments, protein analogs and the like.
  • polypeptide contemplates polypeptides as defined above that are encoded by nucleic acids, are recombinantly produced, are isolated from an appropriate source, or are synthesized.
  • a “functional” polypeptide is one that retains at least one biological activity normally associated with that polypeptide. According to embodiments of the present invention, a “functional” polypeptide retains all of the activities possessed by the unmodified peptide.
  • polypeptide retains at least about 50%, 60%, 75%, 85%, 90%, 95%, 97%, 98%, 99%, or more, of the biological activity of the native polypeptide (and can even have a higher level of activity than the native polypeptide).
  • a “nonfunctional” polypeptide is one that exhibits essentially no detectable biological activity normally associated with the polypeptide (e.g., at most, only an insignificant amount, e.g., less than about 10% or even 5%).
  • “Fragment” as used herein is one that substantially retains at least one biological activity normally associated with that protein or polypeptide.
  • the “fragment” substantially retains all of the activities possessed by the unmodified protein.
  • substantially retains biological activity, it is meant that the protein retains at least about 50%, 60%, 75%, 85%, 90%, 95%, 97%, 98%, 99%, or more, of the biological activity of the native protein (and can even have a higher level of activity than the native protein).
  • isoform refers to a different form of a protein, regardless of whether it originates from a different gene or splice variant or by modification of a single gene product.
  • isoform refers to a form of a protein that migrates differently from another form of that protein on a two-dimensional gel.
  • Altered level or altered levels refer to an increased level (e.g., a one- or two-fold increase, or more) or a decreased level (e.g., a one- or two-fold decrease, or more) in the expression activity of a nucleic acid sequence and/or quantity of protein resulting from the expression of the nucleic acid sequence in or via a sample, as compared to a level or levels in a corresponding control sample.
  • the sample can be a biological sample taken from a subject.
  • the corresponding sample can be a biological sample taken from a subject not afflicted with a tumor or a sample from a source not known to be derived from a source associated with a tumor.
  • Biological sample refers to any material taken from the body of a subject that may carry the target nucleic acid sequence or polypeptide described herein, including both tissue samples and biological fluids such as blood samples, saliva samples, mucus samples, urine samples, etc.
  • Biomarker refers to any nucleic acid or polypeptide that can be detected, directly or indirectly (e.g., via an analog, metabolite, fragment or breakdown product) in a sample, such as a biological sample from a subject, an increase or decrease of the amount of which, compared to amounts found in similar control samples, such as subjects without disease, is indicative of the presence or risk of tumor formation.
  • the analog, metabolite, fragment or breakdown product of the biomarker may or may not possess all the functional activity of the biomarker.
  • Tumor refers to an abnormal growth of cells or tissues.
  • Tumors can be malignant or benign.
  • malignant tumors include cancerous growth denoted as an uncontrolled growth of tissue that has the potential to spread to adjacent or distant sites of the body.
  • Exemplary tumors include malignant disorders such as breast cancers, osteosarcomas, angiosarcomas, fibrosarcomas and other sarcomas, leukemias, lymphomas, sinus tumors, ovarian, uretal, bladder, prostate and other genitourinary cancers, colon, esophageal and stomach cancers and other gastrointestinal cancers, lung cancers, myelomas, pancreatic cancers, liver cancers, kidney cancers, endocrine cancers, skin cancers, melanomas, angiomas, and brain or central and peripheral nervous (CNS) system tumors, malignant or benign, including gliomas and neuroblastomas.
  • CNS central and peripheral nervous
  • Carcinogenic refers to the ability of a compound to promote tumor growth and/or facilitate propagation of the tumor. As used herein, “carcinogenic” and “tumorigenic” can be used interchangeably.
  • Cytotoxic compound refers to a compound that imparts cellular dysfunction, deterioration and/or cell death. As used herein, a cytotoxic compound can be carcinogenic, and thus tumorigenic.
  • “Subchronic” as used herein refers to a limited exposure to a chemical to cause an effect, compared to a chronic exposure, where “chronic” as used herein refers to a more prolonged exposure to the chemical.
  • Subjects as used herein are generally human subjects and include, but are not limited to, “patients.”
  • the subjects may be male or female and may be of any race or ethnicity, including, but not limited to, Caucasian, African-American, African, Asian, Hispanic, Indian, etc.
  • the subjects may be of any age, including newborn, neonate, infant, child, adolescent, adult, and geriatric.
  • Subjects may also include animal subjects, particularly mammalian subjects such as canines, felines, bovines, caprines, equines, ovines, porcines, rodents (e.g. rats and mice), lagomorphs, primates (including non-human primates), etc., screened for veterinary medicine or pharmaceutical drug development purposes.
  • Subjects include, but are not limited to, those who may have, possess, have been exposed to, or have been previously diagnosed as afflicted with one or more risk factors for lung or liver cancer.
  • Risk factors for lung cancer include, but are not limited to, age, gender, smoking habits and exposure to second-hand smoke, diet, work exposure and family history.
  • Risk factors for liver cancer include, but are not limited to, age, gender, alcohol consumption, hepatitis, cirrhosis, exposure to irritants and family history.
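The definition of an "altered level" given earlier (an increase or decrease in expression relative to a corresponding control sample, for example two-fold) can be illustrated with a short fold-change calculation. This is a minimal sketch under stated assumptions: expression values are positive and on a linear scale, the two-fold cutoff is only one of the example values in the text, and the function names are hypothetical.

```python
from statistics import mean


def fold_change(sample, control):
    """Ratio of mean expression in the test sample to mean expression in
    the corresponding control sample (linear-scale values assumed)."""
    return mean(sample) / mean(control)


def classify_level(sample, control, threshold=2.0):
    """Label expression as 'increased', 'decreased' or 'unchanged'.

    threshold -- fold change treated as an altered level; 2.0 is an
                 illustrative choice, not a value fixed by the text
    """
    fc = fold_change(sample, control)
    if fc >= threshold:
        return "increased"
    if fc <= 1.0 / threshold:
        return "decreased"
    return "unchanged"
```

For log-transformed microarray intensities, the same comparison would use a difference of means rather than a ratio.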
  • methods of predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay include: (a) determining a regulation pattern of at least one biomarker from a sample including at least one biomarker isolated from a biological sample taken from a subject wherein the biomarker comprises a nucleic acid sequence or polypeptide and fragments, variants and isoforms thereof that create a favorable cellular environment for chemically-induced tumor formation; and (b) correlating an altered level of regulation of the at least one biomarker to a likelihood of tumor formation. Regulation can relate to nucleic acid expression and/or detection of protein levels.
  • collecting a sample can be carried out either directly or indirectly by any suitable technique.
  • collecting a blood sample from a subject can be carried out by phlebotomy or any other suitable technique, with the blood sample processed further to provide a serum sample or other suitable blood fraction.
  • the biomarker is a metabolic enzyme and/or growth factor receptor.
  • the nucleic acid sequence described above can encode a polypeptide corresponding to a metabolic enzyme and/or growth factor receptor.
  • the metabolic enzymes encoded by the nucleic acids can include those known to be involved in endogenous and xenobiotic metabolic processes and the growth factor receptors can include those known to be involved in tissue and/or organ development.
  • the nucleic acid sequences encode a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a polypeptide encoded by nucleic acid sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase and fragments, variants and isoforms thereof.
  • nucleic acid sequences encoding these polypeptides and fragments, variants and isoforms thereof further encompass those nucleic acids encoding polypeptides that have at least about 60%, 70%, 80%, 90%, 95%, 97%, 98% or higher amino acid sequence similarity with the polypeptides disclosed herein (or fragments thereof).
  • sequence identity and/or similarity can be determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math.
  • the methods include determining the nucleic acid expression pattern of sixteen nucleic acid sequences, wherein the nucleic acid sequences include SEQ ID NO:1 (Ugt1a1, Accession No. NM_201645.2), SEQ ID NO:2 (Ces1, Accession No. NM_021456.3), SEQ ID NO:3 (Fgfr2, Accession No. BB220625.2), SEQ ID NO:4 (Ephx1, Accession No. NM_010145.2), SEQ ID NO:5 (Ugt1a2, Accession No. NM_013701.3), SEQ ID NO:6 (AU018778, Accession No.
  • the methods according to the present invention further include measuring the levels of RNA, e.g. mRNA, and levels of proteins. Such measurements can be made according to methods well known in the art as discussed above and as provided in the examples below. See, e.g., SAMBROOK et al., MOLECULAR CLONING: A LABORATORY MANUAL 2nd Ed. (Cold Spring Harbor, NY, 1989); F. M. AUSUBEL et al. CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Green Publishing Associates, Inc. and John Wiley & Sons, Inc., New York).
  • determining the presence of an altered level of a biomarker in the sample can also be carried out either directly or indirectly in accordance with known techniques, including, but not limited to, mass spectrometry, chromatography, electrophoresis, sedimentation, isoelectric focusing, and antibody assay. See, e.g., U.S. Patent No. 6,589,748; U.S. Patent No. 6,027,896.
  • biomarkers may be identified by two-dimensional electrophoresis (2-D electrophoresis).
  • 2D-electrophoresis is a technique comprising denaturing electrophoresis, followed by isoelectric focusing; this generates a two-dimensional gel (2D gel) containing a plurality of separated proteins.
  • by “increased level” it is meant (a) any level of a biomarker when that biomarker is not present in a normal subject without tumor formation, as well as (b) an elevated level (e.g., a two or three-fold increase in detected quantity) of a biomarker or a particular fragment, variant or isoform of a biomarker when that biomarker or a particular fragment, variant or isoform is present in a normal subject without tumor formation.
  • by “depressed level” it is meant (a) an absence of a particular biomarker or fragment, variant or isoform of a particular biomarker when that biomarker is present in a normal subject without tumor formation, as well as (b) a reduced level (e.g., a two or three-fold reduction in detected quantity) of a biomarker or fragment, variant or isoform of a biomarker when that biomarker or fragment, variant or isoform is present in a normal subject without tumor formation.
  • the steps of (a) assaying a sample for an elevated level of a biomarker and/or depressed level of a biomarker, and (b) correlating an elevated level of a biomarker and/or a depressed level of a biomarker in said sample associated with tumor formation can be carried out in accordance with known techniques or variations thereof that will be apparent to persons skilled in the art.
  • antibody assays used in some approaches may, in general, be homogeneous assays or heterogeneous assays. In a homogeneous assay the immunological reaction usually involves the specific antibody, a labeled analyte, and the sample of interest.
  • the signal arising from the label is modified, directly or indirectly, upon the binding of the antibody to the labeled analyte. Both the immunological reaction and detection of the extent thereof are carried out in a homogeneous solution. Immunochemical labels which may be employed include free radicals, radioisotopes, fluorescent dyes, enzymes, bacteriophages, coenzymes, and so forth. In a heterogeneous assay approach, the reagents are usually the specimen, the antibody of the invention and a system or means for producing a detectable signal. Similar specimens as described above may be used. The antibody is generally immobilized on a support, such as a bead, plate or slide, and contacted with the specimen suspected of containing the antigen in a liquid phase.
  • the support is then separated from the liquid phase and either the support phase or the liquid phase is examined for a detectable signal employing means for producing such signal.
  • the signal is related to the presence of the analyte in the specimen.
  • Means for producing a detectable signal include the use of radioactive labels, fluorescent labels, enzyme labels, and so forth.
  • an antibody which binds to that site can be conjugated to a detectable group and added to the liquid phase reaction solution before the separation step.
  • the presence of the detectable group on the solid support indicates the presence of the antigen in the test sample.
  • suitable immunoassays are the radioimmunoassay, immunofluorescence methods, enzyme-linked immunoassays, and the like.
  • the methods described herein are applicable to predicting tumor formation at any organ site.
  • the organ site can include, but is not limited to, liver, lung, kidney, mammary, and hematopoietic sites.
  • Tumors include, but are not limited to, the tumors described above, and in some embodiments, breast cancer, osteosarcoma, angiosarcoma, fibrosarcoma, leukemia, sinus tumor, ovarian cancer, uretal cancer, bladder cancer, prostate cancer, genitourinary cancer, gastrointestinal cancer, lung cancer, lymphoma, myeloma, pancreatic cancer, liver cancer, kidney cancer, endocrine cancer, skin cancer, melanoma, angioma and brain or central nervous system (CNS) cancer.
  • the tumor is associated with the lung or liver.
  • nucleic acid expression as described herein can be tissue specific in particular embodiments of the present invention.
  • the methods of predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay provide at least about 90% accuracy of predicting tumor formation. In some embodiments, the methods provide at least about 93% accuracy, 90% sensitivity and 90% specificity of predicting tumor formation.
  • the present invention provides methods of predicting lung or liver tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising: (a) determining the nucleic acid expression pattern of eight nucleic acid sequences comprising SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c),
  • Embodiments of the present invention further provide methods of assessing a substance for carcinogenic potential, including: (a) determining the regulatory pattern of a biomarker from a sample including at least one biomarker isolated from a biological sample taken from a subject exposed to a substance to be tested for carcinogenicity; and (b) correlating an altered level of regulation of the at least one biomarker to an increased likelihood of tumor formation, wherein an increased likelihood of tumor formation indicates that the substance has carcinogenic potential.
  • the biomarker is a nucleic acid sequence that encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778) and/or a glutathione S-transferase and variants and isoforms thereof.
  • the method includes determining the nucleic acid expression pattern of eight nucleic acid sequences, wherein the nucleic acid sequences include SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10), SEQ
  • the substance can be a chemical, biological and/or physical agent.
  • the substance can include, but is not limited to, a pharmaceutical product, food additive, pesticide or cleaning product.
  • the substances further include commercial, industrial, residential and environmental chemicals.
  • exposure to the substance is a subchronic exposure.
  • exposure to the substance to be tested is for a period of about two years.
  • exposure to the substance is less than about two years, less than about one year, less than about six months, less than about 4 months or about or less than about 3 months.
  • the exposure is days. In particular embodiments, the exposure is about or less than about 90 days.
  • Embodiments of the present invention provide methods of using a nucleic acid biomarker to predict tumor formation, including: determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample including a nucleic acid isolated from a biological sample taken from a subject wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase and variants and isoforms thereof.
  • Embodiments of the present invention include methods of identifying a biomarker for predicting tumor formation resulting from exposure to a substance, including: (a) comparing regulation of a suspected biomarker from a biological sample of (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; (b) identifying at least one biomarker that is differentially regulated after exposure to the cytotoxic compound compared to the regulation after exposure to the non-cytotoxic compound; and (c) statistically correlating the differential regulation to a likelihood of tumor formation, thereby indicating that the at least one suspected biomarker is a biomarker for predicting tumor formation resulting from exposure to a substance.
  • “Differential regulation” as used herein can refer to altered nucleic acid expression and/or altered protein levels. Observing altered levels of nucleic acid expression can be used to identify a nucleic acid biomarker, and observing altered protein levels can be used to identify a protein biomarker. Thus, upon detection of the differentially regulated nucleic acid expression, detection of the quantity of protein resulting therefrom can serve as a biomarker for predicting tumor formation.
  • Regarding cytotoxic compounds, as understood by those skilled in the art, generally any compound can be "cytotoxic" under certain conditions. Accordingly, in embodiments of the present invention, a "non-cytotoxic" compound is one that serves as a control compound for purposes of comparing effects of exposure to a cytotoxic compound as described herein.
  • Embodiments of the present invention further provide methods of determining a nucleic acid expression profile to predict tumor formation, including: (a) performing a microarray analysis on at least one nucleic acid sequence isolated from a biological sample taken from (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; and (b) statistically analyzing the ability of the expression of at least one nucleic acid sequence to be differentially regulated during cytotoxic and non-cytotoxic treatments, wherein the differential regulation of the at least one nucleic acid sequence establishes a nucleic acid expression profile to predict tumor formation.
  • Detecting differential regulation can include measurement of changes in nucleic acid expression and/or measurement of changes in protein levels of proteins expressed by the nucleic acid sequences.
  • the statistical methods used to derive embodiments of the present invention include the Golub algorithm (Golub et al. 1999) for feature selection and a support vector machine model for classification analysis. The predictive accuracy of the statistical classification analysis was assessed using N-fold cross-validation. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number. The classification analysis was performed using the PCP software program (Buturovic 2006).
  • kits including probes that hybridize with at least one biomarker to predict tumor formation.
  • the kits include a probe that hybridizes with a nucleic acid sequence including SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a1), SEQ ID NO:2 (
  • kits can include at least one biochemical material and/or reagent, such as buffers and/or binding partners, that are capable of specifically binding with one or more of the biomarkers described herein.
  • Such materials and/or reagents can provide a means for determining binding between the biochemical material and one or more biomarkers, whereby at least one analysis to determine a presence of one or more biomarkers is carried out on a sample.
  • analysis or analyses may be carried out with the additional use of detection devices for immunoassay, chromatography, spectrometry, electrophoresis, sedimentation, isoelectric focusing, or any combination thereof. Analysis may be carried out on a single sample or multiple samples.
  • kits can include instructions for performing the method or assay. In some embodiments, the kits can further include instructions for predicting tumor formation resulting from exposure to a substance. The kits can include instructions for evaluating a chemical for carcinogenic potential. Further, the kits can include instructions for assaying a biological sample for the presence of tumor formation biomarkers. In addition, the kits may optionally comprise depictions or photographs that represent the appearance of positive and negative results. In some embodiments, the components of the kits may be packaged together in a common container.
  • NTP National Toxicology Program
  • mice 333-41-5; Purity: 98%), and malathion (MALA; CAS No. 121-75-5; Purity: 95%) were purchased from Advanced Technology and Industry (Hong Kong, China).
  • Animals and Treatment. One hundred and fifty female B6C3F1 mice were obtained from Charles River Laboratories (Raleigh, NC).
  • Female B6C3F1 mice were chosen since they represent the most sensitive model for chemically-induced lung tumor formation in the NTP rodent bioassay. Upon receipt, the mice were randomized by weight and divided into treatment groups (Table 1). The 13 chemicals used in this study have been previously tested by the NTP. Seven of the chemicals were positive for an increased incidence of primary alveolar/bronchiolar adenomas or carcinomas and six were negative.
  • Animal use in this study was approved by the Institutional Animal Use and Care Committee of The Hamner Institutes (formerly CIIT Centers for Health Research) and was conducted in accordance with the National Institutes of Health guidelines for the care and use of laboratory animals. Animals were housed in fully-accredited American Association for Accreditation of Laboratory Animal Care (AAALAC) facilities. Following 13 weeks of exposure, the mice were euthanized with a lethal i.p. dose of sodium pentobarbital (Abbott Laboratories, Chicago, IL). The four right lung lobes were isolated by suturing, removed, and minced together in RNAlater™ (Ambion, Austin, TX).
  • the left lung lobe was inflated with 10% neutral buffered formalin and stored in 10% formalin for further processing. Following a standard fixation period, the lung tissues were embedded into paraffin blocks, sectioned at 5 µm, and stained with hematoxylin and eosin. Histological changes were assessed by an accredited pathologist.
  • Double-stranded cDNA was synthesized from 5 µg of total RNA using the One-Cycle cDNA synthesis kit (Affymetrix, Santa Clara, CA). Biotin-labeled cRNA was transcribed from the cDNA using the GeneChip IVT Labeling Kit (Affymetrix). Fifteen µg of labeled cRNA was fragmented and hybridized to Affymetrix Mouse Genome 430 2.0 arrays for 16 h at 45°C. The hybridized arrays were washed using the GeneChip Fluidics Station 450 and scanned using a GeneChip 3000 scanner. Microarray data were processed using RMA with a log2 transformation (Irizarry et al. 2003). The gene expression results have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (Accession No.: GSE6116).
  • SMILES molecular input line entry specification
  • the data for the animals in the test set were set aside as if we had never observed them. Feature selection was then performed on the training set using the Golub algorithm (Golub et al. 1999) and the genes with the largest Golub statistic were used to build a support vector machine classification model. The model was then used to predict the classes for the seven animals in the test set that were held out at the beginning of the process. The cross-validation process was repeated at least 100 times to obtain a good estimate of the predictive accuracy. Accuracy was calculated by dividing the number of correct predictions in the test set by the total number of predictions. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number. The classification analysis was performed using the PCP software program (Buturovic 2006).
  • the 13 chemical treatments in the study were intentionally chosen to be diverse in terms of chemical structure, genotoxicity, and potential modes-of-action.
  • the structural diversity among the chemicals was analyzed using a Tanimoto similarity coefficient with a coefficient of 1.0 being identical molecules and 0.0 having no structural similarity.
  • the average similarity among all 13 chemicals in the study was 0.141 with a maximum similarity of 0.508 between NEDD and NAPD (Table 3).
  • the average similarity dropped to 0.123 with a maximum similarity of 0.327 between DBET and BBMP.
  • the average similarity for all single chemicals tested by the NTP in a rodent cancer bioassay was 0.155.
  • the top gene expression biomarkers were changes in the UDP-glucuronosyltransferase 1a (Ugt1a) family, carboxylesterase 1 (Ces1), fibroblast growth factor receptor 2 (Fgfr2), epoxide hydrolase 1 (Ephx1), glutathione S-transferase µ1 (Gstm1), and an unannotated gene (Table 5).
  • Ugt1a UDP-glucuronosyltransferase 1a
  • Ces1 carboxylesterase 1
  • Fgfr2 fibroblast growth factor receptor 2
  • Ephx1 epoxide hydrolase 1
  • Gstm1 glutathione S-transferase µ1
  • A complete ranking is provided as supplemental material (Table 9).
  • the three corresponding probe sets were not specific for a particular isoform.
  • the Ugt1a isoforms are produced through the alternative splicing of variable exon
  • Identifying biomarkers that predict an increase in tumor incidence is fundamentally different from identifying biomarkers that predict tumor formation in an individual animal.
  • the biomarkers that were identified in this study were likely to be genes that created a favorable cellular environment for chemically-induced lung tumor formation and not those that determined whether a specific animal gets tumors.
  • Of the genes in the predictive signature, most were enzymes involved in endogenous and xenobiotic metabolic processes and one was a growth factor receptor involved in lung development. The functional breakdown of these predictive biomarkers was consistent with the established role of metabolism and growth factor signaling in tumorigenesis.
  • Ugt1a is one of a family of enzymes that catalyze the glucuronidation of endogenous and xenobiotic molecules (Tukey and Strassburg 2000).
  • the mouse Ugt1 locus produces nine different genes through the alternative splicing of 14 variable exons to four constant exons (Zhang et al. 2004).
  • Genome-wide scans have identified the Ugt1a locus as playing an important role in chemical carcinogenesis (Tukey and Strassburg 2000) and various isoforms have been shown to be differentially expressed in human liver cancer (Strassburg et al. 1997).
  • Ces1 is part of a large multigene family of enzymes that hydrolyze ester and amide bonds and play a role in cellular cholesterol esterification (Ghosh 2000; Uphoff and Drexler 2000). Previous studies have suggested that Ces1 may play a role in detoxifying ester- or amide-containing xenobiotics in the lung (Munger et al. 1991; Uphoff and Drexler 2000).
  • human CES1 was part of an 11-gene transcriptional signature that was used to predict therapy outcome and malignancy for multiple types of human cancer including lung cancer (Glinsky et al. 2005). In contrast to our studies, the downregulation of human CES1 was considered prognostic (Glinsky et al. 2005). However, the transcriptional signature in their study was applied to relatively late-stage tumors and not as an early classifier of carcinogenic potential.
  • The next metabolic enzyme in the predictive set was Ephx1.
  • Ephx1 has been shown to play a role in the activation and detoxification of many polyaromatic hydrocarbons (Arand et al. 2005).
  • In human cancer, one study has noted an increased expression of human EPHX1 in hepatocellular carcinomas and variable expression in lung tumors (Coller et al. 2001).
  • a separate study has identified increased expression of EPHX1 in human glioblastomas (Kessler et al. 2000).
  • the increased expression in human liver cancer is supported by rodent studies where expression of Ephx1 was increased in preneoplastic nodules (Griffin and Gengozian 1984; Novikoff et al. 1979).
  • the fourth most predictive metabolic enzyme was the relatively uncharacterized AU018778 gene.
  • the amino acid sequence of the AU018778 gene showed significant similarity to carboxylesterases with approximately 65% identity with mouse Ces1.
  • On the genomic level, the gene is found in a cluster of esterases downstream of Ces1 and upstream of Es22 and Ces3.
  • In normal tissue, AU018778 is predominantly expressed in kidney, liver, intestine, and adipose tissue (Su et al. 2004). No reports were found that showed an altered expression in cancer.
  • The last metabolic enzyme in the predictive set was Gstm1.
  • Gstm1 is part of a family of glutathione transferases that are involved in the metabolism of endogenous and xenobiotic molecules and can modulate cell signaling through a variety of mechanisms (Hayes et al. 2005). Although the majority of work on Gstm1 in cancer has focused on associating human polymorphic differences with susceptibility, increased expression of GSTM1 has been identified as a potential biomarker in human head and neck tumors (Bongers et al. 1995). In the lung, a previous study has reported that human GSTM1 was infrequently expressed in normal tissue and its expression was not increased in lung tumors (Spivack et al. 2003). In rodent studies, increased expression of mu class glutathione transferases has been observed in preneoplastic nodules in the rat liver (Hayes and Pulford 1995), but not in the mouse liver (Hatayama et al. 1993).
  • Fgfr2 is part of a family of receptor tyrosine kinases that bind fibroblast growth factors and initiate cellular signals that affect proliferation and differentiation (Eswarakumar et al. 2005). Alternative splicing of Fgfr2 results in two different isoforms, Fgfr2b and Fgfr2c, that have different ligand binding affinities (Eswarakumar et al. 2005). The targeted disruption of the Fgfr2b isoform in mice results in abnormal development of the lung, pituitary, thyroid, teeth, and limbs (De Moerlooze et al.
  • Fgfr2b plays a significant role in lung development (De Langhe et al. 2006; del Moral et al. 2006).
  • One study has reported that binding of Fgf9 to Fgfr2b cooperates with Shh signaling to regulate mesenchymal proliferation in lung development (White et al. 2006).
  • expression of Shh was also found to be one of the top 20 predictive biomarkers in our study (Table 8).
  • expression of Fgfr2 has shown different behaviors depending on tissue and cell type.
  • In human lung and colorectal cancer, increased expression of Fgfr2b was observed in cancer tissue (Watanabe et al. 2000; Yamayoshi et al. 2004) while in human gastric and bladder cancer, decreased expression of Fgfr2b was observed in cancer cells and was associated with poor patient prognosis (Diez de Medina et al. 1997; Matsunobu et al. 2006). In this study, decreased expression was predictive of lung tumor formation.
  • Transcriptomic and metabonomic technologies for discovering biomarkers that can efficiently and economically identify chemical carcinogens without performing a standard two-year rodent bioassay were compared.
  • the objectives of this study were to (1) compare transcriptomic and metabonomic technologies for their ability to identify predictive biomarkers related to these chemicals; and (2) demonstrate that biomarkers collected following a subchronic exposure to a chemical have the potential to predict liver and lung tumor formation observed in a two-year rodent bioassay.
  • mice were randomized by weight and divided into 6 treatment groups (Table 6). Animal treatment was initiated at 5 weeks of age. Mice were housed 5 per cage in polycarbonate cages in a temperature and humidity controlled environment with a standard 12 h light/dark cycle. All animals were given access to food (NIH-07 ground meal; Harlan Teklad; Madison, WI) and water ad libitum. Animal use in this study was approved by the Institutional Animal Use and Care Committee of CIIT Centers for Health Research and was conducted in accordance with the National Institutes of Health guidelines for the care and use of laboratory animals. Animals were housed in fully-accredited American Association for Accreditation of Laboratory Animal Care (AAALAC) facilities.
  • AAALAC American Association for Accreditation of Laboratory Animal Care
  • PCNB Pentachloronitrobenzene
  • NEDD N-(1-naphthyl)ethylenediamine dihydrochloride
  • NAPD 1,5-naphthalenediamine
  • mice were anesthetized with a lethal i.p. dose of sodium pentobarbital (Abbott Laboratories, Chicago, IL). Blood was drawn by cardiac puncture, placed in a serum separator Microtainer® tube (Becton Dickinson, Franklin Lakes, NJ), and the serum isolated by centrifugation. The four right lung lobes were isolated by suturing, removed, and minced together in RNAlater™ (Ambion, Austin, TX). The left lung lobe was inflated with 10% neutral buffered formalin and stored in 10% formalin. The right, caudate and median liver lobes were minced in RNAlater™. The left liver lobe was removed and placed in 10% formalin.
  • Double-stranded cDNA was synthesized using the One-Cycle cDNA synthesis kit (Affymetrix, Santa Clara, CA) and biotin-labeled cRNA was transcribed using the GeneChip IVT Labeling Kit (Affymetrix). Labeled cRNA was fragmented and hybridized to Affymetrix Mouse Genome 430 2.0 arrays. Microarray data were processed using RMA with a log2 transformation (Irizarry et al. 2003). The gene expression results have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (Accession No.: GSE5127 and GSE5128).
  • Serum NMR analysis was performed on the serum from 3 animals per treatment group. A total of 18 animals were analyzed. NMR samples were prepared by diluting serum samples to a final volume of 600 µl with a solution of D₂O, containing 2,2-dimethyl-2-silapentane-5-sulfonate sodium salt (5 mM final) and sodium azide (0.02% w/v final). The ¹H spectra were obtained at 399.80 MHz on a Varian Inova 400 MHz NMR spectrometer using a Varian 5 mm pulsed field gradient, inverse detection probe.
  • the spectra were acquired with 256 scans, using a 2 second solvent presaturation period and a 200 ms CPMG filter to reduce the signals from the protein and lipid components. The total recycle time for each scan was 4.8 seconds. Spectral interpretation was aided by two-dimensional ¹H-¹³C gHSQC correlation experiments on selected samples. Data were processed using ACD software (Advanced Chemistry Development, Toronto, Ontario). A 0.1 Hz exponential line broadening was applied to the data. The spectra were phased, baseline corrected, integrated using the ACD intelligent binning protocol, and normalized based on total bin area. The region around the residual water signal from 4.6 to 6 ppm was excluded from the analysis. To avoid inclusion of toxicant or exogenous metabolite peaks, the entire region above 7.0 ppm was excluded, as well as peaks associated with pentobarbital, propylene glycol and lactate.
  • Golub algorithm Golub et al. 1999
  • the cross-validation process is outlined in Figure 5 and consisted of first randomly dividing all 18 animals into six equally sized groups (i.e., three animals per group). Five of the groups were then lumped together to use as a training set (15 animals) and the remaining group was used as the test set (3 animals).
  • the data for the animals in the test set were set aside as if we had never observed them.
  • Feature selection was then performed on the training set using the Golub algorithm (Golub et al. 1999) and the genes or NMR spectral bins with the largest Golub statistic were used to build a support vector machine classification model.
  • the model was then used to predict the classes for the three animals in the test set that were held out at the beginning of the process.
  • the cross-validation process was repeated 100 times to obtain a good estimate of the predictive accuracy. Accuracy was calculated by dividing the number of correct predictions in the test set by the total number of predictions. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number.
  • the classification analysis was performed using the PCP software program (Buturovic 2006).
  • Gstm1 and Ephx1 xenobiotic metabolizing enzymes
  • Ces1 cholesterol esterification
  • Ikbkg a key kinase involved in NF-κB signaling
  • Acsm1 a gene involved in the degradation of medium-chain fatty acids
  • the statistical classification analysis of the NMR spectral bins showed relatively low predictive accuracy with few metabolites in the model and increasing accuracy as more bins were added. With all bins in the model, the predictive accuracy was 94% with a sensitivity and specificity of 100% and 83%, respectively. Efforts were made to remove all chemical specific metabolites so that only changes in the endogenous metabolites were used in the analysis. These results suggest that individual endogenous metabolites make relatively poor biomarkers, but the metabolite profile as a whole is altered following carcinogenic treatment and may accurately predict the two-year bioassay results. Given the chemicals used in this study produce both lung and liver tumors, it is unknown what changes in the serum metabolite profile are attributed to each target organ.
  • the primary purpose of this study was to compare and contrast transcriptomic and metabonomic technologies for identifying biomarkers that can predict the outcome of a two-year rodent cancer bioassay.
  • the results of the study demonstrate that both transcriptional and metabonomic biomarkers collected following a subchronic exposure to a chemical have the potential to predict liver and lung tumor formation observed in a two-year rodent bioassay.
  • the gene expression biomarkers appear to be more accurate than the serum metabolite markers.
  • Animal exposures for each chemical were performed via the route and dose listed in Table 9. Female B6C3F1 mice, 5 to 6 weeks old, were exposed for 13 weeks. Following 13 weeks of exposure, the mice were euthanized, histopathology of the left lung lobes and left liver lobes was assessed, and RNA was isolated from the right lung lobes and the right, caudate and median liver lobes for microarray analysis. Microarray analysis was performed as described in Example 1 on 3 to 4 animals using Affymetrix 430 2.0 arrays.
  • Table 15 lists the top lung gene expression biomarkers identified.
  • Figure 12 depicts the predictive accuracy of liver gene biomarkers.
  • Figure 13 depicts the most discriminating liver gene expression biomarkers.
  • Figure 14 depicts the predictive accuracy of liver gene expression biomarkers using various classification algorithms.
  • gene expression biomarkers collected following a subchronic exposure can predict increased tumor incidence in a two-year bioassay with reasonable accuracy using the analysis methods described herein.
  • Epoxide hydrolases: structure, function, mechanism, and assay. Methods Enzymol 400, 569-88.
  • PCP: a program for supervised classification of gene expression profiles. Bioinformatics 22, 245-7.
  • DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4, P3. Diez de Medina, S. G., Chopin, D., El Marjou, A., Delouvee, A., LaRochelle, W. J., Hoznek, A., Abbou, C., Aaronson, S. A., Thiery, J. P., and Radvanyi, F. (1997). Decreased expression of keratinocyte growth factor receptor in a subset of human transitional cell bladder carcinomas. Oncogene 14, 323-30.
  • Epoxide hydrolase: a marker for experimental hepatocarcinogenesis. Ann Clin Lab Sci 14, 27-31. Hasegawa, R., and Ito, N. (1994). Hepatocarcinogenesis in the rat. In Carcinogenesis (M. P. Waalkes and J. M. Ward, eds.). Raven Press, New York.
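
The cross-validation procedure described in the excerpts above (hold out a test group, rank features on the training animals only with the Golub statistic, classify the held-out animals, repeat, and report correct predictions over total predictions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PCP program or the analysis actually run: the data are synthetic, the 12/6 class split and all function names are hypothetical, and a simple nearest-centroid rule stands in for the support vector machine so that the sketch needs only the Python standard library.

```python
import random
import statistics


def golub_statistic(pos, neg):
    """Signal-to-noise score: difference of class means over summed standard deviations."""
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    return (statistics.mean(pos) - statistics.mean(neg)) / spread if spread else 0.0


def select_features(X, y, k):
    """Return the indices of the k features with the largest absolute score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(golub_statistic(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Stand-in classifier (NOT an SVM): pick the class whose centroid is closer."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[j] - statistics.mean(r[j] for r in rows)) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validate(X, y, n_folds, k, repeats, seed=0):
    """Repeated N-fold cross-validation; accuracy = correct predictions / all predictions."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            X_tr = [X[i] for i in train_idx]
            y_tr = [y[i] for i in train_idx]
            # Feature selection sees the training animals only.
            features = select_features(X_tr, y_tr, k)
            for i in test_idx:
                correct += nearest_centroid_predict(X_tr, y_tr, X[i], features) == y[i]
                total += 1
    return correct / total


def make_synthetic(n_pos=12, n_neg=6, n_genes=20, seed=1):
    """Hypothetical data: genes 0 and 1 differ between classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_pos), (0, n_neg)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                row[0] += 10.0
                row[1] -= 10.0
            X.append(row)
            y.append(label)
    return X, y


X, y = make_synthetic()
accuracy = cross_validate(X, y, n_folds=6, k=2, repeats=10)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Keeping feature selection inside the loop matters: ranking genes on all animals before splitting would let information from the held-out animals leak into the model and inflate the accuracy estimate.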


Abstract

The present invention provides methods of predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay including determining a nucleic acid expression pattern of genomic biomarkers and correlating regulation of the genomic biomarkers to the likelihood of tumor formation. Kits including the genomic biomarkers are also provided.

Description

Methods of Using Genomic Biomarkers to Predict Tumor Formation
Related Application Data
This application claims the benefit of U.S. Patent Application Serial No. 60/906,099, filed March 8, 2007, the contents of which are incorporated herein by reference in their entirety.
Field of the Invention
The present invention relates to biomarkers and methods of using the same in predicting tumor formation and identifying carcinogenic substances.
Background of the Invention
The two-year rodent bioassay is widely used to assess the carcinogenic potential of chemical, biological and physical agents. Current regulatory standards require select agents to be tested for carcinogenic activity prior to commercial release, including pharmaceuticals, food additives, and pesticides. In the interest of public safety, other important commercial, industrial, or environmental chemicals are tested by the federal government through the National Toxicology Program (NTP) to identify potential hazards and generate limited information on dose-response behavior for chemical risk assessments.
For both the business and risk assessment communities, there is significant motivation for developing efficient and economical methods to identify carcinogenic chemicals. From a business perspective, each bioassay requires hundreds of animals and about $2 to $4 million per chemical (NTP 1996). Typically, the bioassays are performed late in the developmental pipeline after commitment of substantial resources in product development. A positive result can delay release of the product until the potential carcinogenic risks can be addressed through further study, or may even result in discontinuation of the product. Thus, identifying potential carcinogens earlier in the development pipeline could provide substantial monetary savings. From a risk assessment perspective, there are approximately 80,000 chemicals registered for commercial use in the United States and 2,000 more added each year (NTP 2001). Since most have not been tested for carcinogenic activity, a more economical method to identify potential carcinogens would allow more chemicals to be tested for long-term health effects prior to human exposure.
The utility of applying transcriptomic and metabonomic technologies to identify biomarkers associated with toxicological endpoints has been the subject of considerable research. Thus far, most toxicology studies employing these technologies have focused on identifying biomarkers associated with relatively acute endpoints, such as hepatotoxicity and nephrotoxicity (Amin et al. 2004; Fielden et al. 2005; Nicholls et al. 2001; Thomas et al. 2001; Waring et al. 2001). Fewer studies have utilized these technologies to identify biomarkers that predict chronic endpoints, such as cancer (Ellinger-Ziegelbauer et al. 2005; Kramer et al. 2004; Nie et al. 2006).
Summary of the Invention
The present invention provides an alternative to the standard rodent cancer bioassay to identify substances such as chemical, biological and physical agents, for the potential to cause adverse effects to humans and animals. Further, the present invention provides methods of using biomarkers to predict the carcinogenic activity of a substance. The biomarkers of the present invention can discriminate between carcinogenic and non-carcinogenic treatments. Embodiments of the present invention provide methods of predicting tumor formation including determining a nucleic acid expression pattern of genomic biomarkers and correlating regulation of the genomic biomarkers to the likelihood of tumor formation. The methods further include predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising: (a) determining the nucleic acid expression pattern associated with at least one biomarker from a sample comprising at least one biomarker isolated from a biological sample taken from a subject wherein the biomarker comprises a nucleic acid sequence or polypeptide and fragments, variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression associated with at least one biomarker to a likelihood of tumor formation.
Embodiments of the present invention provide methods of predicting lung or liver tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising: (a) determining the nucleic acid expression pattern of sixteen nucleic acid sequences comprising SEQ ID NO:1 (Ugt1a1, Accession No. NM_201645.2), SEQ ID NO:2 (Ces1, Accession No. NM_021456.3), SEQ ID NO:3 (Fgfr2, Accession No. BB220625.2), SEQ ID NO:4 (Ephx1, Accession No. NM_010145.2), SEQ ID NO:5 (Ugt1a2, Accession No. NM_013701.3), SEQ ID NO:6 (AU018778, Accession No. BC013479.1), SEQ ID NO:7 (Gstm1, Accession No. NM_010358.4), SEQ ID NO:8 (Ddit4l, Accession No. NM_030143.3), SEQ ID NO:9 (Ikbkg transcript variant 1, Accession No. NM_010547.1), SEQ ID NO:10 (Ikbkg transcript variant 2, Accession No. NM_178590.3), SEQ ID NO:11 (Ugt1a5, Accession No. NM_201643.2), SEQ ID NO:12 (Ugt1a6a, Accession No. NM_145079.3), SEQ ID NO:13 (Ugt1a6b, Accession No. NM_201410.1), SEQ ID NO:14 (Ugt1a7c, Accession No. NM_201642.4), SEQ ID NO:15 (Ugt1a9, Accession No. NM_201644.2), SEQ ID NO:16 (Ugt1a10, Accession No. NM_201641.2) and variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression of the nucleic acid sequences to a likelihood of tumor formation to predict lung or liver tumor formation with at least about 94% accuracy.
Embodiments of the present invention further provide methods of assessing a substance for carcinogenic potential, comprising: (a) determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising at least one nucleic acid isolated from a biological sample taken from a subject exposed to a substance to be tested for carcinogenicity wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778) and/or a glutathione S-transferase and variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression of the at least one nucleic acid sequence to an increased likelihood of tumor formation, wherein an increased likelihood of tumor formation indicates that the substance has carcinogenic potential.
Embodiments of the present invention provide methods of using a nucleic acid biomarker to predict tumor formation, comprising determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising a nucleic acid isolated from a biological sample taken from a subject wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase and variants and isoforms thereof.
Embodiments of the present invention also provide methods of identifying a biomarker for predicting tumor formation resulting from exposure to a substance, comprising: (a) comparing regulation of a suspected biomarker from a biological sample of (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; (b) identifying at least one biomarker that is differentially regulated after exposure to the cytotoxic compound compared to the regulation after exposure to the non-cytotoxic compound; and (c) statistically correlating the differential regulation to a likelihood of tumor formation, thereby indicating that the at least one suspected biomarker is a biomarker for predicting tumor formation resulting from exposure to a substance.
Embodiments of the present invention provide methods of determining a nucleic acid expression profile to predict tumor formation, comprising: (a) performing a microarray analysis on at least one nucleic acid sequence isolated from a biological sample taken from (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; and (b) statistically analyzing the ability of the expression of at least one nucleic acid sequence to be differentially regulated during cytotoxic and non-cytotoxic treatments, wherein the differential regulation of the at least one nucleic acid sequence establishes a nucleic acid expression profile to predict tumor formation.
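Step (b) above does not name a particular statistic. As one illustration only, the following Python sketch ranks genes by a signal-to-noise score of the kind described by Golub et al. (1999), which is the ranking referred to in the description of Figure 7B. The gene names, expression values, and the use of the sample standard deviation are assumptions made for this example and are not taken from the data of this disclosure.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Signal-to-noise score: (mean_a - mean_b) / (sd_a + sd_b).

    The sample standard deviation is used here; that choice is an assumption.
    """
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def rank_genes(expression, labels):
    """Rank genes by absolute signal-to-noise score, largest first.

    expression: dict mapping gene name -> list of expression values (one per animal)
    labels:     list of booleans, True where the animal received the
                cytotoxic treatment and False otherwise
    """
    scores = {}
    for gene, values in expression.items():
        treated = [v for v, flag in zip(values, labels) if flag]
        untreated = [v for v, flag in zip(values, labels) if not flag]
        scores[gene] = signal_to_noise(treated, untreated)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: three treated animals followed by three untreated animals.
labels = [True, True, True, False, False, False]
expression = {
    "gene_A": [8.0, 9.0, 10.0, 2.0, 3.0, 4.0],  # clearly separated groups
    "gene_B": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],   # no separation
}
ranked = rank_genes(expression, labels)
```

In this toy input, gene_A separates the two treatment groups and gene_B does not, so gene_A is ranked first. A ranking of this kind would typically feed a classifier whose accuracy is then estimated by cross-validation.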
Embodiments of the present invention further provide kits comprising a probe that hybridizes with a nucleic acid sequence comprising SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof, under conditions whereby nucleic acid hybridization can occur.
Embodiments of the invention described herein allow identification of carcinogens without performing a standard two-year rodent bioassay thereby forming the basis for a more efficient and economical approach for evaluating the carcinogenic activity of chemicals.
Brief Description of the Drawings
Figure 1. Flow chart presenting the experimental design of the process to identify biomarkers that are predictive of tumor formation in a two-year rodent bioassay.
Figure 2. Heat map of genes differentially expressed in the lung following a 90-day exposure to lung carcinogenic and noncarcinogenic treatments. Chemical details, abbreviations, and NTP study results are provided in Tables 1 and 2. Red represents high gene expression and blue is low expression.
Figure 3. Results from the statistical classification analysis for predicting chemically-induced increases in lung tumor incidence using subchronic gene expression biomarkers. Accuracy was estimated based on 10-fold cross-validation and calculated by dividing the number of correct predictions by the total number of predictions.
Figure 4. Real-time RT-PCR confirmation of four genes that showed significant differential expression between lung carcinogenic and noncarcinogenic chemicals and controls in the microarray analysis. (A) Ces1 (Mm00491334_m1). (B) Ephx1 (Mm00468756_m1). (C) Acsm1 (Mm00519091_m1). (D) Nqo1 (Mm01253562_m1). The data were normalized to the expression of ribosomal protein L32 (RPL32, Mm02528467_g1) and their corresponding vehicle control using the ΔΔCt method. Each bar represents the mean and standard error of three to four different animals. Statistical analysis was performed using two sample t-tests between lung carcinogenic and noncarcinogenic chemicals and controls. *, p < 0.05.
Figure 5. Flow chart outlining the statistical classification and cross-validation process used for data analysis and estimating the predictive accuracy of the gene expression and metabolic biomarkers.
Figures 6A through 6D. Summary of the alterations in gene expression and metabolites following a 90 day exposure to treatments positive (NAPD, BFUR) and negative (NEDD, PCNB, CCON, and FCON) for tumors in a two-year rodent cancer bioassay. Chemical details and abbreviations are provided in Table 6. Genes and metabolites in the heat maps were hierarchically clustered to group those showing common changes. Figure 6A. Heat map of genes differentially expressed in the lung. Red represents high gene expression and blue is low expression. Figure 6B. Heat map of genes differentially expressed in the liver. Red represents high gene expression and blue is low expression. Figure 6C. Expression of two potential gene expression biomarkers that showed discriminating expression between carcinogenic chemicals and noncarcinogenic chemicals and controls. Expression of Ces1 was measured in the lung and E130013N09Rik was measured in the liver. Each dot represents an individual animal and the line is the mean expression for that treatment. Figure 6D. Heat map of the NMR spectral bins from the serum measurements. Red represents high metabolite concentration and blue is low concentration.
Figures 7A and 7B. Results from the statistical classification analysis for the gene expression and metabolite biomarkers. Figure 7A. Accuracy of the support vector machine statistical classification model with increasing number of genes or NMR spectral bins. Accuracy was estimated based on six-fold cross-validation and calculated by dividing the number of correct predictions by the total number of predictions. Figure 7B. Listing of the top 5 gene expression biomarkers in the lung and liver. The listing was based on the Golub score (Golub et al. 1999) ranking.
Figure 8. Real-time RT-PCR confirmation of potential lung gene expression biomarkers that showed discriminating expression between carcinogenic chemicals (NAPD, BFUR) and noncarcinogenic chemicals (NEDD, PCNB) and controls (CCON, FCON). (A) Ces1 (Mm00491334_m1). (B) Ikbkg (Mm00494927_m1). (C) Nqo1 (Mm01253562_m1). (D) 1110032A04Rik (Mm00504963_m1). The data were normalized to the expression of ribosomal protein L32 (RPL32, Mm02528467_g1) and the feed control using the ΔΔCt method. Statistical analysis was performed using two sample t-tests. P-values less than 0.05 were deemed significantly different. Each bar represents the mean and standard error of three different animals. *, statistically different between carcinogenic chemicals and the remaining treatments. †, statistically different between carcinogenic and non-carcinogenic chemicals. φ, statistically different between treatment group and corresponding control.
Figure 9. Real-time RT-PCR confirmation of potential liver gene expression biomarkers that showed discriminating expression between carcinogenic chemicals (NAPD, BFUR) and noncarcinogenic chemicals (NEDD, PCNB) and controls (CCON, FCON). (A) E130013N09Rik (forward primer, 5'-TCCAGGCAAAAAGAAGAGTATCCAA-3' (SEQ ID NO:17); reverse primer, 5'-CATTTGAACGACTCAGTTAGTCTAACCA-3' (SEQ ID NO:18); and probe, 5'-CTGCCACCCATTCATG-3' (SEQ ID NO:19)). (B) Ugdh (Mm00447645_m1). (C) 4922503N01Rik (Mm00462815_m1). (D) Gsta1/Gsta2 (Mm00833353_mH). The data were normalized to the expression of ribosomal protein L32 (RPL32, Mm02528467_g1) and the feed control using the ΔΔCt method. Statistical analysis was performed using two sample t-tests. P-values less than 0.05 were deemed significantly different. Each bar represents the mean and standard error of three different animals. *, statistically different between carcinogenic chemicals and the remaining treatments. †, statistically different between carcinogenic and non-carcinogenic chemicals. φ, statistically different between treatment group and corresponding control.
Figure 10. Flow chart outlining the statistical classification and cross-validation process used for data analysis and estimating the predictive accuracy of the gene expression for identifying important biomarkers.
Figure 11. Results for predictive accuracy of lung gene expression biomarkers identified for tumor formation.
Figure 12. Results for predictive accuracy of liver gene expression biomarkers identified for tumor formation.
Figure 13. Specific liver gene expression biomarkers as identified from analysis outlined in Figure 10.
Figure 14. Comparison of the predictive LOOCV accuracy of liver gene expression biomarkers identified using various classification algorithms.
Detailed Description
The present invention will now be described with reference to the following embodiments. As is apparent by these descriptions, this invention can be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. For example, features illustrated with respect to one embodiment can be incorporated into other embodiments, and features illustrated with respect to a particular embodiment can be deleted from that embodiment. In addition, numerous variations and additions to the embodiments suggested herein will be apparent to those skilled in the art in light of the instant disclosure, which do not depart from the instant invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Except as otherwise indicated, standard methods can be used for manipulation of nucleic acid sequences, electrophoresis, blotting, protein visualization and the like according to the present invention. Such techniques are known to those skilled in the art. See, e.g., SAMBROOK et al., MOLECULAR CLONING: A LABORATORY MANUAL 2nd Ed. (Cold Spring Harbor, NY, 1989); F. M. AUSUBEL et al. CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Green Publishing Associates, Inc. and John Wiley & Sons, Inc., New York). All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. However, the citation of a reference herein should not be construed as an acknowledgement that such reference is prior art to the present invention described herein.
As used herein, "a" or "an" or "the" can mean one or more than one. Also as used herein, "and/or" refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative ("or"). Furthermore, the term "about," as used herein when referring to a measurable value such as an amount of a compound or agent of this invention, dose, time, temperature, and the like, is meant to encompass variations of 20%, 10%, 5%, 1%, 0.5%, or even 0.1% of the specified amount.
"Nucleic acid" or "nucleic acid sequence" as used herein encompasses both RNA and DNA, including cDNA, genomic DNA, synthetic (e.g., chemically synthesized) DNA and chimeras of RNA and DNA. The nucleic acid may be double-stranded or single-stranded. Where single-stranded, the nucleic acid may be a sense strand or an antisense strand. The nucleic acid may be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.
"Isolated" nucleic acid (e.g., an "isolated DNA") as used herein refers to a nucleic acid separated or substantially free from at least some of the other components of the naturally occurring organism or virus, such as for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the nucleic acid. Likewise, an "isolated" polypeptide means a polypeptide that is separated or substantially free from at least some of the other components of the naturally occurring organism or virus, for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the polypeptide. As used herein, the "isolated" polypeptide is at least about 25%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or more pure (w/w).
"Polypeptide" as used herein is used interchangeably with "protein," and refers to a polymer of amino acids (dipeptide or greater) linked through peptide bonds. Thus, the term "polypeptide" includes proteins, oligopeptides, protein fragments, protein analogs and the like. The term "polypeptide" contemplates polypeptides as defined above that are encoded by nucleic acids, are recombinantly produced, are isolated from an appropriate source, or are synthesized. A "functional" polypeptide is one that retains at least one biological activity normally associated with that polypeptide. According to embodiments of the present invention, a "functional" polypeptide retains all of the activities possessed by the unmodified peptide. By "retains" biological activity, it is meant that the polypeptide retains at least about 50%, 60%, 75%, 85%, 90%, 95%, 97%, 98%, 99%, or more, of the biological activity of the native polypeptide (and can even have a higher level of activity than the native polypeptide). A "nonfunctional" polypeptide is one that exhibits essentially no detectable biological activity normally associated with the polypeptide (e.g., at most, only an insignificant amount, e.g., less than about 10% or even 5%).
"Fragment" as used herein is one that substantially retains at least one biological activity normally associated with that protein or polypeptide. In particular embodiments, the "fragment" substantially retains all of the activities possessed by the unmodified protein. By "substantially retains" biological activity, it is meant that the protein retains at least about 50%, 60%, 75%, 85%, 90%, 95%, 97%, 98%, 99%, or more, of the biological activity of the native protein (and can even have a higher level of activity than the native protein).
"Isoform" as used herein refers to a different form of a protein, regardless of whether it originates from a different gene or splice variant or by modification of a single gene product. Thus "isoform" as used herein refers to a form of a protein that migrates differently from another form of that protein on a two-dimensional gel.
"Altered level" or "altered levels" as used herein refer to an increased level (e.g., a one or two-fold increase, or more) or a decreased level (e.g., a one or two-fold decrease, or more) in the expression activity of a nucleic acid sequence and/or quantity of protein resulting from the expression of the nucleic acid sequence in or via a sample, as compared to a level or levels in a corresponding control sample. The sample can be a biological sample taken from a subject. The corresponding control sample can be a biological sample taken from a subject not afflicted with a tumor or a sample from a source not known to be associated with a tumor.
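The comparison against a corresponding control sample can be sketched as a simple fold-change calculation. The Python example below is a minimal illustration under stated assumptions: the function name, the two-fold default threshold, and the input values are hypothetical, and the definition above does not fix any single threshold.

```python
def classify_level(sample_level, control_level, threshold=2.0):
    """Compare a sample's expression level with its control.

    Returns a (label, ratio) pair, where ratio = sample_level / control_level
    and label is "increased", "decreased" or "unchanged".
    The two-fold default threshold is an assumption for illustration.
    """
    if sample_level <= 0 or control_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = sample_level / control_level
    if ratio >= threshold:
        return "increased", ratio
    if ratio <= 1.0 / threshold:
        return "decreased", ratio
    return "unchanged", ratio


# Hypothetical measurements (arbitrary units) against a control value of 100.
up = classify_level(400.0, 100.0)      # four-fold increase
down = classify_level(50.0, 100.0)     # two-fold decrease
flat = classify_level(120.0, 100.0)    # below the chosen threshold
```

Working with the ratio rather than the difference keeps the comparison independent of the measurement scale, which is why fold change is the usual way to express an altered expression level.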
"Biological sample" as used herein refers to any material taken from the body of a subject that may carry the target nucleic acid sequence or polypeptide described herein, including both tissue samples and biological fluids such as blood samples, saliva samples, mucus samples, urine samples, etc.
"Biomarker" as used herein refers to any nucleic acid or polypeptide that can be detected, directly or indirectly (e.g., via an analog, metabolite, fragment or breakdown product) in a sample, such as a biological sample from a subject, an increase or decrease of the amount of which, compared to amounts found in similar control samples, such as subjects without disease, is indicative of the presence or risk of tumor formation. The analog, metabolite, fragment or breakdown product of the biomarker may or may not possess all the functional activity of the biomarker.
"Tumor" as used herein refers to an abnormal growth of cells or tissues. Tumors can be malignant or benign. As used herein, malignant tumors include cancerous growth denoted as an uncontrolled growth of tissue that has the potential to spread to adjacent or distant sites of the body. Exemplary tumors include malignant disorders such as breast cancers, osteosarcomas, angiosarcomas, fibrosarcomas and other sarcomas, leukemias, lymphomas, sinus tumors, ovarian, uretal, bladder, prostate and other genitourinary cancers, colon, esophageal and stomach cancers and other gastrointestinal cancers, lung cancers, myelomas, pancreatic cancers, liver cancers, kidney cancers, endocrine cancers, skin cancers, melanomas, angiomas, and brain or central and peripheral nervous (CNS) system tumors, malignant or benign, including gliomas and neuroblastomas.
"Carcinogenic" as used herein refers to the ability of a compound to promote tumor growth and/or facilitate propagation of the tumor. As used herein, "carcinogenic" and "tumorigenic" can be used interchangeably.
"Cytotoxic compound" as used herein refers to a compound that imparts cellular dysfunction, deterioration and/or cell death. As used herein, a cytotoxic compound can be carcinogenic, and thus tumorigenic.
"Subchronic" as used herein refers to a limited exposure to a chemical to cause an effect compared to a chronic exposure where "chronic" as used herein refers to a more prolonged exposure to the chemical.
"Subjects" as used herein are generally human subjects and include, but are not limited to, "patients." The subjects may be male or female and may be of any race or ethnicity, including, but not limited to, Caucasian, African-American, African, Asian, Hispanic, Indian, etc. The subjects may be of any age, including newborn, neonate, infant, child, adolescent, adult, and geriatric. Subjects may also include animal subjects, particularly mammalian subjects such as canines, felines, bovines, caprines, equines, ovines, porcines, rodents (e.g. rats and mice), lagomorphs, primates (including non-human primates), etc., screened for veterinary medicine or pharmaceutical drug development purposes. Subjects include, but are not limited to, those who may have, possess, have been exposed to, or have been previously diagnosed as afflicted with one or more risk factors for lung or liver cancer. Risk factors for lung cancer include, but are not limited to, age, gender, smoking habits and exposure to second-hand smoke, diet, work exposure and family history. Risk factors for liver cancer include, but are not limited to, age, gender, alcohol consumption, hepatitis, cirrhosis, exposure to irritants and family history.
While the following description focuses primarily on lung and liver tumor formation, it will be appreciated that the present invention may also be utilized in connection with other hyperproliferative disorders.
According to embodiments of the present invention, methods of predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay include: (a) determining a regulation pattern of at least one biomarker from a sample including at least one biomarker isolated from a biological sample taken from a subject wherein the biomarker comprises a nucleic acid sequence or polypeptide and fragments, variants and isoforms thereof that create a favorable cellular environment for chemically-induced tumor formation; and (b) correlating an altered level of regulation of the at least one biomarker to a likelihood of tumor formation. Regulation can relate to nucleic acid expression and/or detection of protein levels.
While the method does not require collecting a sample, it is noted that collecting a sample can be carried out either directly or indirectly by any suitable technique. For example, collecting a blood sample from a subject can be carried out by phlebotomy or any other suitable technique, with the blood sample processed further to provide a serum sample or other suitable blood fraction.
In some embodiments, the biomarker is a metabolic enzyme and/or growth factor receptor. Accordingly, the nucleic acid sequence described above can encode a polypeptide corresponding to a metabolic enzyme and/or growth factor receptor. The metabolic enzymes encoded by the nucleic acids can include those known to be involved in endogenous and xenobiotic metabolic processes and the growth factor receptors can include those known to be involved in tissue and/or organ development. In particular embodiments of the present invention, the nucleic acid sequences encode a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a polypeptide encoded by nucleic acid sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase and fragments, variants and isoforms thereof. The nucleic acid sequences encoding these polypeptides and fragments, variants and isoforms thereof further encompass those nucleic acids encoding polypeptides that have at least about 60%, 70%, 80%, 90%, 95%, 97%, 98% or higher amino acid sequence similarity with the polypeptides disclosed herein (or fragments thereof). As is known in the art, a number of different programs can be used to identify whether a nucleic acid or polypeptide has sequence identity or similarity to a known sequence. Sequence identity and/or similarity can be determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math. 2, 482 (1981), by the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48, 443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85, 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, WI), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12, 387-395 (1984), preferably using the default settings, or by inspection.
According to some embodiments of the invention, the methods include determining the nucleic acid expression pattern of sixteen nucleic acid sequences, wherein the nucleic acid sequences include SEQ ID NO:1 (Ugt1a1, Accession No. NM_201645.2), SEQ ID NO:2 (Ces1, Accession No. NM_021456.3), SEQ ID NO:3 (Fgfr2, Accession No. BB220625.2), SEQ ID NO:4 (Ephx1, Accession No. NM_010145.2), SEQ ID NO:5 (Ugt1a2, Accession No. NM_013701.3), SEQ ID NO:6 (AU018778, Accession No. BC013479.1), SEQ ID NO:7 (Gstm1, Accession No. NM_010358.4), SEQ ID NO:8 (Ddit4l, Accession No. NM_030143.3), SEQ ID NO:9 (Ikbkg transcript variant 1, Accession No. NM_010547.1), SEQ ID NO:10 (Ikbkg transcript variant 2, Accession No. NM_178590.3), SEQ ID NO:11 (Ugt1a5, Accession No. NM_201643.2), SEQ ID NO:12 (Ugt1a6a, Accession No. NM_145079.3), SEQ ID NO:13 (Ugt1a6b, Accession No. NM_201410.1), SEQ ID NO:14 (Ugt1a7c, Accession No. NM_201642.4), SEQ ID NO:15 (Ugt1a9, Accession No. NM_201644.2), SEQ ID NO:16 (Ugt1a10, Accession No. NM_201641.2) and variants and isoforms thereof. See Table 5 and examples below for further discussion of these nucleic acid sequences.
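To illustrate how a percent identity figure of the kind recited above (for example, at least about 60%) can be obtained, the following Python sketch performs a global alignment in the manner of Needleman & Wunsch and counts identical aligned positions. It is not the GAP or BESTFIT implementation. The scoring values (match +1, mismatch -1, linear gap -1), the function names, and the choice of alignment length as the denominator are assumptions made for this example; established programs use substitution matrices and affine gap penalties and may report different values.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences (Needleman-Wunsch style, linear gap cost).

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns holding identical residues."""
    aligned_a, aligned_b = global_alignment(a, b)
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Hypothetical short sequences, used only to exercise the functions.
identical = percent_identity("ACGT", "ACGT")
one_mismatch = percent_identity("ACGT", "ACGA")
```

The two calls at the end compare a sequence with itself and with a copy differing at one of four positions.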
The methods according to the present invention further include measuring the levels of RNA, e.g. mRNA, and levels of proteins. Such measurements can be made according to methods well known in the art as discussed above and as provided in the examples below. See, e.g., SAMBROOK et al., MOLECULAR CLONING: A LABORATORY MANUAL 2nd Ed. (Cold Spring Harbor, NY, 1989); F. M. AUSUBEL et al. CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Green Publishing Associates, Inc. and John Wiley & Sons, Inc., New York). Generally, determining the presence of an altered level of a biomarker in the sample can also be carried out either directly or indirectly in accordance with known techniques, including, but not limited to, mass spectrometry, chromatography, electrophoresis, sedimentation, isoelectric focusing, and antibody assay. See, e.g., U.S. Patent No. 6,589,748; U.S. Patent No. 6,027,896. For example, biomarkers may be identified by two-dimensional electrophoresis (2-D electrophoresis). 2D-electrophoresis is a technique comprising denaturing electrophoresis, followed by isoelectric focusing; this generates a two-dimensional gel (2D gel) containing a plurality of separated proteins.
By "increased level" it is meant (a) any level of a biomarker when that biomarker is not present in a normal subject without tumor formation, as well as (b) an elevated level (e.g., a two or three-fold increase in detected quantity) of a biomarker or a particular fragment, variant or isoform of a biomarker when that biomarker or a particular fragment, variant or isoform is present in a normal subject without tumor formation.
By "depressed level" it is meant (a) an absence of a particular biomarker or fragment, variant or isoform of a particular biomarker when that biomarker is present in a normal subject without tumor formation, as well as (b) a reduced level (e.g., a two or three-fold reduction in detected quantity) of a biomarker or fragment, variant or isoform of a biomarker when that biomarker or fragment, variant or isoform is present in a normal subject without tumor formation.
In general, the steps of (a) assaying a sample for an elevated level of a biomarker and/or depressed level of a biomarker, and (b) correlating an elevated level of a biomarker and/or a depressed level of a biomarker in said sample associated with tumor formation can be carried out in accordance with known techniques or variations thereof that will be apparent to persons skilled in the art. Further, antibody assays used in some approaches (immunoassays) may, in general, be homogeneous assays or heterogeneous assays. In a homogeneous assay the immunological reaction usually involves the specific antibody, a labeled analyte, and the sample of interest. The signal arising from the label is modified, directly or indirectly, upon the binding of the antibody to the labeled analyte. Both the immunological reaction and detection of the extent thereof are carried out in a homogeneous solution. Immunochemical labels which may be employed include free radicals, radioisotopes, fluorescent dyes, enzymes, bacteriophages, coenzymes, and so forth.
In a heterogeneous assay approach, the reagents are usually the specimen, the antibody of the invention and a system or means for producing a detectable signal. Similar specimens as described above may be used. The antibody is generally immobilized on a support, such as a bead, plate or slide, and contacted with the specimen suspected of containing the antigen in a liquid phase. The support is then separated from the liquid phase and either the support phase or the liquid phase is examined for a detectable signal employing means for producing such signal. The signal is related to the presence of the analyte in the specimen. Means for producing a detectable signal include the use of radioactive labels, fluorescent labels, enzyme labels, and so forth. For example, if the antigen to be detected contains a second binding site, an antibody which binds to that site can be conjugated to a detectable group and added to the liquid phase reaction solution before the separation step. The presence of the detectable group on the solid support indicates the presence of the antigen in the test sample. Examples of suitable immunoassays are the radioimmunoassay, immunofluorescence methods, enzyme-linked immunoassays, and the like.
Those skilled in the art will be familiar with numerous specific immunoassay formats and variations thereof which may be useful for carrying out the method disclosed herein. See generally, for example, E. Maggio, Enzyme-Immunoassay, (1980) (CRC Press, Inc., Boca Raton, Fla.).
According to embodiments of the present invention, the methods described herein are applicable to predicting tumor formation at any organ site. The organ site can include, but is not limited to, liver, lung, kidney, mammary, and hematopoietic sites. Tumors include, but are not limited to, the tumors described above, and in some embodiments, breast cancer, osteosarcoma, angiosarcoma, fibrosarcoma, leukemia, sinus tumor, ovarian cancer, uretal cancer, bladder cancer, prostate cancer, genitourinary cancer, gastrointestinal cancer, lung cancer, lymphoma, myeloma, pancreatic cancer, liver cancer, kidney cancer, endocrine cancer, skin cancer, melanoma, angioma and brain or central nervous system (CNS) cancer. In some embodiments, the tumor is associated with the lung or liver. As the tumor may be organ and/or site specific, nucleic acid expression as described herein can be tissue specific in particular embodiments of the present invention.
According to further embodiments of the present invention, the methods of predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay provide at least about 90% accuracy of predicting tumor formation. In some embodiments, the methods provide at least about 93% accuracy, 90% sensitivity and 90% specificity of predicting tumor formation.
In particular embodiments, the present invention provides methods of predicting lung or liver tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising: (a) determining the nucleic acid expression pattern of eight nucleic acid sequences comprising SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof; and (b) correlating an altered level of nucleic acid expression of the nucleic acid sequences to a likelihood of tumor formation to predict lung or liver tumor formation with at least about 94% accuracy.
Embodiments of the present invention further provide methods of assessing a substance for carcinogenic potential, including: (a) determining the regulatory pattern of a biomarker from a sample including at least one biomarker isolated from a biological sample taken from a subject exposed to a substance to be tested for carcinogenicity; and (b) correlating an altered level of regulation of the at least one biomarker to an increased likelihood of tumor formation, wherein an increased likelihood of tumor formation indicates that the substance has carcinogenic potential. In particular embodiments, the biomarker is a nucleic acid sequence that encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778) and/or a glutathione S-transferase and variants and isoforms thereof. In other embodiments, the method includes determining the nucleic acid expression pattern of eight nucleic acid sequences, wherein the nucleic acid sequences include SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof.
According to embodiments of the present invention, the substance can be a chemical, biological and/or physical agent. The substance can include, but is not limited to, a pharmaceutical product, food additive, pesticide or cleaning product. The substances further include commercial, industrial, residential and environmental chemicals.
In particular embodiments of the present invention, exposure to the substance is a subchronic exposure. For example, in the rodent cancer bioassay, exposure to the substance to be tested is for a period of about two years. In embodiments of the present invention, exposure to the substance is less than about two years, less than about one year, less than about six months, less than about 4 months or about or less than about 3 months. In some embodiments the exposure is days. In particular embodiments, the exposure is about or less than about 90 days.
Embodiments of the present invention provide methods of using a nucleic acid biomarker to predict tumor formation, including: determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample including a nucleic acid isolated from a biological sample taken from a subject wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase and variants and isoforms thereof.
Further embodiments of the present invention include methods of identifying a biomarker for predicting tumor formation resulting from exposure to a substance, including: (a) comparing regulation of a suspected biomarker from a biological sample of (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; (b) identifying at least one biomarker that is differentially regulated after exposure to the cytotoxic compound compared to the regulation after exposure to the non-cytotoxic compound; and (c) statistically correlating the differential regulation to a likelihood of tumor formation thereby indicating that the at least one suspected biomarker is a biomarker for predicting tumor formation resulting from exposure to a substance. "Differential regulation" as used herein can refer to altered nucleic acid expression and/or altered protein levels. Observing altered levels of nucleic acid expression can be used to identify a nucleic acid biomarker, and observing altered protein levels can be used to identify a protein biomarker. Thus, upon detection of the differentially regulated nucleic acid expression, detection of the quantity of protein resulting therefrom can serve as a biomarker for predicting tumor formation.
Concerning the cytotoxic compounds, as understood by those skilled in the art, generally, any compound can be "cytotoxic" under certain conditions. Accordingly, in embodiments of the present invention, a "non-cytotoxic" compound is one that serves as a control compound for purposes of comparing effects of exposure to a cytotoxic compound as described herein.
Embodiments of the present invention further provide methods of determining a nucleic acid expression profile to predict tumor formation, including: (a) performing a microarray analysis on at least one nucleic acid sequence isolated from a biological sample taken from (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; and (b) statistically analyzing the ability of the expression of at least one nucleic acid sequence to be differentially regulated during cytotoxic and non-cytotoxic treatments, wherein the differential regulation of the at least one nucleic acid sequence establishes a nucleic acid expression profile to predict tumor formation. Detecting differential regulation can include measurement of changes in nucleic acid expression and/or measurement of changes in protein levels of proteins expressed by the nucleic acid sequences. In general, the statistical methods used to derive embodiments of the present invention include the Golub algorithm (Golub et al. 1999) for feature selection and a support vector machine model for classification analysis. The predictive accuracy of the statistical classification analysis was assessed using N-fold cross-validation. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number. The classification analysis was performed using the PCP software program (Buturovic 2006).
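The feature selection step named above can be illustrated with a minimal sketch of the signal-to-noise statistic described by Golub et al. (1999), (mean1 - mean2) / (sd1 + sd2), computed per gene. The function names, the toy expression values, the use of the sample standard deviation, and the choice to rank by absolute value are assumptions made for illustration only; this is not the PCP software's implementation.

```python
from statistics import mean, stdev


def golub_statistic(values_pos, values_neg):
    """Signal-to-noise statistic: (mean_pos - mean_neg) / (sd_pos + sd_neg)."""
    spread = stdev(values_pos) + stdev(values_neg)
    if spread == 0:
        return 0.0  # assumption: treat a gene with no spread as uninformative
    return (mean(values_pos) - mean(values_neg)) / spread


def rank_genes(expression, labels, top_n):
    """Rank genes by the absolute value of the statistic (an assumption).

    expression: dict mapping gene name -> list of log2 values, one per animal
    labels: list of booleans, True for animals in the positive class
    """
    scores = {}
    for gene, values in expression.items():
        pos = [v for v, is_pos in zip(values, labels) if is_pos]
        neg = [v for v, is_pos in zip(values, labels) if not is_pos]
        scores[gene] = golub_statistic(pos, neg)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:top_n], scores


# Hypothetical toy data: three genes measured in six animals.
expression = {
    "gene_a": [8.0, 8.2, 8.1, 6.0, 6.1, 5.9],
    "gene_b": [5.0, 5.5, 4.5, 5.1, 4.9, 5.0],
    "gene_c": [3.0, 3.1, 2.9, 4.0, 4.2, 4.1],
}
labels = [True, True, True, False, False, False]
top_genes, scores = rank_genes(expression, labels, top_n=2)
```

In this toy input, gene_a is strongly higher in the positive class and gene_c is lower, so both outrank gene_b, whose class means coincide.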
Embodiments of the present invention also provide kits including probes that hybridize with at least one biomarker to predict tumor formation. In particular embodiments, the kits include a probe that hybridizes with a nucleic acid sequence including SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof, under conditions whereby nucleic acid hybridization can occur. The kits can include at least one biochemical material and/or reagent, such as buffers and/or binding partners, that are capable of specifically binding with one or more of the biomarkers described herein. Such materials and/or reagents can provide a means for determining binding between the biochemical material and one or more biomarkers, whereby at least one analysis to determine a presence of one or more biomarkers is carried out on a sample. Optionally such analysis or analyses may be carried out with the additional use of detection devices for immunoassay, chromatography, spectrometry, electrophoresis, sedimentation, isoelectric focusing, or any combination thereof. Analysis may be carried out on a single sample or multiple samples.
The kits can include instructions for performing the method or assay. In some embodiments, the kits can further include instructions for predicting tumor formation resulting from exposure to a substance. The kits can include instructions for evaluating a chemical for carcinogenic potential. Further, the kits can include instructions for assaying a biological sample for the presence of tumor formation biomarkers. In addition, the kits may optionally comprise depictions or photographs that represent the appearance of positive and negative results. In some embodiments, the components of the kits may be packaged together in a common container.
The present invention is explained in greater detail in the following non-limiting examples.
Examples
Example 1
Identification of Gene Expression Biomarkers Predictive of Results of Two-Year Rodent Cancer Bioassays
Existing data generated by the National Toxicology Program (NTP) was used to identify gene expression biomarkers that can predict results from a rodent cancer bioassay. In particular, this study focused on lung tumors and expanded the number of carcinogenic and noncarcinogenic chemicals to assess the ability of gene expression biomarkers to predict increased tumor incidence across a diverse set of chemicals. A flow chart for the experimental design is presented in Figure 1.
Materials and Methods
Chemicals. 1,5-Naphthalenediamine (NAPD; CAS No. 2243-62-1; Purity: 97%), 2,3-benzofuran (BFUR; CAS No. 271-89-6; Purity: 99%), N-(1-naphthyl)ethylenediamine dihydrochloride (NEDD; CAS No. 1465-25-4; Purity: 98%), pentachloronitrobenzene (PCNB; CAS No. 82-68-8; Purity: 99%), 2,2-bis(bromomethyl)-1,3-propanediol (BBMP; CAS No. 3296-90-0; Purity: 98%), 1,2-dibromoethane (DBET; CAS No. 106-93-4; Purity: 99%), coumarin (COUM; CAS No. 91-64-5; Purity: 99%), benzene (BENZ; CAS No. 71-43-2; Purity: 99%), and 2-chloromethylpyridine hydrochloride (CMPH; CAS No. 6959-47-3; Purity: 98%) were purchased from Sigma-Aldrich (St. Louis, MO). N-methylolacrylamide (MACR; CAS No. 924-42-5; Purity: 98%) was purchased from Pfaltz & Bauer (Waterbury, CT). 4-Nitroanthranilic acid (NAAC; CAS No. 619-17-0; Purity: 98.5%), diazinon (DIAZ; CAS No. 333-41-5; Purity: 98%), and malathion (MALA; CAS No. 121-75-5; Purity: 95%) were purchased from Advanced Technology and Industry (Hong Kong, China).
Animals and Treatment. One hundred and fifty female B6C3F1 mice were obtained from Charles River Laboratories (Raleigh, NC). Female B6C3F1 mice were chosen since they represent the most sensitive model for chemically-induced lung tumor formation in the NTP rodent bioassay. Upon receipt, the mice were randomized by weight and divided into treatment groups (Table 1). The 13 chemicals used in this study have been previously tested by the NTP. Seven of the chemicals were positive for an increased incidence of primary alveolar/bronchiolar adenomas or carcinomas and six were negative. Study results from the NTP for each of the 13 chemicals are summarized in Table 2. Animal treatment was initiated at 5 weeks of age. Mice were housed 5 per cage in polycarbonate cages in a temperature (17.8 to 26.1°C) and humidity (30 to 70%) controlled environment with a standard 12 h light/dark cycle. All animals were given access to food (Harlan Teklad; NIH-07 ground meal; Madison, WI) and water ad libitum.
Animal exposures for each chemical were performed via the route and dose listed in Table 1. Gavage exposures were administered 5 days/wk and feed exposures were provided 7 days/wk. The oral route of exposure was chosen for this study since the majority of chemicals producing lung tumors in the NTP rodent bioassay were delivered through the oral route.
Animal use in this study was approved by the Institutional Animal Use and Care Committee of The Hamner Institutes (formerly CIIT Centers for Health Research) and was conducted in accordance with the National Institutes of Health guidelines for the care and use of laboratory animals. Animals were housed in fully-accredited American Association for Accreditation of Laboratory Animal Care (AAALAC) facilities. Following 13 weeks of exposure, the mice were euthanized with a lethal i.p. dose of sodium pentobarbital (Abbott Laboratories, Chicago, IL). The four right lung lobes were isolated by suturing, removed, and minced together in RNAlater™ (Ambion, Austin, TX). The left lung lobe was inflated with 10% neutral buffered formalin and stored in 10% formalin for further processing. Following a standard fixation period, the lung tissues were embedded into paraffin blocks, sectioned at 5 μm, and stained with hematoxylin and eosin. Histological changes were assessed by an accredited pathologist.
Gene Expression Microarray Analysis. Microarray analysis was performed on 3 to 4 animals per treatment group except for the corn oil and feed control groups (CCON and FCON) where additional animals were analyzed due to staggered exposures with parallel control groups. A total of 70 animals were analyzed. Total RNA was isolated from the lung tissue using Trizol reagent (Invitrogen, Carlsbad, CA). The isolated RNA was further purified using RNeasy columns (Qiagen, Valencia, CA) and the integrity of the RNA was verified spectrophotometrically and with the Agilent 2100 Bioanalyzer (Palo Alto, CA). Double-stranded cDNA was synthesized from 5 μg of total RNA using the One-Cycle cDNA synthesis kit (Affymetrix, Santa Clara, CA). Biotin-labeled cRNA was transcribed from the cDNA using the GeneChip IVT Labeling Kit (Affymetrix). Fifteen μg of labeled cRNA was fragmented and hybridized to Affymetrix Mouse Genome 430 2.0 arrays for 16 h at 45°C. The hybridized arrays were washed using the GeneChip Fluidics Station 450 and scanned using a GeneChip 3000 scanner. Microarray data were processed using RMA with a log2 transformation (Irizarry et al. 2003). The gene expression results have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (Accession No.: GSE6116).
Analysis of Chemical Diversity. Molecular descriptors representing the two-dimensional structure of each of the 13 chemical treatments were downloaded from PubChem in the simplified molecular input line entry specification (SMILES) format. SMILES codes for all single chemicals tested in the NTP rodent cancer bioassay were downloaded from DSSTox (Richard et al. 2006). The SMILES code for each chemical was converted into a chemical fingerprint using the GenerateMD software application (Version 3.1.7.1, JChem, ChemAxon, Budapest, Hungary). The chemical fingerprints were then compared for structural similarity using the Tanimoto coefficient and the Compr software application (Version 3.1.7.1, JChem, ChemAxon).
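For reference, the Tanimoto coefficient between two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below illustrates that definition and an average over all pairs, assuming each fingerprint is represented as a set of "on" bit positions; the chemical names and bit positions are hypothetical, and the code does not reproduce the JChem GenerateMD or Compr applications.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty fingerprints are scored as 0.0
    return len(a & b) / len(union)


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto coefficient over all distinct pairs of chemicals."""
    pairs = list(combinations(fingerprints.values(), 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints given as sets of on-bit positions.
fingerprints = {
    "chem_x": {1, 2, 3, 4},
    "chem_y": {3, 4, 5, 6},
    "chem_z": {7, 8},
}
similarity_xy = tanimoto(fingerprints["chem_x"], fingerprints["chem_y"])
average_similarity = mean_pairwise_similarity(fingerprints)
```

A value of 1.0 corresponds to identical fingerprints and 0.0 to fingerprints with no bits in common.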
Basic Statistical and Annotation Analysis of Microarray Data. Basic statistical differences were analyzed using both a one-way analysis of variance for differences across the 13 chemical treatments and a linear model (Smyth 2005) with a contrast between the lung carcinogens and the noncarcinogenic treatments. In both analyses, probability values were adjusted for multiple comparisons using a false discovery rate (Reiner et al. 2003). Analysis of enrichment within gene ontology (GO) categories was performed using NIH DAVID (Dennis et al. 2003). Briefly, Affymetrix probe set identifiers for the genes of interest were uploaded to the DAVID web site and analyzed based on the Affymetrix 430 2 reference list. A hypergeometric test was performed to identify GO categories with significantly enriched gene numbers. The resulting list of GO categories was refined by selecting categories containing ≥ 2 genes.
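As an illustration of the kind of hypergeometric enrichment test mentioned above, the following sketch computes the upper-tail probability of seeing at least k category members in a gene list of size n, given K category members among N reference genes. The function name and the numbers are hypothetical, and the sketch is a plain hypergeometric tail; it is not claimed to match the exact calculation performed by the DAVID web site.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: genes of interest that fall in the category
    n: total genes of interest
    K: genes in the category within the reference list
    N: total genes in the reference list
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical example: 3 genes of interest, 2 of them in a 4-gene category,
# drawn from a 10-gene reference list.
p_value = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

For these toy numbers the probability is (6*6 + 4*1) / 120 = 1/3, which can be checked by hand.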
Statistical Classification Analysis. Classification analysis was performed using a combination of the Golub algorithm (Golub et al. 1999) for feature selection and a support vector machine model for classification (radial basis function kernel, C = 17150, γ = 0.0022). The predicted endpoint was increased lung tumor incidence in female B6C3F1 mice according to the NTP rodent cancer bioassay (Table 2). To assess the predictive accuracy of the model on the current dataset, ten-fold cross-validation was performed. The cross-validation process consisted of first randomly dividing all 70 animals into ten equally sized groups (i.e., seven animals per group). Nine of the groups were then combined for use as a training set (63 animals) and the remaining group was used as the test set (7 animals). The data for the animals in the test set were set aside as if they had never been observed. Feature selection was then performed on the training set using the Golub algorithm (Golub et al. 1999) and the genes with the largest Golub statistic were used to build a support vector machine classification model. The model was then used to predict the classes for the seven animals in the test set that were held out at the beginning of the process. The cross-validation process was repeated at least 100 times to obtain a good estimate of the predictive accuracy. Accuracy was calculated by dividing the number of correct predictions in the test set by the total number of predictions. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number. The classification analysis was performed using the PCP software program (Buturovic 2006).
Results
Structural and Mechanistic Diversity among Chemical Treatments.
The 13 chemical treatments in the study were intentionally chosen to be diverse in terms of chemical structure, genotoxicity, and potential modes-of-action. The structural diversity among the chemicals was analyzed using a Tanimoto similarity coefficient with a coefficient of 1.0 being identical molecules and 0.0 having no structural similarity. The average similarity among all 13 chemicals in the study was 0.141 with a maximum similarity of 0.508 between NEDD and NAPD (Table 3). Among the lung carcinogens, the average similarity dropped to 0.123 with a maximum similarity of 0.327 between DBET and BBMP. By comparison, the average similarity for all single chemicals tested by the NTP in a rodent cancer bioassay was 0.155.
To assess potential differences in mode-of-action, gene expression changes across the 13 chemical treatments were used as a surrogate of mechanistic diversity. A total of 28,780 probe sets corresponding to 25,375 unique transcripts were significantly altered among the chemical treatments based on a one-way analysis of variance. Given that there are an estimated 39,015 unique transcripts on the microarray, the number of transcripts altered by the 13 chemicals corresponds to approximately 65% of the transcriptome.
Histological Changes. Gross histological examination of the lung tissue identified treatment-related lesions in only NAPD treated animals. Morphological changes were found in all five animals examined and were limited to the bronchiolar epithelial cells which exhibited karyomegaly and karyorrhexis. There was occasional peribronchiolar infiltration by neutrophils and mononuclear cells. Bronchiolar epithelial cell morphology was suggestive of regenerative hyperplasia. In the 2-year NTP study, the primary nonneoplastic lesion was adenomatous hyperplasia occurring in 30% of the animals (NTP 1978). Histopathological changes in the 13-week subchronic study were not provided for NAPD in the original NTP report. Given the absence of lung lesions among the remaining tumorigenic treatment groups, histological changes alone following a 90-day exposure were not predictive of increased lung tumor incidence in a two-year bioassay. This result is consistent with a previous study that reported the poor predictive properties of histological lesions (Allen et al. 2004).
Gross Gene Expression Differences between Lung Carcinogens and Noncarcinogenic Treatments.
To obtain an overall sense of key differences between the lung carcinogens and noncarcinogenic treatments, a two-sample statistical comparison was performed between animals treated with chemicals showing increased lung tumor incidence in the two-year bioassay and animals treated with the negative chemicals plus the vehicle controls. A total of 82 probe sets corresponding to 75 unique transcripts were significantly altered. Sixty-five transcripts were significantly upregulated in animals treated with the lung carcinogens and 10 transcripts were significantly downregulated. The gene expression differences between the lung carcinogens and the noncarcinogenic chemicals are depicted in Figure 2 and a complete list is provided as supplemental material (Table 8). A subset of the significant gene expression changes was also verified using quantitative RT-PCR (Figure 4). Notably, there were a number of highly discriminating gene expression changes that were common among the lung carcinogens despite the diversity in chemical structures, genotoxicity categories, and potential mechanisms.
A GO analysis of the significant gene expression changes showed enrichment in multiple categories with the majority related to endogenous and xenobiotic metabolic processes (Table 4). Changes in glutathione-related processes were consistent with a variety of known biomarkers in both rodent and human tumorigenesis (Balendiran et al. 2004; Hayes and Pulford 1995; Kwak et al. 2004) and a previous study has also identified gene expression changes related to fatty acid metabolism in human colorectal cancers (Yeh et al. 2006). In addition, changes in aldehyde dehydrogenase activity have been associated with experimental and human tumors in a variety of tissues (Lindahl 1992).
Statistical Classification Analysis to Predict Increased Lung Tumor Incidence. To evaluate the ability of the gene expression changes to predict increased lung tumor incidence in a rodent cancer bioassay, statistical classification analysis was performed using a combination of the Golub feature selection algorithm (Golub et al. 1999) and a support vector machine model as the classifier. Ten-fold cross-validation was used to estimate the predictive accuracy. Using this approach, tissue gene expression profiles were capable of predicting a chemically-induced increase in lung tumor incidence with 93.9% accuracy using only eight probe sets that correspond to 6 different genes (Figure 3). The sensitivity and specificity of the model with the eight biomarkers were 95.2 and 91.8%, respectively. The predictive accuracy of the model declined as more genes were added. The top gene expression biomarkers were changes in the UDP-glucuronosyltransferase 1a (Ugt1a) family, carboxylesterase 1 (Ces1), fibroblast growth factor receptor 2 (Fgfr2), epoxide hydrolase 1 (Ephx1), glutathione S-transferase μ 1 (Gstm1), and an unannotated gene (Table 5). A complete ranking is provided as supplemental material (Table 9). For changes in UDP-glucuronosyltransferase 1a expression, the three corresponding probe sets were not specific for a particular isoform. The Ugt1a isoforms are produced through the alternative splicing of variable exons connected to four constant exons at the 3'-end of the gene (Zhang et al. 2004).
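The ten-fold cross-validation bookkeeping behind the accuracy, sensitivity and specificity figures can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the `fit_predict` callable is a placeholder for feature selection plus model fitting on the training animals only, and the always-positive stand-in used in the toy run replaces the Golub selection and support vector machine purely to keep the example self-contained. It does not reproduce the PCP software.

```python
import random


def ten_fold_groups(n_animals, n_folds=10, seed=0):
    """Randomly divide animal indices into n_folds groups of near-equal size."""
    indices = list(range(n_animals))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def classification_metrics(true_labels, predicted_labels):
    """Return (accuracy, sensitivity, specificity) for boolean labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def cross_validate(labels, fit_predict, n_folds=10, seed=0):
    """Hold out each group once; fit_predict sees only the training indices."""
    predictions = [None] * len(labels)
    for test_idx in ten_fold_groups(len(labels), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        # Feature selection and model fitting belong inside fit_predict so
        # that the held-out animals never influence the selected genes.
        for i, pred in zip(test_idx, fit_predict(train_idx, test_idx)):
            predictions[i] = pred
    return classification_metrics(labels, predictions)


def always_positive(train_idx, test_idx):
    """Stand-in classifier (not an SVM): predicts the positive class."""
    return [True] * len(test_idx)


# Hypothetical labels for 70 animals, half in each class.
toy_labels = [True] * 35 + [False] * 35
toy_accuracy, toy_sensitivity, toy_specificity = cross_validate(
    toy_labels, always_positive
)
```

With 70 animals and ten folds, each held-out group contains seven animals and each training set contains 63, matching the split described in the Materials and Methods.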
Discussion
The identification of biomarkers that predict an increase in tumor incidence is fundamentally different from the identification of biomarkers that predict tumor formation in an individual animal. The biomarkers that were identified in this study were likely to be genes that created a favorable cellular environment for chemically-induced lung tumor formation and not those that determined whether a specific animal gets tumors. Among the genes in the predictive signature, most were enzymes involved in endogenous and xenobiotic metabolic processes and one was a growth factor receptor involved in lung development. The functional breakdown of these predictive biomarkers was consistent with the established role of metabolism and growth factor signaling in tumorigenesis.
Among the most predictive metabolic enzymes was Ugt1a. Ugt1a is one of a family of enzymes that catalyze the glucuronidation of endogenous and xenobiotic molecules (Tukey and Strassburg 2000). The mouse Ugt1 locus produces nine different genes through the alternative splicing of 14 variable exons to four constant exons (Zhang et al. 2004). Genome-wide scans have identified the Ugt1a locus as playing an important role in chemical carcinogenesis (Tukey and Strassburg 2000) and various isoforms have been shown to be differentially expressed in human liver cancer (Strassburg et al. 1997).
Another predictive metabolic enzyme was Ces1. Ces1 is part of a large multigene family of enzymes that hydrolyze ester and amide bonds and play a role in cellular cholesterol esterification (Ghosh 2000; Uphoff and Drexler 2000). Previous studies have suggested that Ces1 may play a role in detoxifying ester or amide containing xenobiotics in the lung (Munger et al. 1991; Uphoff and Drexler 2000). Notably, human CES1 was part of an 11-gene transcriptional signature that was used to predict therapy outcome and malignancy for multiple types of human cancer including lung cancer (Glinsky et al. 2005). In contrast to our studies, the downregulation of human CES1 was considered prognostic (Glinsky et al. 2005). However, the transcriptional signature in their study was applied to relatively late stage tumors and not as an early classifier of carcinogenic potential.
The next metabolic enzyme in the predictive set was Ephx1. Ephx1 has been shown to play a role in the activation and detoxification of many polyaromatic hydrocarbons (Arand et al. 2005). In human cancer, one study has noted an increased expression of human EPHX1 in hepatocellular carcinomas and variable expression in lung tumors (Coller et al. 2001). A separate study has identified increased expression of EPHX1 in human glioblastomas (Kessler et al. 2000). The increased expression in human liver cancer is supported by rodent studies where expression of Ephx1 was increased in preneoplastic nodules (Griffin and Gengozian 1984; Novikoff et al. 1979).
The fourth most predictive metabolic enzyme was the relatively uncharacterized AU018778 gene. The amino acid sequence of the AU018778 gene showed significant similarity to carboxylesterases with approximately 65% identity with mouse Ces1. On the genomic level, the gene is found in a cluster of esterases downstream of Ces1 and upstream of Es22 and Ces3. In normal tissue, AU018778 is predominantly expressed in kidney, liver, intestine, and adipose tissue (Su et al. 2004). No reports were found that showed an altered expression in cancer.
The last metabolic enzyme in the predictive set was Gstm1. Gstm1 is part of a family of glutathione transferases that are involved in the metabolism of endogenous and xenobiotic molecules and can modulate cell signaling through a variety of mechanisms (Hayes et al. 2005). Although the majority of work on Gstm1 in cancer has focused on associating human polymorphic differences with susceptibility, increased expression of GSTM1 has been identified as a potential biomarker in human head and neck tumors (Bongers et al. 1995). In the lung, a previous study has reported that human GSTM1 was infrequently expressed in normal tissue and its expression was not increased in lung tumors (Spivack et al. 2003). In rodent studies, increased expression of mu class glutathione transferases has been observed in preneoplastic nodules in the rat liver (Hayes and Pulford 1995), but not in the mouse liver (Hatayama et al. 1993).
The only non-metabolic gene in the predictive set was Fgfr2. Fgfr2 is part of a family of receptor tyrosine kinases that bind fibroblast growth factors and initiate cellular signals that affect proliferation and differentiation (Eswarakumar et al. 2005). Alternative splicing of Fgfr2 results in two different isoforms, Fgfr2b and Fgfr2c, that have different ligand binding affinities (Eswarakumar et al. 2005). The targeted disruption of the Fgfr2b isoform in mice results in abnormal development of the lung, pituitary, thyroid, teeth, and limbs (De Moerlooze et al. 2000), while disruption of the Fgfr2c isoform results in skeletal abnormalities (Eswarakumar et al. 2002). Additional research has shown that Fgfr2b plays a significant role in lung development (De Langhe et al. 2006; del Moral et al. 2006). One study has reported that binding of Fgf9 to Fgfr2b cooperates with Shh signaling to regulate mesenchymal proliferation in lung development (White et al. 2006). Notably, expression of Shh was also found to be one of the top 20 predictive biomarkers in our study (Table 8). In cancer, expression of Fgfr2 has shown different behaviors depending on tissue and cell type. In human lung and colorectal cancer, increased expression of Fgfr2b was observed in cancer tissue (Watanabe et al. 2000; Yamayoshi et al. 2004) while in human gastric and bladder cancer, decreased expression of Fgfr2b was observed in cancer cells and was associated with poor patient prognosis (Diez de Medina et al. 1997; Matsunobu et al. 2006). In this study, decreased expression was predictive of lung tumor formation.
In summary, these results demonstrate that an increase in lung tumor incidence can be predicted based on gene expression changes following only a subchronic exposure. Although the present study was limited to 13 chemicals delivered through an oral route and the female mouse lung, the results suggest that this approach has the potential to be more broadly applied to other organ systems and animal models. Based on NTP records, five organ sites (liver, lung, kidney, mammary, and hematopoietic) account for approximately 50% of the positive chemical responses and 24 organ sites have at least 5 positive chemicals in at least one species and sex.
Example 2
Comparison of Transcriptomic and Metabonomic Technologies for Identifying Biomarkers Predictive of Results of Two-Year Rodent Cancer Bioassays
Transcriptomic and metabonomic technologies for discovering biomarkers that can efficiently and economically identify chemical carcinogens without performing a standard two-year rodent bioassay were compared. In particular, the objectives of this study were to (1) compare transcriptomic and metabonomic technologies for their ability to identify predictive biomarkers related to these chemicals; and (2) demonstrate that biomarkers collected following a subchronic exposure to a chemical have the potential to predict liver and lung tumor formation observed in a two-year rodent bioassay.
Materials and Methods
Animals and Treatment. Thirty female B6C3F1 mice were obtained from Charles River Laboratories (Raleigh, NC). Upon receipt, the mice were randomized by weight and divided into 6 treatment groups (Table 6). Animal treatment was initiated at 5 weeks of age. Mice were housed 5 per cage in polycarbonate cages in a temperature and humidity controlled environment with a standard 12 h light/dark cycle. All animals were given access to food (NIH-07 ground meal; Harlan Teklad; Madison, WI) and water ad libitum. Animal use in this study was approved by the Institutional Animal Use and Care Committee of CIIT Centers for Health Research and was conducted in accordance with the National Institutes of Health guidelines for the care and use of laboratory animals. Animals were housed in fully-accredited American Association for Accreditation of Laboratory Animal Care (AAALAC) facilities. Pentachloronitrobenzene (PCNB) (8,187 ppm), N-(1-naphthyl)ethylenediamine dihydrochloride (NEDD) (2,000 ppm), and 1,5-naphthalenediamine (NAPD) (2,000 ppm) were administered 7 days per week via feeding. Benzofuran (BFUR) (240 mg/kg) was administered 5 days per week via gavage in a corn oil vehicle. A feeding control and corn oil vehicle control were also included. All chemicals were purchased at the highest purity available (Sigma-Aldrich, St. Louis, MO).
Following 13 weeks of exposure, the mice were anesthetized with a lethal i.p. dose of sodium pentobarbital (Abbott Laboratories, Chicago, IL). Blood was drawn by cardiac puncture, placed in a serum separator Microtainer® tube (Becton Dickinson, Franklin Lakes, NJ), and the serum isolated by centrifugation. The four right lung lobes were isolated by suturing, removed, and minced together in RNAlater™ (Ambion, Austin, TX). The left lung lobe was inflated with 10% neutral buffered formalin and stored in 10% formalin. The right, caudate and median liver lobes were minced in RNAlater™. The left liver lobe was removed and placed in 10% formalin. For histology, the formalin-fixed lung and liver tissues were embedded in paraffin blocks, sectioned at 5 μm, and stained with hematoxylin and eosin.
Gene Expression Microarray Analysis. Microarray analysis was performed on the lungs and livers from 3 animals per treatment group. A total of 18 animals were analyzed. Total RNA was isolated from the lung and liver tissue using Trizol reagent (Invitrogen, Carlsbad, CA) and further purified using RNeasy columns (Qiagen, Valencia, CA). Integrity of the RNA was verified with the Agilent 2100 Bioanalyzer (Palo Alto, CA). Double-stranded cDNA was synthesized using the One-Cycle cDNA synthesis kit (Affymetrix, Santa Clara, CA) and biotin-labeled cRNA was transcribed using the GeneChip IVT Labeling Kit (Affymetrix). Labeled cRNA was fragmented and hybridized to Affymetrix Mouse Genome 430 2.0 arrays. Microarray data were processed using RMA with a log2 transformation (Irizarry et al. 2003). The gene expression results have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (Accession No.: GSE5127 and GSE5128).
Serum NMR Analysis. NMR analysis was performed on the serum from 3 animals per treatment group. A total of 18 animals were analyzed. NMR samples were prepared by diluting serum samples to a final volume of 600 μl with a solution of D2O, containing 2,2-dimethyl-2-silapentane-5-sulfonate sodium salt (5 mM final) and sodium azide (0.02% w/v final). The 1H spectra were obtained at 399.80 MHz on a Varian Inova 400 MHz NMR spectrometer using a Varian 5 mm pulsed field gradient, inverse detection probe. The spectra were acquired with 256 scans, using a 2 second solvent presaturation period and a 200 ms CPMG filter to reduce the signals from the protein and lipid components. The total recycle time for each scan was 4.8 seconds. Spectral interpretation was aided by two-dimensional 1H-13C gHSQC correlation experiments on selected samples. Data were processed using ACD software (Advanced Chemistry Development, Toronto, Ontario). A 0.1 Hz exponential line broadening was applied to the data. The spectra were phased, baseline corrected, integrated using the ACD intelligent binning protocol, and normalized based on total bin area. The region around the residual water signal from 4.6 to 6 ppm was excluded from the analysis. To avoid inclusion of toxicant or exogenous metabolite peaks, the entire region above 7.0 ppm was excluded, as well as peaks associated with pentobarbital, propylene glycol and lactate.
Basic Statistical and Annotation Analysis of Tissue Gene Expression and Serum NMR Data. To obtain an overall sense of key differences in the data, gene expression measurements were analyzed using a linear model (Smyth 2005) with a contrast between the carcinogenic (NAPD and BFUR) and noncarcinogenic treatments (NEDD, PCNB, FCON, and CCON). Genes identified as statistically significant were subject to an additional filter by selecting only those that exhibited a >1.5-fold change. Serum NMR measurements were analyzed using two-sample t-tests. Probability values for both gene expression and NMR measurements were adjusted for multiple comparisons using a false discovery rate of 5% (Reiner et al. 2003). For the significant gene lists, a gene ontology (GO) analysis was conducted using GOTree Machine (Zhang et al. 2004).
Statistical Classification Analysis of Tissue Gene Expression and Serum NMR Data. Classification analysis was performed using a combination of the Golub algorithm (Golub et al. 1999) for feature selection and a support vector machine model for classification (radial basis function kernel, C = 31.6, γ = 0.0001 for metabolite prediction; radial basis function kernel, C = 1000, γ = 0.001 for gene expression prediction). To assess the predictive accuracy of the model on the current dataset, six-fold cross-validation was performed. The cross-validation process is outlined in Figure 5 and consisted of first randomly dividing all 18 animals into six equally sized groups (i.e., three animals per group). Five of the groups were then combined for use as a training set (15 animals) and the remaining group was used as the test set (3 animals). The data for the animals in the test set were set aside as if they had never been observed. Feature selection was then performed on the training set using the Golub algorithm (Golub et al. 1999) and the genes or NMR spectral bins with the largest Golub statistic were used to build a support vector machine classification model. The model was then used to predict the classes for the three animals in the test set that were held out at the beginning of the process. The cross-validation process was repeated 100 times to obtain a good estimate of the predictive accuracy. Accuracy was calculated by dividing the number of correct predictions in the test set by the total number of predictions. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number. The classification analysis was performed using the PCP software program (Buturovic 2006).
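The cross-validation procedure described above can be illustrated with a minimal Python sketch. It is not the PCP program used in the study. The Golub statistic is taken here to be the signal-to-noise score of Golub et al. (1999), and a simple nearest-centroid rule is substituted for the support vector machine so that the example needs only the standard library. All function names and the toy data layout (rows are animals, columns are genes or spectral bins, labels are 1 for carcinogen-treated and 0 otherwise) are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def golub_statistic(values, labels):
    """Signal-to-noise score for one feature: (mean1 - mean0) / (sd1 + sd0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        return 0.0  # not enough samples to estimate a standard deviation
    denom = stdev(pos) + stdev(neg)
    return (mean(pos) - mean(neg)) / denom if denom > 0 else 0.0


def select_features(X, y, n_features):
    """Indices of the n_features columns with the largest |Golub statistic|."""
    scores = [abs(golub_statistic(col, y)) for col in zip(*X)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_features]


def centroid_predict(X_train, y_train, x, feats):
    """Stand-in classifier (NOT an SVM): assign x to the nearer class centroid."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, y in zip(X_train, y_train) if y == label]
        if not rows:
            continue
        dist = sum((x[j] - mean(r[j] for r in rows)) ** 2 for j in feats)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validated_accuracy(X, y, n_features, n_folds=6, n_repeats=100, seed=0):
    """Repeated k-fold cross-validation with feature selection inside each fold."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            X_tr = [X[i] for i in train_idx]
            y_tr = [y[i] for i in train_idx]
            # Features are chosen from the training animals only.
            feats = select_features(X_tr, y_tr, n_features)
            for i in test_idx:
                correct += centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
                total += 1
    return correct / total
```

Selecting features inside each fold, rather than once on all 18 animals, keeps the held-out animals from influencing which genes or bins enter the model, which is the point of setting the test set aside at the start of the process.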
Results and Discussion
Gross histological examination of the target tissues identified lung lesions in the NAPD treatment group, with morphological changes in all five animals. Lesions were limited to the bronchiolar epithelial cells which exhibited karyomegaly and karyorrhexis. There was occasional peribronchiolar infiltration by neutrophils and mononuclear cells. Bronchiolar epithelial cell morphology was suggestive of regenerative hyperplasia. Liver lesions were only observed in BFUR treated animals, with relatively minor single cell necrosis. Given the absence of lung lesions in the BFUR treatment group and the absence of liver lesions in the NAPD treatment group, histological changes alone following a 90-day exposure to these known carcinogens were not predictive of tumor formation observed in a two-year bioassay. This result is consistent with a previous study that reported the poor predictive properties of histological lesions (Allen et al. 2004).
To identify potential transcriptional biomarkers that may be more predictive of results from a two-year rodent bioassay, gross statistical comparisons were performed between animals treated with chemicals positive in a two-year bioassay (NAPD and BFUR) and animals treated with chemicals negative in a two-year bioassay plus the vehicle controls (NEDD, PCNB, FCON, and CCON). In the lung, 187 genes were significantly upregulated in the carcinogenic chemicals and 23 genes were significantly downregulated. In the liver, 464 genes were significantly upregulated in the carcinogenic chemicals and 101 genes were significantly downregulated. A total of 33 altered genes were shared between the lung and liver. The gene expression differences between the carcinogenic chemicals and the non-carcinogenic chemicals are depicted in Figures 6A and 6B. A subset of the significant gene expression changes was also verified using quantitative RT-PCR (Figures 8 and 9).
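The significance filter applied to the gene expression data (adjustment to a 5% false discovery rate followed by a >1.5-fold change cutoff) can be sketched as below. The sketch assumes the Benjamini-Hochberg step-up procedure as the false discovery rate adjustment, which may differ from the exact procedure used in the study, and assumes that fold changes are supplied as log2 differences, consistent with log2-transformed RMA values. The function names and gene labels are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, pvalues, log2_fold_changes, fdr=0.05, min_fold=1.5):
    """Split genes passing the FDR and fold-change filters into up and down lists."""
    adjusted = benjamini_hochberg(pvalues)
    cutoff = math.log2(min_fold)  # 1.5-fold is about 0.585 on the log2 scale
    up, down = [], []
    for gene, q, lfc in zip(genes, adjusted, log2_fold_changes):
        if q <= fdr and abs(lfc) > cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical example values, not data from the study.
up, down = significant_genes(
    ["gene_a", "gene_b", "gene_c", "gene_d"],
    [0.01, 0.04, 0.03, 0.005],
    [1.0, -0.2, -0.9, 0.3],
)
```

Because the cutoff is applied on the log2 scale, a 1.5-fold increase and a 1.5-fold decrease are treated symmetrically.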
Based on the general statistical comparison, there were a number of highly discriminating gene expression changes that were shared among the carcinogenic chemicals despite the diversity in genotoxicity status and tumor types observed in the two-year cancer bioassay. For genotoxicity, NAPD was positive in the Ames test while BFUR was negative (Table 6). For differences in tumor type, NAPD caused a significant increase in hepatocellular carcinomas, hepatocellular adenomas, and the combined count of alveolar/bronchiolar adenomas and carcinomas (NTP 1978). In contrast, BFUR caused only a significant increase in hepatocellular adenomas and alveolar/bronchiolar adenomas (NTP 1989). The common gene expression changes among the carcinogenic chemicals despite these differences suggest that not all of the gene expression changes following a 90-day exposure are related to a specific chemical mechanism or a specific path to tumorigenesis. Rather, the gene expression alterations shared between the carcinogenic chemicals may reflect some common cellular changes related to the underlying carcinogenic process. Given that many nongenotoxic carcinogens cause a transitory increase in cell proliferation followed by a period of negative selection (Andersen et al. 1995), performing these studies following shorter exposures would likely identify biomarkers that are less robust with a transient predictive window. In addition, biomarkers from shorter exposures would probably be more chemical specific rather than those that reflect the underlying molecular changes in the carcinogenic process.
A comparison of the genes identified in this study with those identified in a previous study of nongenotoxic carcinogens in the rat liver (Nie et al. 2006) showed some common changes. In the liver, nine genes were found in common including aldehyde dehydrogenase 1a1 (Aldh1a1), aldehyde dehydrogenase 1a7 (Aldh1a7), complement component 9 (C9), peripheral benzodiazepine receptor (Bzrp), early growth response 1 (Egr1), microsomal epoxide hydrolase 1 (Ephx1), glutathione S-transferase μ1 (Gstm1), chaperonin 10 (Hspe1), and transketolase (Tkt). No similar studies were found for comparing the gene expression changes in the lung.
A GO analysis of significant gene expression changes showed enrichment in multiple categories (Table 7). In both the lung and liver, enrichment was observed in glutathione metabolism and biosynthesis. These changes in glutathione-related processes are consistent with a variety of known biomarkers in both rodent and human tumorigenesis (Balendiran et al. 2004). Other significant GO categories in the lung were related to nitric oxide signaling, fatty acid oxidation, electron transport, and a variety of xenobiotic and endogenous metabolic processes. In the liver, significant enrichment was observed in vitamin metabolism, cytoskeletal processes, ion homeostasis, carboxylic acid metabolism, and complement activation. Apart from indicating changes in biological processes, the GO analysis identified a total of 48 genes in the liver and 33 genes in the lung classified as 'extracellular' that have potential to be non-invasive biomarkers.
For the serum metabonomics data, only one NMR spectral bin (δ 3.00-3.06) showed statistically significant changes between animals treated with the carcinogenic chemicals and animals treated with the noncarcinogenic chemicals plus the vehicle controls (Figure 6D; Table 9). The spectral bin was significantly reduced in the carcinogen-treated samples and was assigned to both creatine and oxidized glutathione (GSSG). These assignments were supported by the 1H-13C gHSQC spectra. Notably, a decrease in GSSG is consistent with the gene expression results that showed a significant upregulation of glutathione reductase in both the lung and the liver of carcinogen-treated animals. Other metabolites showed treatment specific changes, but few discriminating markers were consistently altered among the carcinogenic treatments.
A statistical classification analysis demonstrated that the tissue gene expression profiles are capable of predicting tumor formation observed in a two-year bioassay with 100% accuracy when the number of genes used in the model was less than 5,000 (Figure 7A). As more genes were added, the predictive accuracy declined. The decline in the predictive accuracy with increasing gene numbers has been reported previously and is due to the addition of genes that are treatment specific and not related to the predicted toxicological endpoint (Thomas et al. 2001). The top five potential gene expression biomarkers based on selection by the statistical classification model are listed in Figure 7B. In the lung, two xenobiotic metabolizing enzymes (Gstm1 and Ephx1), an enzyme involved in cholesterol esterification (Ces1), a key kinase involved in NF-κB signaling (Ikbkg), and a gene involved in the degradation of medium-chain fatty acids (Acsm1) were among the most predictive genes. Although multiple studies have examined the relationships between Gstm1 and Ephx1 polymorphisms and lung cancer, few studies have examined the relationship with respect to gene expression. Those that have examined expression changes in lung tumors were generally negative for an association (e.g., Coller et al. 2001; Spivack et al. 2003). No information was found on the expression of the remaining genes in lung tumors. In the liver, three of the top five gene expression biomarkers had no known function (E130013N09Rik, 4922503N01Rik, and AI427122). The remaining two genes were a serine protease inhibitor (Itih1) and an enzyme involved in the conversion of glucose to glucuronate (Ugdh). No increased expression of these genes has been reported in liver tumors.
In contrast to the gene expression measurements, the statistical classification analysis of the NMR spectral bins showed relatively low predictive accuracy with few metabolites in the model and increasing accuracy as more bins were added. With all bins in the model, the predictive accuracy was 94% with a sensitivity and specificity of 100% and 83%, respectively. Efforts were made to remove all chemical specific metabolites so that only changes in the endogenous metabolites were used in the analysis. These results suggest that individual endogenous metabolites make relatively poor biomarkers, but the metabolite profile as a whole is altered following carcinogenic treatment and may accurately predict the two-year bioassay results. Given the chemicals used in this study produce both lung and liver tumors, it is unknown what changes in the serum metabolite profile are attributed to each target organ.
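For reference, the accuracy, sensitivity and specificity figures quoted above are conventionally derived from the counts of correct and incorrect predictions in each class. The following Python sketch shows the standard definitions, treating carcinogen-treated animals as the positive class (label 1). The function name and the example labels are hypothetical and are not data from the study.

```python
def classification_metrics(true_labels, predicted_labels):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of positives detected
        "specificity": tn / (tn + fp),  # fraction of negatives detected
    }


# Hypothetical predictions for six animals.
metrics = classification_metrics([1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

The function assumes that both classes are present in the true labels; otherwise one of the denominators is zero.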
The primary purpose of this study was to compare and contrast transcriptomic and metabonomic technologies for identifying biomarkers that can predict a two-year rodent cancer bioassay. The results of the study demonstrate that both transcriptional and metabonomic biomarkers collected following a subchronic exposure to a chemical have the potential to predict liver and lung tumor formation observed in a two-year rodent bioassay. The gene expression biomarkers appear to be more accurate than the serum metabolite markers.
Example 3
Application and Analyses of Genomic Data for Chemical Risk Assessment
Experimental Methods
Treatment groups are shown in Table 10. Of the chemicals used in this study, 18 were previously tested by the NTP. Trifluoroethane was not evaluated by the NTP and the bioassay was performed as described by Alexander et al. (1995) Hum. Exp. Toxicol. 14:706. Nine of the chemicals were positive for an increased incidence of primary alveolar/bronchiolar adenomas or carcinomas and ten were negative. Study results for each of these chemicals for lung tumor incidence are summarized in Table 11 and results for liver tumor incidence are summarized in Table 12. Genotoxic diversity among chemical treatments is summarized in Table 13.
Animal exposures for each chemical were performed via the route and dose listed in Table 9. Female B6C3F1 mice, 5-6 weeks old, were exposed for 13 weeks. Following 13 weeks of exposure, the mice were euthanized, histopathology of the left lung lobes and left liver lobes was assessed, and RNA was isolated from the right lung lobes and the right, caudate and median liver lobes for microarray analysis. Microarray analysis was performed as described in Example 1 on 3 to 4 animals using Affymetrix 430 2.0 arrays.
Results
Structural Diversity Among Chemical Treatments. The structural diversity among the chemicals was analyzed using a Tanimoto similarity coefficient, where a coefficient of 1.0 indicates identical molecules and 0.0 indicates no structural similarity. The average similarity among all chemicals in the study was 0.116 with a maximum similarity of 0.508 between NEDD and NAPD (Table 14). Among the lung carcinogens, the average similarity was 0.123 with a maximum similarity of 0.327 between DBET and BBMP. By comparison, the average similarity for all single chemicals tested by the NTP in a rodent cancer bioassay was 0.155.
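The Tanimoto coefficient between two molecules is commonly computed on structural fingerprints as the number of shared features divided by the number of features present in either molecule. The sketch below assumes that each chemical is represented as a set of fingerprint feature identifiers. The fingerprinting method used in the study is not stated, so the fingerprints, chemical labels and function names here are hypothetical.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(union)


def pairwise_summary(fingerprints):
    """Average pairwise similarity and the most similar pair of chemicals."""
    sims = {
        (x, y): tanimoto(fingerprints[x], fingerprints[y])
        for x, y in combinations(fingerprints, 2)
    }
    best_pair = max(sims, key=sims.get)
    return sum(sims.values()) / len(sims), best_pair, sims[best_pair]


# Hypothetical fingerprints for three chemicals.
example = {"chem_A": {1, 2, 3}, "chem_B": {2, 3, 4}, "chem_C": {7, 8}}
average, most_similar, maximum = pairwise_summary(example)
```

Averaging over all unordered pairs, as in `pairwise_summary`, is one way to arrive at a single diversity figure for a set of chemicals; a lower average indicates a more structurally diverse set.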
Histopathology Results. Gross histological changes in the lung were observed with only one chemical: NAPD produced karyomegaly and karyorrhexis in bronchiolar epithelial cells and occasional peribronchiolar infiltration by neutrophils and mononuclear cells. Gross histological changes in the liver were likewise observed with only one chemical: BFUR produced minor cell necrosis.
Building Classification Model and Identifying Relevant Biomarkers.
Classification analysis was performed and the process is outlined in Figure 10. Genomic data for chemical treatments and controls from 106 total animals in 23 treatment groups were part of the analysis. Following MAS5 normalization and averaging animals within each treatment group, the leave-one-out (LOO) cross-validation process consisted of setting aside one treatment group as the test set and grouping together the remaining (22) treatment groups for use as a training set. Feature selection was performed on the training set using the Golub algorithm. The genes with the largest Golub statistic were used to build a support vector machine (SVM) classification model. The model was then used to predict the classes for the test data set. The cross-validation process was repeated 23 times. Figure 11 depicts the predictive accuracy of the lung gene biomarkers.
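The leave-one-out scheme over treatment groups can be sketched as follows. The sketch averages animals within each treatment group and then holds out one group at a time. The `fit_predict` argument is a hypothetical callback standing in for the feature selection and SVM steps, and a one-nearest-neighbor rule is supplied only so that the example runs with the standard library. It is not the classifier used in the study, and all names are assumptions.

```python
from statistics import mean


def average_by_group(profiles, groups):
    """Average the expression profiles of the animals in each treatment group."""
    by_group = {}
    for profile, group in zip(profiles, groups):
        by_group.setdefault(group, []).append(profile)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in by_group.items()}


def leave_one_group_out(group_profiles, group_labels, fit_predict):
    """Hold out each treatment group once and return the fraction predicted correctly."""
    names = list(group_profiles)
    correct = 0
    for held_out in names:
        train = [g for g in names if g != held_out]
        X_train = [group_profiles[g] for g in train]
        y_train = [group_labels[g] for g in train]
        # Any feature selection belongs inside fit_predict, on X_train only.
        prediction = fit_predict(X_train, y_train, group_profiles[held_out])
        correct += prediction == group_labels[held_out]
    return correct / len(names)


def nearest_neighbor(X_train, y_train, x):
    """Stand-in classifier (NOT an SVM): label of the closest training profile."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[distances.index(min(distances))]
```

With 23 treatment groups, the loop in `leave_one_group_out` runs 23 times, each time training on the remaining 22 group averages.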
Table 15 lists the top lung gene expression biomarkers identified. Figure 12 depicts the predictive accuracy of liver gene biomarkers. Figure 13 depicts the most discriminating liver gene expression biomarkers. Figure 14 depicts the predictive accuracy of liver gene expression biomarkers using various classification algorithms.
In conclusion, gene expression biomarkers collected following a subchronic exposure can predict increased tumor incidence in a two-year bioassay with reasonable accuracy using the analysis methods described herein.
References
Allen, D. G., Pearse, G., Haseman, J. K., and Maronpot, R. R. (2004). Prediction of rodent carcinogenesis: an evaluation of prechronic liver lesions as forecasters of liver tumors in NTP carcinogenicity studies. Toxicol Pathol 32, 393-401.
Amin, R. P., Vickers, A. E., Sistare, F., Thompson, K. L., Roman, R. J., Lawton, M., Kramer, J., Hamadeh, H. K., Collins, J., Grissom, S., Bennett, L., Tucker, C. J., Wild, S., Kind, C., Oreffo, V., Davis, J. W., 2nd, Curtiss, S., Naciff, J. M., Cunningham, M., Tennant, R., Stevens, J., Car, B., Bertram, T. A., and Afshari, C. A. (2004). Identification of putative gene based markers of renal toxicity. Environ Health Perspect 112, 465-79.
Andersen, M. E., Mills, J. J., Jirtle, R. L., and Greenlee, W. F. (1995). Negative selection in hepatic tumor promotion in relation to cancer risk assessment. Toxicology 102, 223-37.
Arand, M., Cronin, A., Adamska, M., and Oesch, F. (2005). Epoxide hydrolases: structure, function, mechanism, and assay. Methods Enzymol 400, 569-88.
Balendiran, G. K., Dabur, R., and Fraser, D. (2004). The role of glutathione in cancer. Cell Biochem Funct 22, 343-52.
Bongers, V., Snow, G. B., de Vries, N., Cattan, A. R., Hall, A. G., van der Waal, I., and Braakhuis, B. J. (1995). Second primary head and neck squamous cell carcinoma predicted by the glutathione S-transferase expression in healthy tissue in the direct vicinity of the first tumor. Lab Invest 73, 503-10.
Breuhahn, K., Longerich, T., and Schirmacher, P. (2006). Dysregulation of growth factor signaling in human hepatocellular carcinoma. Oncogene 25, 3787-800.
Bucher, J. R., and Portier, C. (2004). Human carcinogenic risk evaluation, Part V: The national toxicology program vision for assessing the human carcinogenic hazard of chemicals. Toxicol Sci 82, 363-6.
Buturovic, L. J. (2006). PCP: a program for supervised classification of gene expression profiles. Bioinformatics 22, 245-7.
Coller, J. K., Fritz, P., Zanger, U. M., Siegle, I., Eichelbaum, M., Kroemer, H. K., and Murdter, T. E. (2001). Distribution of microsomal epoxide hydrolase in humans: an immunohistochemical study in normal tissues, and benign and malignant tumours. Histochem J 33, 329-36.
Costello, L. C., and Franklin, R. B. (2005). 'Why do tumour cells glycolyse?': from glycolysis through citrate to lipogenesis. Mol Cell Biochem 280, 1-8.
Datta, S., and Datta, M. W. (2006). Sonic Hedgehog signaling in advanced prostate cancer. Cell Mol Life Sci 63, 435-48.
De Langhe, S. P., Carraro, G., Warburton, D., Hajihosseini, M. K., and Bellusci, S. (2006). Levels of mesenchymal FGFR2 signaling modulate smooth muscle progenitor cell commitment in the lung. Dev Biol 299, 52-62.
De Moerlooze, L., Spencer-Dene, B., Revest, J., Hajihosseini, M., Rosewell, I., and Dickson, C. (2000). An important role for the IIIb isoform of fibroblast growth factor receptor 2 (FGFR2) in mesenchymal-epithelial signalling during mouse organogenesis. Development 127, 483-92.
del Moral, P. M., De Langhe, S. P., Sala, F. G., Veltmaat, J. M., Tefft, D., Wang, K., Warburton, D., and Bellusci, S. (2006). Differential role of FGF9 on epithelium and mesenchyme in mouse embryonic lung. Dev Biol 293, 77-89.
Dennis, G., Jr., Sherman, B. T., Hosack, D. A., Yang, J., Gao, W., Lane, H. C., and Lempicki, R. A. (2003). DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4, P3.
Diez de Medina, S. G., Chopin, D., El Marjou, A., Delouvee, A., LaRochelle, W. J., Hoznek, A., Abbou, C., Aaronson, S. A., Thiery, J. P., and Radvanyi, F. (1997). Decreased expression of keratinocyte growth factor receptor in a subset of human transitional cell bladder carcinomas. Oncogene 14, 323-30.
Ellinger-Ziegelbauer, H., Stuart, B., Wahle, B., Bomann, W., and Ahr, H. J. (2005). Comparison of the expression profiles induced by genotoxic and nongenotoxic carcinogens in rat liver. Mutat Res 575, 61-84.
Fielden, M. R., Eynon, B. P., Natsoulis, G., Jarnagin, K., Banas, D., and Kolaja, K. L. (2005). A gene expression signature that predicts the future onset of drug-induced renal tubular toxicity. Toxicol Pathol 33, 675-83.
Eswarakumar, V. P., Lax, I., and Schlessinger, J. (2005). Cellular signaling by fibroblast growth factor receptors. Cytokine Growth Factor Rev 16, 139-49.
Eswarakumar, V. P., Monsonego-Ornan, E., Pines, M., Antonopoulou, I., Morriss-Kay, G. M., and Lonai, P. (2002). The IIIc alternative of Fgfr2 is a positive regulator of bone formation. Development 129, 3783-93.
Ghosh, S. (2000). Cholesteryl ester hydrolase in human monocyte/macrophage: cloning, sequencing, and expression of full-length cDNA. Physiol Genomics 2, 1-8.
Glinsky, G. V., Berezovska, O., and Glinskii, A. B. (2005). Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J Clin Invest 115, 1503-21.
Gold, L. S., Manley, N. B., Slone, T. H., and Rohrbach, L. (1999). Supplement to the Carcinogenic Potency Database (CPDB): results of animal bioassays published in the general literature in 1993 to 1994 and by the National Toxicology Program in 1995 to 1996. Environ Health Perspect 107 Suppl 4, 527-600.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-7.
Griffin, M. J., and Gengozian, N. (1984). Epoxide hydrolase: a marker for experimental hepatocarcinogenesis. Ann Clin Lab Sci 14, 27-31.
Hasegawa, R., and Ito, N. (1994). Hepatocarcinogenesis in the rat. In Carcinogenesis (M. P. Waalkes and J. M. Ward, eds.). Raven Press, New York.
Hatayama, I., Nishimura, S., Narita, T., and Sato, K. (1993). Sex-dependent expression of class pi glutathione S-transferase during chemical hepatocarcinogenesis in B6C3F1 mice. Carcinogenesis 14, 537-8.
Hayes, J. D., Flanagan, J. U., and Jowsey, I. R. (2005). Glutathione transferases. Annu Rev Pharmacol Toxicol 45, 51-88.
Hayes, J. D., and Pulford, D. J. (1995). The glutathione S-transferase supergene family: regulation of GST and the contribution of the isoenzymes to cancer chemoprotection and drug resistance. Crit Rev Biochem Mol Biol 30, 445-600.
Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31, e15.
Kessler, R., Hamou, M. F., Albertoni, M., de Tribolet, N., Arand, M., and Van Meir, E. G. (2000). Identification of the putative brain tumor antigen BF7/GE2 as the (de)toxifying enzyme microsomal epoxide hydrolase. Cancer Res 60, 1403-9.
Kwak, M. K., Wakabayashi, N., and Kensler, T. W. (2004). Chemoprevention through the Keap1-Nrf2 signaling pathway by phase 2 enzyme inducers. Mutat Res 555, 133-48.
Kramer, J. A., Curtiss, S. W., Kolaja, K. L., Alden, C. L., Blomme, E. A., Curtiss, W. C., Davila, J. C., Jackson, C. J., and Bunch, R. T. (2004). Acute molecular markers of rodent hepatic carcinogenesis identified by transcription profiling. Chem Res Toxicol 17, 463-70.
Lindahl, R. (1992). Aldehyde dehydrogenases and their role in carcinogenesis. Crit Rev Biochem Mol Biol 27, 283-335.
Matsunobu, T., Ishiwata, T., Yoshino, M., Watanabe, M., Kudo, M., Matsumoto, K., Tokunaga, A., Tajiri, T., and Naito, Z. (2006). Expression of keratinocyte growth factor receptor correlates with expansive growth and early stage of gastric cancer. Int J Oncol 28, 307-14.
Mazurek, S., Boschek, C. B., and Eigenbrodt, E. (1997). The role of phosphometabolites in cell proliferation, energy metabolism, and tumor therapy. J Bioenerg Biomembr 29, 315-30.
Munger, J. S., Shi, G. P., Mark, E. A., Chin, D. T., Gerard, C., and Chapman, H. A. (1991). A serine esterase released by human alveolar macrophages is closely related to liver microsomal carboxylesterases. J Biol Chem 266, 18832-8.
Nie, A. Y., McMillian, M., Brandon Parker, J., Leone, A., Bryant, S., Yieh, L., Bittner, A., Nelson, J., Carmen, A., Wan, J., and Lord, P. G. (2006). Predictive toxicogenomics approaches reveal underlying molecular mechanisms of nongenotoxic carcinogenicity. Mol Carcinog.
Novikoff, A. B., Novikoff, P. M., Stockert, R. J., Becker, F. F., Yam, A., Poruchynsky, M. S., Levin, W., and Thomas, P. E. (1979). Immunocytochemical localization of epoxide hydrase in hyperplastic nodules induced in rat liver by 2-acetylaminofluorene. Proc Natl Acad Sci U S A 76, 5207-11.
NTP (1978). Bioassay of 1,5-Naphthalenediamine for Possible Carcinogenicity. U.S. Department of Health and Human Services National Toxicology Program, Washington, D.C.
NTP (1989). Toxicology and Carcinogenesis Studies of Benzofuran in F344/N Rats and B6C3F1 Mice. U.S. Department of Health and Human Services National Toxicology Program, Washington, D.C.
NTP (1996). Annual Plan for Fiscal Year 1996. National Toxicology Program, Washington, D.C.
NTP (2001). Annual Plan for Fiscal Year 2001. National Toxicology Program, Washington, D.C.
Pritchard, J. B., French, J. E., Davis, B. J., and Haseman, J. K. (2003). The role of transgenic mouse models in carcinogen identification. Environ Health Perspect 111, 444-54.
Reiner, A., Yekutieli, D., and Benjamini, Y. (2003). Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19, 368-75.
Richard, A. M., Gold, L. S., and Nicklaus, M. C. (2006). Chemical structure indexing of toxicity data on the internet: moving toward a flat world. Curr Opin Drug Discov Devel 9, 314-25.
Smyth, G. K. (2005). Limma: linear models for microarray data. In Bioinformatics and Computational Biology Solutions using R and Bioconductor (R. Gentleman, V. Carey, S. Dudoit, R. A. Irizarry and W. Huber, eds.). Springer, New York.
Spivack, S. D., Hurteau, G. J., Fasco, M. J., and Kaminsky, L. S. (2003). Phase I and II carcinogen metabolism gene expression in human lung tissue and tumors. Clin Cancer Res 9, 6002-11.
Srinivasan, D. M., Kapoor, M., Kojima, F., and Crofford, L. J. (2005). Growth factor receptors: implications in tumor biology. Curr Opin Investig Drugs 6, 1246-9.
Strassburg, C. P., Manns, M. P., and Tukey, R. H. (1997). Differential down-regulation of the UDP-glucuronosyltransferase 1A locus is an early event in human liver and biliary cancer. Cancer Res 57, 2979-85.
Su, A. I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K. A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., Cooke, M. P., Walker, J. R., and Hogenesch, J. B. (2004). A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101, 6062-7.
Thomas, R. S., O'Connell, T. M., Pluta, L., Wolfinger, R. D., Yang, L., and Page, T. J. (2006). A Comparison of Transcriptomic and Metabonomic Technologies for Identifying Biomarkers Predictive of Two-Year Rodent Cancer Bioassays. Toxicol Sci [Epub ahead of print].
Thomas, R. S., Rank, D. R., Penn, S. G., Zastrow, G. M., Hayes, K. R., Pande, K., Glover, E., Silander, T., Craven, M. W., Reddy, J. K., Jovanovich, S. B., and Bradfield, C. A. (2001). Identification of toxicologically predictive gene sets using cDNA microarrays. Mol Pharmacol 60, 1189-94.
Tukey, R. H., and Strassburg, C. P. (2000). Human UDP-glucuronosyltransferases: metabolism, expression, and disease. Annu Rev Pharmacol Toxicol 40, 581-616.
Uphoff, C. C., and Drexler, H. G. (2000). Biology of monocyte-specific esterase. Leuk Lymphoma 39, 257-70.
Waring, J. F., Jolly, R. A., Ciurlionis, R., Lum, P. Y., Praestgaard, J. T., Morfitt, D. C., Buratto, B., Roberts, C., Schadt, E., and Ulrich, R. G. (2001). Clustering of hepatotoxins based on mechanism of toxicity using gene expression profiles. Toxicol Appl Pharmacol 175, 28-42.
Watanabe, M., Ishiwata, T., Nishigai, K., Moriyama, Y., and Asano, G. (2000). Overexpression of keratinocyte growth factor in cancer cells and enterochromaffin cells in human colorectal cancer. Pathol Int 50, 363-72.
White, A. C., Xu, J., Yin, Y., Smith, C., Schmid, G., and Ornitz, D. M. (2006). FGF9 and SHH signaling coordinate lung growth and development through regulation of distinct mesenchymal domains. Development 133, 1507-17.
Yamayoshi, T., Nagayasu, T., Matsumoto, K., Abo, T., Hishikawa, Y., and Koji, T. (2004). Expression of keratinocyte growth factor/fibroblast growth factor-7 and its receptor in human lung cancer: correlation with tumour proliferative activity and patient prognosis. J Pathol 204, 110-8.
Yeh, C. S., Wang, J. Y., Cheng, T. L., Juan, C. H., Wu, C. H., and Lin, S. R. (2006). Fatty acid metabolism pathway play an important role in carcinogenesis of human colorectal cancers by Microarray-Bioinformatics analysis. Cancer Lett 233, 297-308.
Zhang, T., Haws, P., and Wu, Q. (2004). Multiple variable first exons: a mechanism for cell- and tissue-specific gene regulation. Genome Res 14, 79-89.
Zhang, B., Schmoyer, D., Kirov, S., and Snoddy, J. (2004). GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics 5, 16.
The foregoing is illustrative of the present invention, and is not to be construed as limiting thereof. The invention is defined by the following claims, with equivalents of the claims to be included therein.

Claims

That Which is Claimed is:
1. A method of predicting tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising:
(a) determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising at least one nucleic acid sequence isolated from a biological sample taken from a subject, wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a polypeptide encoded by nucleic acid sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase, and fragments, variants and isoforms thereof; and
(b) correlating an altered level of nucleic acid expression of the at least one nucleic acid sequence to a likelihood of tumor formation.
2. The method of claim 1, wherein the method provides at least about 85% accuracy of predicting tumor formation.
3. The method of claim 1, wherein the method provides at least about 90% accuracy of predicting tumor formation.
4. The method of claim 1, wherein the method provides at least about 93% accuracy, 90% sensitivity and 90% specificity of predicting tumor formation.
5. The method of claim 1, wherein the method comprises determining the nucleic acid expression pattern of eight nucleic acid sequences, wherein the nucleic acid sequences comprise SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof.
6. The method of claim 1, wherein the method comprises measuring the levels of RNA.
7. The method of claim 1, wherein the method comprises measuring the levels of protein.
8. The method of claim 1, wherein the nucleic acid expression is tissue specific.
9. The method of claim 1, wherein the tumor is selected from the group consisting of breast cancer, osteosarcoma, angiosarcoma, fibrosarcoma, leukemia, sinus tumor, ovarian cancer, uretal cancer, bladder cancer, prostate cancer, genitourinary cancer, gastrointestinal cancer, lung cancer, lymphoma, myeloma, pancreatic cancer, liver cancer, kidney cancer, endocrine cancer, skin cancer, melanoma, angioma and brain or central nervous system (CNS) cancer.
10. A method of predicting lung or liver tumor formation comparable to results obtained in a standard two-year rodent cancer bioassay, comprising:
(a) determining the nucleic acid expression pattern of eight nucleic acid sequences comprising SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof; and
(b) correlating an altered level of nucleic acid expression of the nucleic acid sequences to a likelihood of tumor formation to predict lung or liver tumor formation with at least about 94% accuracy.
11. The method of claim 10, further comprising measuring the levels of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a polypeptide encoded by a nucleic acid sequence of SEQ ID NO:6 (AU018778) and/or a glutathione S-transferase and variants and isoforms thereof.
12. A method of assessing a substance for carcinogenic potential, comprising:
(a) determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising at least one nucleic acid isolated from a biological sample taken from a subject exposed to a substance to be tested for carcinogenicity, wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778) and/or a glutathione S-transferase and variants and isoforms thereof; and
(b) correlating an altered level of nucleic acid expression of the at least one nucleic acid sequence to an increased likelihood of tumor formation, wherein an increased likelihood of tumor formation indicates that the substance has carcinogenic potential.
13. The method of claim 12, wherein the method comprises determining the nucleic acid expression pattern of eight nucleic acid sequences, wherein the nucleic acid sequences comprise SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof.
14. The method of claim 12, wherein the exposure to the substance is a subchronic exposure.
15. A method of using a nucleic acid biomarker to predict tumor formation, comprising: determining the nucleic acid expression pattern of at least one nucleic acid sequence from a sample comprising a nucleic acid isolated from a biological sample taken from a subject, wherein the nucleic acid sequence encodes a polypeptide selected from the group consisting of a UDP glucuronosyltransferase, a carboxylesterase, a fibroblast growth factor receptor, an epoxide hydrolase, a nucleotide sequence of SEQ ID NO:6 (AU018778), a glutathione S-transferase and variants and isoforms thereof.
16. A method of identifying a nucleic acid biomarker for predicting tumor formation resulting from exposure to a substance, comprising:
(a) comparing nucleic acid expression of at least one nucleic acid sequence taken from a biological sample of (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound,
(b) identifying at least one nucleic acid sequence having expression differentially regulated after exposure to the cytotoxic compound as compared to after exposure to a non-cytotoxic compound; and
(c) statistically correlating the differential regulation to a likelihood of tumor formation thereby indicating that the at least one nucleic acid is a biomarker for predicting tumor formation resulting from exposure to a substance.
17. A method of determining a nucleic acid expression profile to predict tumor formation, comprising:
(a) performing a microarray analysis on at least one nucleic acid isolated from a biological sample taken from (i) at least one subject exposed to a cytotoxic compound, and (ii) at least one other subject exposed to a non-cytotoxic compound; and
(b) statistically analyzing the ability of the expression of at least one nucleic acid to be regulated differently during cytotoxic and non-cytotoxic treatments, wherein the differential expression of the at least one nucleic acid establishes a nucleic acid profile to predict tumor formation.
18. A kit comprising a probe that hybridizes with a nucleic acid sequence comprising SEQ ID NO:1 (Ugt1a1), SEQ ID NO:2 (Ces1), SEQ ID NO:3 (Fgfr2), SEQ ID NO:4 (Ephx1), SEQ ID NO:5 (Ugt1a2), SEQ ID NO:6 (AU018778), SEQ ID NO:7 (Gstm1), SEQ ID NO:8 (Ddit4l), SEQ ID NO:9 (Ikbkg transcript variant 1), SEQ ID NO:10 (Ikbkg transcript variant 2), SEQ ID NO:11 (Ugt1a5), SEQ ID NO:12 (Ugt1a6a), SEQ ID NO:13 (Ugt1a6b), SEQ ID NO:14 (Ugt1a7c), SEQ ID NO:15 (Ugt1a9), SEQ ID NO:16 (Ugt1a10) and variants and isoforms thereof, under conditions whereby nucleic acid hybridization can occur.
19. The kit of claim 18, wherein the kit further includes instructions for predicting tumor formation resulting from exposure to a substance.
20. The kit of claim 19, wherein the tumor is selected from the group consisting of breast cancer, osteosarcoma, angiosarcoma, fibrosarcoma, leukemia, sinus tumor, ovarian cancer, uretal cancer, bladder cancer, prostate cancer, genitourinary cancer, gastrointestinal cancer, lung cancer, lymphoma, myeloma, pancreatic cancer, liver cancer, kidney cancer, endocrine cancer, skin cancer, melanoma, angioma and brain or central nervous system (CNS) cancer.
21. The kit of claim 18, wherein the kit further includes instructions for evaluating a chemical for carcinogenic potential.
22. The kit of claim 18, wherein the kit further includes instructions for assaying a biological sample for the presence of tumor formation biomarkers.
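
By way of non-limiting illustration only, the accuracy, sensitivity and specificity recited in claims 2-4 and 10 can be computed from paired binary calls as in the following minimal sketch. The function name `classification_metrics`, its arguments and the example calls are hypothetical and are not taken from the specification; the sketch assumes that each treatment has one predicted call from an expression-based classifier and one observed outcome from the reference bioassay.

```python
from typing import Dict, Optional, Sequence


def classification_metrics(
    predicted: Sequence[bool], observed: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Return accuracy, sensitivity and specificity for paired binary calls.

    predicted[i] is True when the (hypothetical) biomarker classifier calls
    treatment i tumor-positive; observed[i] is True when the reference
    bioassay reported increased tumor incidence for the same treatment.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and equal in length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)

    return {
        # Fraction of all treatments classified correctly.
        "accuracy": (tp + tn) / len(predicted),
        # Fraction of tumor-positive treatments correctly called positive;
        # None when there are no observed positives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Fraction of tumor-negative treatments correctly called negative;
        # None when there are no observed negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Illustrative, made-up calls for eight treatments (not data from the study).
example_predicted = [True, True, True, False, False, False, False, True]
example_observed = [True, True, True, True, False, False, False, False]
example_metrics = classification_metrics(example_predicted, example_observed)
```

With these made-up calls there are three true positives, three true negatives, one false positive and one false negative, so all three measures equal 0.75. A classifier meeting claim 4 would instead need an accuracy of at least about 0.93 with sensitivity and specificity of at least about 0.90 each.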
PCT/US2008/003063 2007-03-08 2008-03-07 Methods of using genomic biomarkers to predict tumor formation WO2008112154A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US90609907P 2007-03-08 2007-03-08
US60/906,099 2007-03-08

Publications (2)

Publication Number Publication Date
WO2008112154A2 (en) 2008-09-18
WO2008112154A3 (en) 2008-12-24

Family

ID=39760274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/003063 WO2008112154A2 (en) 2007-03-08 2008-03-07 Methods of using genomic biomarkers to predict tumor formation

Country Status (1)

Country Link
WO (1) WO2008112154A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030228667A1 (en) * 1999-02-22 2003-12-11 Institut Pasteur Nucleotide sequence encoding a modulator of NF-kappaB
US20060240450A1 (en) * 2001-09-19 2006-10-26 David Ralph Genetic analysis for stratification of cancer risk

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARAKI ET AL.: 'Polymorphisms of UDP-glucuronosyltransferase IA7 gene. A Possible New Risk Factor for Lung Cancer' EUR. J. CANCER vol. 41, 2005, pages 2360 - 2365 *
COTE ET AL.: 'Combination of Glutathione S-Transferase Genotypes and Risk of Early-Onset Lung Cancer in Caucasians and African Americans: A Population-Based Study' CARCINOGENESIS vol. 26, 2005, pages 811 - 819 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110729022A (en) * 2019-10-24 2020-01-24 江西中烟工业有限责任公司 Establishment method of passive smoking rat early liver injury model and related gene screening method
CN110729022B (en) * 2019-10-24 2023-06-23 江西中烟工业有限责任公司 Method for establishing early liver injury model of passive smoke-absorbing rat and related gene screening method
CN115290774A (en) * 2022-07-21 2022-11-04 重庆医科大学 Application of uridine diphosphate glucuronic acid in preparation of reagent for detecting liver cancer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08726574

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08726574

Country of ref document: EP

Kind code of ref document: A2