WO2004083402A2 - Spleen necrosis predictive genes - Google Patents

Spleen necrosis predictive genes


Publication number
WO2004083402A2
Authority
WO
WIPO (PCT)
Prior art keywords
genes
predictive
sample
gene
spleen necrosis
Application number
PCT/US2004/008371
Other languages
French (fr)
Other versions
WO2004083402A3 (en)
Inventor
Usha Sankar
Larry Kier
Maher Derbel
Timothy Nolan
Original Assignee
Phase-1 Molecular Toxicology, Inc.
Application filed by Phase-1 Molecular Toxicology, Inc.
Publication of WO2004083402A2
Publication of WO2004083402A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • Table 20 is recorded on said CD-ROM discs as "Table20.txt", created on March 11, 2003, size 603,657 bytes.
  • Table 21 is recorded on said CD-ROM discs as "Table21.txt", created on March 11, 2003, size 653,883 bytes.
  • Sequence listing as requested by 37 CFR § 1.821(c) is recorded on said CD-ROM disc as "2874-021 P.genesequence.txt", created on March 15, 2004.
  • This invention is in the field of toxicology. More specifically, it relates to spleen necrosis predictive genes and the methods of using such genes to predict spleen necrosis.
  • the present invention provides spleen necrosis predictive genes and predictive models which are useful to predict toxic responses to one or more agents.
  • One embodiment of the present invention provides methods of predicting whether an agent induces spleen necrosis in an individual.
  • One method includes the steps of: (a) obtaining a biological sample from an individual treated with the agent or treating a biological sample obtained from an individual with the agent or treating in vitro cultured cells or explants with the agent; (b) obtaining a gene expression profile on one or more of the spleen necrosis predictive genes disclosed herein from the biological sample or in vitro cultured cells or explants; and (c) using the gene expression profile from the biological sample or cells treated with the agent in a predictive model to predict whether the agent induces spleen necrosis in the individual or in vitro.
  • compositions comprising a plurality of cDNAs for use in detecting the altered expression of genes in a toxic response of the spleen, wherein said plurality of cDNAs comprises SEQ ID NOs: 1-306 or the complete complements thereof.
  • Another embodiment of the present invention provides methods of predicting whether an agent induces toxicity to other lymphoid organs such as bone marrow, thymus, or lymph nodes, in an individual.
  • One method includes the steps of: (a) obtaining a biological sample from an individual treated with the agent or treating a biological sample obtained from an individual with the agent or treating in vitro cultured cells or explants with the agent; (b) obtaining a gene expression profile on one or more of the spleen necrosis predictive genes disclosed herein from the biological sample or in vitro cultured cells or explants; and (c) using the gene expression profile from the biological sample or cells treated with the agent in a predictive model to predict whether the agent induces toxicity to other lymphoid organs such as bone marrow, thymus, or lymph nodes, in an individual.
  • the predictive model utilizes gene expression profiles from sets of spleen necrosis predictive gene(s) selected from one of the various spleen necrosis predictive gene sets disclosed herein (i.e., Combination 5, 4, 3, 2, or 1), wherein the sets comprise one or more genes therefrom.
  • the predictive model utilizes sets of spleen necrosis predictive gene(s) selected from one of the various spleen necrosis predictive gene sets disclosed herein (i.e., Combination 5, 4, 3, 2, or 1), wherein the sets comprise one or more genes therefrom.
  • the predictive genes and models are used to identify and evaluate various in vitro systems that can be used to accurately predict in vivo toxicity and to use the identified in vitro systems to accurately predict in vivo toxicity.
  • Yet another embodiment of the present invention provides methods of identifying a spleen necrosis predictive gene including the steps of: (a) providing a set of candidate toxicity predictive genes; (b) evaluating said genes for their predictive performance with at least one training and test set of data in a Predictive Model to identify genes which are predictive of spleen necrosis; and (c) testing the performance of predictive genes for their ability to predict spleen necrosis for: (i) different test sets of data, (ii) comparison of prediction for accurate versus random classification, and (iii) prediction using test data external to the data used to derive the predictive genes.
  • the candidate toxicity predictive genes are rat toxicity genes.
  • Another embodiment of the present invention provides a computer-based method for mining genes predictive for spleen necrosis by: (a) collecting expression levels of a plurality of candidate toxicity predictive genes in a multiplicity of samples; (b) optionally storing the expression levels as a database on an electronic medium; (c) defining a group of samples to be a training set; (d) defining another group of samples to be a test set; (e) optionally generating additional training and test sets; and (f) selecting a set of genes which are predictive of spleen necrosis based on evaluating the training set and the test set in a Predictive Model.
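Steps (c) through (e) of the computer-based method amount to repeatedly partitioning the samples into training and test groups. The sketch below is illustrative only. It assumes the partition is made by compound, so that all samples for one compound fall on the same side, and it uses a simple round-robin assignment. The function name `make_splits` and the compound labels are hypothetical, and the source does not state how the sets were actually drawn.

```python
def make_splits(compounds, n_splits=5):
    """Return n_splits (training, test) pairs of compound lists.

    Assumption (not from the source): compound i is held out in
    split i % n_splits, so each compound lands in exactly one test set.
    """
    ordered = sorted(compounds)
    splits = []
    for fold in range(n_splits):
        test = [c for i, c in enumerate(ordered) if i % n_splits == fold]
        train = [c for i, c in enumerate(ordered) if i % n_splits != fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Hypothetical compound labels, for illustration only.
    for train, test in make_splits(["cmpd%d" % i for i in range(10)]):
        print(len(train), "training compounds;", "test:", test)
```

Holding out whole compounds rather than individual samples is one way to keep a test compound's profiles out of the training data; whether the original analysis did so is not stated here.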
  • a further embodiment of the present invention provides a computer program product for predicting spleen necrosis, which includes a set of spleen necrosis predictive genes derived from mining a database having a plurality of gene expression profiles indicative of toxicity.
  • the set of spleen necrosis predictive genes includes at least one predictive gene from combination 5, 4, 3, 2, or 1 list.
  • Yet a further embodiment of the present invention provides a library of expression profiles of spleen necrosis predictive genes produced by the methods disclosed herein.
  • Another embodiment of the present invention provides an integrated system for predicting spleen necrosis including equipment capable of measuring gene expression profiles of spleen necrosis predictive genes from biological samples exposed to a test agent, operably linked to a computer system capable of implementing a predictive model.
  • Figure 1 is a flow diagram illustrating how spleen necrosis predictive genes are discovered.
  • Figure 2 is a flow diagram illustrating how spleen necrosis predictive genes are evaluated for performance.
  • Figure 3 is a flow diagram illustrating how spleen necrosis predictive genes are used to predict toxicity.
  • Table 1 lists compounds, dose levels, spleen necrosis pathology and abbreviations in the database.
  • Table 2 lists the distribution of compounds in individual training and test sets for 24 hour spleen necrosis data.
  • Table 3 lists the predictive genes for 24 hour expression data.
  • Table 4 lists the randomly selected gene subsets from the 24 hour Combo All gene set. Genes were randomly selected from the Combo All list of predictive genes (154 genes) by assigning a random number to each gene, sorting by the random number, and selecting the appropriate number of sorted genes.
  • Table 5 lists the randomly selected gene subsets from the 24 hour Combos 543 combined. Genes were randomly selected from the combined Combo 543 list of predictive genes (23 genes) by assigning a random number to each gene, sorting by the random number, and selecting the appropriate number of sorted genes.
  • Table 6 lists the randomly selected gene subsets from all 24 hour genes excluding predictive genes (i.e., excluding Combo All genes).
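The random-subset procedure described for Tables 4 and 5 (assign each gene a random number, sort by that number, keep the first n genes) can be sketched as follows. The function name `random_gene_subset`, the `seed` parameter, and the placeholder gene labels are illustrative assumptions, not part of the source.

```python
import random


def random_gene_subset(genes, n, seed=None):
    """Pick n genes by tagging each with a random number and sorting on it."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    keyed = [(rng.random(), gene) for gene in genes]
    keyed.sort()  # sort by the random number
    return [gene for _, gene in keyed[:n]]


if __name__ == "__main__":
    # 154 placeholder names standing in for the Combo All list.
    combo_all = ["gene%03d" % i for i in range(154)]
    print(random_gene_subset(combo_all, 10, seed=1))
```

Sorting on independent random keys yields a uniformly random ordering, so taking the first n entries is equivalent to sampling n genes without replacement.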
  • Table 7 lists the spleen necrosis individual sample prediction values for 24 hour data predictive genes (combined list and subsets).
  • Table 8 lists the individual gene predictions for Combos 5, 4, and 3.
  • Table 9 lists the comparison of predictivity for correct spleen necrosis classification and random classification using Combo gene sets and random subsets and 24 hour data.
  • Table 10 lists the distribution of compounds in individual training and test sets for 6 hour spleen necrosis data.
  • Table 11 lists genes whose expression at 6 hours is predictive of spleen necrosis at 72 hours.
  • Table 12 lists the comparison of predictivity for correct spleen necrosis classification and random classification using combo gene sets and 6 hour data.
  • Table 13 lists the distribution of compounds in individual training and test sets for 72 hour spleen necrosis data.
  • Table 14 lists genes whose expression at 72 hours predicts spleen necrosis at 72 hours.
  • Table 15 lists the comparison of predictivity for correct spleen necrosis classification and random classification using combo gene sets and 72 hour data.
  • Table 16 lists the RCT genes (ESTs) predictive for spleen necrosis at 72 hours: best homology matches.
  • Table 17 lists the genes predictive for spleen necrosis, sequences, and accession numbers.
  • Table 18 lists the spleen necrosis predictive genes whose protein products are known to be secreted. The genes are from the table listing the spleen necrosis predictive genes at the three time points 6, 24, and 72 hours.
  • Table 19 lists the expression data for the 6 hour timepoint.
  • Table 20 lists the expression data for the 24 hour timepoint.
  • Table 21 lists the expression data for the 72 hour timepoint.
  • This invention relates to methods of predicting whether an agent or other stimulus induces spleen toxicity using predictive molecular toxicology analysis.
  • the present invention provides methods of predicting spleen necrosis which comprise analyzing gene and/or protein expression across a number of spleen necrosis biomarkers disclosed herein for patterns of expression that are predictive of spleen necrosis in the recipient organism.
  • Many chemical agents induce damage to spleen and other lymphoid organs, and immunotoxicity is a significant component of adverse reactions to pharmaceuticals and drugs. Adverse drug reactions are very often unpredictable, and may occur through acute exposure to the chemical agent or drug or through chronic exposures.
  • One embodiment of the present invention provides in part, that modulated transcriptional regulation of relatively small sets of certain genes in response to a test agent can accurately predict the occurrence of spleen necrosis observed at later time points.
  • spleen necrosis biomarkers which are useful in the practice of the spleen toxicity prediction methods of the present invention.
  • applicants have identified 306 spleen necrosis biomarkers which demonstrate utility in predicting spleen necrosis. These biomarkers have been thoroughly characterized for their predictive performance, individually as well as in various combinations or subsets thereof.
  • various optimized subsets of the spleen necrosis biomarkers of the present invention are disclosed. These sets have also been thoroughly characterized for predictive performance using the methods of the present invention.
  • Among the subsets of spleen necrosis genes provided herein are several which demonstrate prediction accuracies in the vicinity of 80%.
  • Toxic or "toxicity” refers to the result of an agent causing adverse effects, usually by a xenobiotic agent administered at a sufficiently high dose level to cause the adverse effects.
  • spleen necrosis refers to depletion of lymphoid tissue in the splenic follicles.
  • spleen necrosis biomarker and “spleen necrosis predictive gene” are used interchangeably and refer to a gene whose expression, measured at the RNA or protein level can predict the likelihood of a spleen necrosis response.
  • a "toxicological response” refers to a cellular, tissue, organ or system level response to exposure to an agent. At the molecular level, this can include, but is not limited to, the differential expression of genes encompassing both the up- and down-regulation of expression of such genes at the RNA and/or protein level; the up- or down-regulation of expression of genes which encode proteins associated with response to and mitigation of damage, the repair or regulation of cell damage; or changes in gene expression due to changes in populations of cells in the tissue or organ affected in response to toxic damage.
  • An "agent” or “compound” is any element to which an individual can be exposed and can include, without limitation, drugs, pharmaceutical compounds, household chemicals, industrial chemicals, environmental chemicals, other chemicals, and physical elements such as electromagnetic radiation.
  • biological sample refers to substances obtained from an individual.
  • the samples may comprise cells, tissue, parts of tissues, organs, parts of organs, or fluids (e.g., blood, urine or serum).
  • Biological samples include, but are not limited to, those of eukaryotic, mammalian or human origin.
  • sample is defined for the purposes of prediction as a biological sample and the gene expression data for that sample. Each sample may come from an individual animal. A toxicity classification may also be associated with the sample.
  • Gene expression refers to the relative levels of expression and/or pattern of expression of a gene.
  • Gene expression profile refers to the levels of expression of multiple different genes measured for the same sample. Gene expression profiles may be measured in a sample, such as samples comprising a variety of cell types, different tissues, different organs, or fluids (e.g., blood, urine, spinal fluid, sweat, saliva or serum) by various methods including but not limited to microarray technologies and quantitative and semi-quantitative RT-PCR (e.g., TaqMan™) techniques, as well as techniques for measuring expression of proteins.
  • “Individual” refers to a vertebrate, including, but not limited to, a human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog.
  • As used herein, the terms “hybridize”, “hybridizing”, “hybridizes” and the like, used in the context of polynucleotides, are meant to refer to conventional hybridization conditions, such as hybridization in 50% formamide, 6X SSC, 0.1% SDS, 100 µg/ml ssDNA, in which temperatures for hybridization are above 37 degrees Celsius and temperatures for washing in 0.1X SSC/0.1% SDS are above 55 degrees Celsius, and preferably to stringent hybridization conditions.
  • the hybridization of nucleic acids can depend upon various factors such as their degree of complementarity as well as the stringency of the hybridization reaction conditions.
  • Stringent conditions can be used to identify nucleic acid duplexes with a high degree of complementarity.
  • Means for adjusting the stringency of a hybridization reaction are well-known to those of skill in the art. See, for example, Sambrook, et al, "Molecular Cloning: A Laboratory Manual,” Second Edition, Cold Spring Harbor Laboratory Press, 1989; Ausubel, et al., “Current Protocols In Molecular Biology,” John Wiley & Sons, 1996 and periodic updates; and Hames et al, "Nucleic Acid Hybridization: A Practical Approach,” IRL Press, Ltd., 1985.
  • conditions that increase stringency include higher temperature, lower ionic strength and presence or absence of solvents; lower stringency is favored by lower temperature, higher ionic strength, and lower or higher concentrations of solvents.
  • identity is used to express the percentage of amino acid residues at the same relative position which are the same.
  • homology is used to express the percentage of amino acid residues at the same relative positions which are either identical or are similar, using the conserved amino acid criteria of BLAST analysis, as is generally understood in the art. Further details regarding amino acid substitutions, which are considered conservative under such criteria, are provided below.
  • the generation of Toxicology Gene Expression Databases or spleen necrosis biomarkers is described.
  • the spleen necrosis biomarkers described herein were initially identified utilizing a database generated from large numbers of in vivo experiments, wherein the differential expression of approximately 700 rat genes, measured at various time points, in response to multiple toxic compounds inducing various specific toxic responses, as visualized through microscopic histopathological analysis, was quantified, as described in pending United States Patent Application filed January 29, 2002 (serial number 10/060,893) and for purposes of U.S.
  • Such databases are generated using organisms other than the rat, including without limitation, animals of canine, murine, or non-human primate species. In addition, such databases may incorporate data derived from human clinical trials and post-approval human clinical experiences.
  • Various methods for detecting and quantitating the expression of genes and/or proteins in response to toxic stimuli are employed in the generation of such databases, as are generally known in the art. For example, microarrays comprising multiple cDNAs or oligonucleotide probes capable of hybridizing to corresponding transcripts of genes of interest may be used to generate gene expression profiles. Additionally, a number of other methods for detecting and quantitating the expression of gene transcripts are known in the art and may be employed, including without limitation, RT-PCR techniques such as TaqMan®, RNAse protection, branched chain, etc.
  • Databases comprising quantitative gene expression information preferably include qualitative and quantitative and/or semi-quantitative information respecting the observed toxicological responses and other conventional toxicology endpoints, such as for example, body and organ weights, serum chemistry and histopathology observations, histopathology scores and/or similar parameters.
  • the database preferably includes histopathology scores for each animal which has been exposed to one or more agent(s). These scores can be assigned based on actual histopathology observations for the tissue and animal or on the basis of effects observed for other animals treated with the same agent and dose level.
  • the scores are numerical scores that reflect the occurrence and severity of histopathological changes. These scores can be adjusted to have similar range to gene expression changes. For example, a score of 1 could be assigned to samples with no changes and scores of 2-8 assigned to increasingly severe changes. Because the scores are numerical, they are suitable for use with a variety of statistical correlation and similarity measures.
  • histopathology scores may be utilized to identify genes which correlate with the observed toxicological response, using any number of statistical correlation and similarity analysis techniques, including without limitation those correlation or similarity measures described or employed in Example 1 (e.g., Pearson, Spearman, change, smooth, distance etc.). Such correlating genes may be used as predictive gene candidates.
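Because the histopathology scores are numerical, selecting correlating genes reduces to computing a correlation between each gene's expression values and the per-sample scores. The sketch below uses the Pearson coefficient only. The 0.8 cutoff, the function names, and the example values are assumptions made for illustration; the source does not give a threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def correlating_genes(expression, scores, threshold=0.8):
    """Return genes whose |r| with the histopathology scores meets the cutoff.

    expression: dict mapping gene name -> per-sample expression values.
    threshold: hypothetical cutoff, not taken from the source.
    """
    return sorted(
        gene for gene, values in expression.items()
        if abs(pearson(values, scores)) >= threshold
    )


if __name__ == "__main__":
    scores = [1, 1, 2, 5, 8]  # 1 = no change, 2-8 = increasing severity
    expression = {
        "geneA": [0.1, 0.1, 0.2, 0.5, 0.8],  # tracks the score
        "geneB": [-1, -1, -2, -5, -8],       # inversely tracks the score
        "geneC": [1, 1, 1, 1, 1],            # flat
    }
    print(correlating_genes(expression, scores))
```

Taking the absolute value keeps both up- and down-regulated genes as candidates, consistent with the definition of a toxicological response given earlier in the text.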
  • the correlating gene lists as well as the entire array gene list are used as input gene lists in the GeneSpring™ (Version 4.1, Silicon Genetics, Redwood City, CA) Predict Parameter Values tool (hereafter referred to as the "Predictive Model").
  • the correlating gene lists as well as the entire array gene list are used as input gene lists in the Predictive Model as implemented in S-PLUS® statistical software (Insightful Corporation, Seattle, WA).
  • Class Prediction and Classification: Statistical analysis of the database of gene expression profiles can be effected by utilizing commercially available software programs.
  • One embodiment of the present invention provides for the utilization of S-PLUS® software.
  • Other software programs which can be used for statistical analysis are GeneSpring™ and SAS software packages (SAS Institute Inc., Cary, NC).
  • Using S-PLUS® software, class predictions can be made from the genes in the database, as detailed in Example 1, using one or more training and test sets. In one embodiment, five training sets and five test sets are obtained, as shown in Example 1 (Table 2). Spleen necrosis classifications are entered for the samples in each training and test set. Toxicological classifications can be defined by the presence or the absence of various pathologies.
  • Toxicity observed as spleen necrosis is defined as 2 classifications (i.e., spleen necrosis vs. no spleen necrosis) observed 72 hours after treatment with an agent.
  • toxicity can manifest in other spleen pathologies such as congestion or hyperplasia.
  • more complex (three or more) classifications can be used in defining multiple pathologies.
  • Predicted classifications of the test set samples are obtained by using a k-nearest neighbor (or knn) voting procedure.
  • The class of each of the k nearest neighbors is determined, and the test sample is assigned to the class with the largest representation after adjusting for the proportion of classifications in the training set. In one embodiment, adjustments are made to account for different proportions of classes in the training set.
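The voting procedure can be illustrated with a small k-nearest-neighbor classifier. This is a minimal sketch under stated assumptions, not the GeneSpring or S-PLUS implementation: Euclidean distance is assumed as the similarity measure, and the proportion adjustment is modeled by dividing each class's vote count by that class's share of the training set. All names and data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, test_profile, k=3):
    """Classify one test profile by proportion-adjusted k-nearest-neighbor voting.

    train: list of (profile, label) pairs, profile being a list of values.
    Assumption: each class's raw vote count is divided by that class's
    proportion in the training set before the winner is chosen.
    """
    class_counts = Counter(label for _, label in train)
    total = len(train)
    neighbors = sorted(train, key=lambda item: euclidean(item[0], test_profile))[:k]
    votes = Counter(label for _, label in neighbors)
    adjusted = {
        label: votes[label] / (class_counts[label] / total) for label in votes
    }
    return max(adjusted, key=adjusted.get)


if __name__ == "__main__":
    training = [
        ([0, 0], "no necrosis"), ([0, 1], "no necrosis"),
        ([1, 0], "no necrosis"), ([1, 1], "no necrosis"),
        ([5, 5], "necrosis"), ([5, 6], "necrosis"),
    ]
    print(knn_predict(training, [5, 5.5]))
    print(knn_predict(training, [0.2, 0.2]))
```

Without the adjustment, a class that dominates the training set would tend to win votes simply by being more numerous among the neighbors; dividing by the class proportion is one simple way to offset that.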
  • Toxicity can also be observed at various early time points after exposure to an agent up to and including 72 hours after treatment.
  • a skilled toxicologist can determine the optimal time after exposure to an agent to observe pathology by either what has been disclosed in the art or a stepwise experimentation with time increments, for example 2, 4, 6, 12, 18, 24, 36, 48 hours post-exposure or even longer time increments, for example, days, weeks, or months after exposure to the agent.
  • Figure 1 illustrates the overall process used to identify spleen necrosis predictive genes. In one embodiment, this process runs independently for each time point.
  • the number of input genes that are to be used in the Predictive Model can be varied, for example 50, 40, 30, 20, 10, 5, 2, or 1 gene(s) can be used.
  • One embodiment of the present invention provides that at least 50 genes are used. In another embodiment, all genes in the input list are used.
  • Optimal gene lists (i.e., the list which results in the best predictive accuracy with the lowest number of genes used) are generated.
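Choosing the optimal list, as defined above, can be read as a two-key comparison: highest predictive accuracy first, then fewest genes among ties. The sketch below assumes that reading. The accuracy figures in the example are invented for illustration and are not results from the source.

```python
def pick_optimal(results):
    """Return the (n_genes, accuracy) pair with the best accuracy,
    breaking ties in favor of the smaller number of genes.

    results: list of (n_genes, accuracy) pairs, one per gene count tried.
    """
    return max(results, key=lambda r: (r[1], -r[0]))


if __name__ == "__main__":
    # Hypothetical accuracies for runs using 50, 20, 10, and 5 genes.
    runs = [(50, 0.80), (20, 0.84), (10, 0.84), (5, 0.70)]
    print(pick_optimal(runs))
```

Other trade-offs are possible, for example accepting a slightly lower accuracy for a much shorter list; the source does not specify how such cases were handled.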
  • One embodiment of the present invention provides that optimum gene lists for all input gene lists are combined for each training and test set and then these combined lists for all five training and test sets are merged to create an aggregate list of predictive genes.
  • the aggregate list can then be subdivided to smaller lists of genes based on the number of times that the genes occurred on the predictive gene lists for an individual training or test set.
  • the resulting gene lists are designated herein as Combo 5, 4, 3, 2, or 1 lists.
  • the genes that were predictive in all 5 training and test sets are designated as Combo 5 and the genes that were predictive in 4 of 5 training and test sets are designated as Combo 4 and so forth.
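The Combo designation is a count of how many of the training and test sets a gene was predictive in. A minimal sketch of that bookkeeping follows; the function name `combo_lists` and the single-letter gene names are placeholders.

```python
from collections import Counter


def combo_lists(per_set_gene_lists):
    """Group genes by the number of training/test sets they were predictive in.

    per_set_gene_lists: one collection of predictive genes per set.
    Returns a dict such as {5: [...], 4: [...], ...} with sorted gene names.
    """
    # set(genes) ensures a gene is counted at most once per set.
    counts = Counter(g for genes in per_set_gene_lists for g in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}


if __name__ == "__main__":
    per_set = [
        {"a", "b", "c"},
        {"a", "b"},
        {"a", "b"},
        {"a", "b"},
        {"a"},
    ]
    # "a" is predictive in all five sets (Combo 5), "b" in four, "c" in one.
    print(combo_lists(per_set))
```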
  • Table 17 lists gene names, accession numbers and sequence information for the spleen necrosis predictive genes found by analysis of the database in the manner described above. The present invention demonstrates that each of these genes contributes to predictive performance for at least one input gene list and training/test set and one time point.
  • Table 16 lists homologous genes for the RCT sequences that were identified by BLAST search using the GenBank NR database as the target database. Homologies are given from BLAST searches using the Phase 1 RCT sequence as the query sequence and GenBank NR database as the target sequence database. The best BLAST homology sequence observed is given. In general, no significant homology indicates that no BLAST match was observed with a BIT score >100.
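The bit-score criterion can be expressed as a simple filter over BLAST hits: report the best hit whose bit score exceeds 100, or report no significant homology. The sketch assumes hits have already been parsed into (description, bit score) pairs; the parsing step, the function name, and the example hits are hypothetical.

```python
def best_homology(hits, min_bit_score=100.0):
    """Return the highest-scoring hit with bit score above the threshold.

    hits: list of (subject_description, bit_score) pairs.
    Returns None when no hit qualifies, i.e. no significant homology.
    """
    significant = [h for h in hits if h[1] > min_bit_score]
    if not significant:
        return None
    return max(significant, key=lambda h: h[1])


if __name__ == "__main__":
    example_hits = [("hypothetical match X", 250.0), ("hypothetical match Y", 95.0)]
    print(best_homology(example_hits))
    print(best_homology([("hypothetical match Y", 95.0)]))
```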
  • Predictive performance may also be assessed using data from different time points after exposure to the agent.
  • 24 hour expression data are used.
  • 6 hour expression data are used, as described in Examples 3 and 4.
  • 72 hour expression data are used, as described in Examples 5 and 6.
  • As shown in Table 7, the predictive accuracy using 24 hour expression data and the largest predictive gene list is 81%.
  • Predictive performance was assessed using subsets of genes from the different Combo lists. As indicated in Example 2, most randomly selected subsets of the Combo gene lists yielded predictive accuracies of 70% or greater and even individual genes had geometric means that were often greater than 60%. In one embodiment, using 10 genes from Combos 5, 4, and 3 yields an accuracy in the range of 81-84%. Using different Combo lists may require a greater number of genes to reach the same accuracy level.
  • spleen necrosis predictive genes disclosed herein and spleen necrosis predictive genes identified by using methods disclosed herein are useful for predicting spleen necrosis in response to exposure to one or more agents.
  • larger numbers of predictive genes provide redundancy which may improve accuracy and precision.
  • Applications using larger numbers of predictive genes may include, for example, tests of drug candidates at later stages of commercial development.
  • larger numbers of predictive genes may be desirable at later stages of preclinical development of a therapeutic candidate, where in vivo samples can be obtained and more comprehensive methods such as microarray measurement of gene expression are appropriate.
  • the larger gene sets can also include different subsets of genes which may offer more insight into potential mechanisms of toxicity, providing the potential to predict long term toxic consequences such as chronic, irreversible toxicity or carcinogenicity.
  • genes within the spleen necrosis predictive gene sets provided herein are also suitable for prediction of toxicity in other organs or may be preferable for predicting toxicity for wider ranges of timepoints or treatment routes or regimens. As an example of the latter, some of the predictive genes are observed at three different timepoints after treatment. These genes are useful for prediction in cases where the samples come from treatment protocols that have different measurement timepoints or routes of administration than those employed for the database used in the discovery of the predictive genes disclosed herein or where the toxicokinetics for a particular agent are known or suspected to be different from those in the database.
  • the agent is an agent for which no expression profile has been assessed or stored in the database or library.
  • Animals, e.g., rats, are dosed with such an agent and the gene expression profile(s) are the test set for the Predictive Model.
  • the training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database. The prediction can be made with accuracy without the use of histopathology scores as part of the input into the Predictive Model.
  • the agent is an agent present in the database but is used at a different dose level or with a different treatment protocol than used in the database.
  • the training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database. Again, the prediction can be made with accuracy without the use of histopathology scores as part of the input into the Predictive Model.
  • the exposure time of the agent is other than 6, 24, or 72 hours, or repeat dosing protocols are used.
  • the skilled artisan can use the predictive toxicity genes from surrounding time points to extrapolate the predicted toxicity without undue experimentation. For example, if the individual has been exposed to the agent for 12 hours, then predictive genes from 6 and 24 hours timepoints are used as guidelines for extrapolating toxicity predictions.
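Selecting the surrounding time points for a given exposure time is a simple lookup against the time points in the database (6, 24, and 72 hours). The sketch below only identifies which time points bracket the exposure; how the predictions from those time points are then combined is left to the practitioner, and the function name is hypothetical.

```python
from bisect import bisect_left

DATABASE_TIMEPOINTS = (6, 24, 72)  # hours, as stated in the text


def surrounding_timepoints(exposure_hours, timepoints=DATABASE_TIMEPOINTS):
    """Return the database time points that bracket an exposure time.

    An exact match returns that single time point; an exposure outside
    the covered range returns the nearest end point only.
    """
    pts = sorted(timepoints)
    if exposure_hours in pts:
        return (exposure_hours,)
    i = bisect_left(pts, exposure_hours)
    if i == 0:
        return (pts[0],)
    if i == len(pts):
        return (pts[-1],)
    return (pts[i - 1], pts[i])


if __name__ == "__main__":
    # A 12 hour exposure falls between the 6 and 24 hour gene sets.
    print(surrounding_timepoints(12))
```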
  • the spleen necrosis predictive genes are used to detect toxic effects that are manifested as long lasting or chronic consequences such as irreversible toxicity or carcinogenesis.
  • the predictive genes and model can be applied to databases where classifications of training and test set samples are made with respect to actual or putative endpoints such as irreversible toxicity or carcinogenicity.
  • the predictive genes are used in a variety of alternative models to predict spleen necrosis. Some of these models do not require the direct use of data in a database but use functions or coefficients derived from the database.
  • the predictive genes and models are used to evaluate in vitro systems for their ability to reflect in vivo toxic events and to use such in vitro systems for predicting in vivo toxicity. Expression profiles for predictive genes are created from candidate in vitro assays using treatments with agents of known in vivo toxicity and for which in vivo data on gene expression are available. The expression data and predictive models of this invention are used to determine whether the in vitro assay system has predictive gene expression responses that accurately reflect the in vivo situation.
  • the present invention provides for a superior and novel tool for evaluating and optimizing in vitro systems for their ability to reflect and accurately predict in vivo responses.
  • Another embodiment of the present invention provides that the predictive genes and models are used with an in vitro system to accurately predict in vivo toxicity.
  • In vitro systems that have been evaluated and optimized as described above are treated with test agents and expression profiles are measured for predictive genes.
  • the expression profiles are used in conjunction with a predictive model to predict in vivo toxicity.
  • the application of this embodiment to in vitro human systems can provide a unique capability to accurately predict human toxic responses without human in vivo exposure or treatment.
  • Another embodiment of the present invention provides that measurement of the expression levels of the proteins encoded by the predictive genes can be used in conjunction with predictive models to predict toxicity.
  • spleen necrosis predictive genes are various genes known to encode cell surface, secreted and/or shed proteins. This enables the development of methods for predicting toxicity using protein biomarkers.
  • Table 18 there are 28 genes in the master predictive set which are known to encode secreted proteins.
  • the protein products are easier to access since they are secreted into body fluids and are thus more amenable to be quantified.
  • another embodiment of the present invention provides a spleen necrosis predictive assay which detects the expression of one or more of the predictive proteins. Such assays have several advantages, such as:
  • the identified predictive genes are potential therapeutic targets when the genes are involved in toxic damage or repair responses whose expression or functional modification may attenuate, ameliorate or eliminate disease conditions or adverse symptoms of disease conditions.
  • the predictive genes are organized into clusters of genes that exhibit similar patterns of expression by a variety of statistical procedures commonly used to identify such coordinate expression patterns. Common functional properties of these clustered genes are used to provide insight into the functional relationship of the response of these genes to toxic effects. Common genetic properties of these genes (e.g., common regulatory sequences) provide insight into functional aspects by revealing known or novel similarities in the coding region of the genes. The presence of common known or novel signal transduction systems that regulate expression of the genes can also provide functional insight. The presence of common known or novel regulatory sequences in the identified predictive genes is also used to identify additional spleen necrosis predictive genes.
  • the predictive genes are used to predict toxicity responses in other lymphoid tissue types (i.e. bone marrow, thymus, and lymph nodes).
  • the spleen necrosis predictive genes are used to predict toxicity responses in other species, for example, human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog. Some members of the spleen necrosis predictive genes may be more suitable for prediction of toxicity in species other than the species used to derive the database (rat in the case of the examples provided).
  • One method for identifying such genes involves examining DNA sequence databases to identify and characterize orthologous sequences to the predictive genes in the target species.
  • One of skill in the art can examine the orthologous sequences for similarity in amino acid coding regions and motifs as well as for similarities in regulatory regions and motifs of the gene.
  • spleen necrosis predictive genes or gene sequences are used for screening other potential toxicity predictive genes or gene sequences in other species or even within the same species using methods known in the art. See, for example, Sambrook supra. Gene sequences which hybridize under stringent conditions to the spleen necrosis predictive gene sequences disclosed herein are selected as potential toxicity predictive genes. Additionally, genes which demonstrate significant homology with the spleen necrosis predictive genes disclosed herein (preferably at least about 70%) are selected as toxicity predictive gene candidates. It is understood that conservative substitutions of amino acids are possible for gene sequences which have some percentage homology with the spleen necrosis predictive gene sequences of this invention.
  • a conservative substitution in a protein is a substitution of one amino acid with an amino acid with similar size and charge.
  • Groups of amino acids known normally to be equivalent are: (a) Ala, Ser, Thr, Pro, and Gly; (b) Asn, Asp, Glu, and Gln; (c) His, Arg, and Lys; (d) Met, Leu, Ile, and Val; and (e) Phe, Tyr, and Trp.
  • the predictive spleen necrosis genes can be used as guides to predicting toxicity for agents that have been administered via different routes (intraperitoneal, intravenous, oral, dermal, inhalation, mucosal, etc.) from the routes that were used to generate the database or to identify the spleen necrosis predictive genes.
  • the present invention is not intended to be limiting to agents that have been administered at different dosages than the agents that were used to generate the database or to identify the predictive spleen necrosis genes.
  • organs/tissues were frozen within 3 minutes of the death of the animal to ensure that mRNA did not degrade.
  • the organs/tissues were then packaged into well-labeled plastic freezer quality bags and stored at -80°C until needed for isolation of the mRNA from a portion of the organ/tissue sample.
  • tissue sample was placed on a double layer of aluminum foil which was then placed within a weigh boat containing a small amount of liquid nitrogen. The aluminum foil was folded around the tissue and then struck by a small foil-wrapped hammer to administer mechanical stress forces.
  • spleen tissue was weighed out and placed in a sterile container. To preserve the integrity of the RNA, tissues were kept on dry ice while other samples were being weighed. RLT buffer (Qiagen®) was added to the sample to aid in the homogenization process. The tissue was homogenized using a commercially available homogenizer (IKA Ultra Turrax T25 homogenizer) with the 7 mm microfine sawtooth shaft and generator (195 mm long with a processing range of 0.25 ml to 20 ml, item # 372718). After homogenization, samples were stored on ice until all samples were homogenized. The homogenized tissue sample was spun to remove nuclei, thus reducing DNA contamination.
  • Rat 700 CT chip. Gene expression data were generated from a microarray chip that has a set of toxicologically relevant rat genes which are used to predict toxicological responses.
  • the rat 700 CT gene array is disclosed in pending U.S. applications 60/264,933; 60/308,161; and pending application filed on January 29, 2002 (serial number 10/060,893).
  • Microarray RT reaction. Fluorescence-labeled first strand cDNA probe was made from the total RNA or mRNA isolated from spleens of control and treated rats. This probe was hybridized to microarray slides spotted with DNA specific for toxicologically relevant genes. The materials needed are: total or messenger RNA, primer, Superscript II buffer, dithiothreitol (DTT), nucleotide mix, Cy3 or Cy5 dye, Superscript II (RT), ammonium acetate, 70% EtOH, PCR machine, and ice. The volume of each sample that would contain 20 µg of total RNA (or 2 µg of mRNA) was calculated.
  • the amount of DEPC water needed to bring the total volume of each RNA sample to 14 µl was also calculated. If RNA was too dilute, the samples were concentrated to a volume of less than 14 µl in a speedvac without heat. The speedvac must be capable of generating a vacuum of 0 milliTorr so that samples can freeze dry under these conditions. Sufficient volume of DEPC water was added to bring the total volume of each RNA sample to 14 µl. Each PCR tube was labeled with the name of the sample or control reaction. The appropriate volume of DEPC water and 8 µl of anchored oligo dT mix (stored at -20°C) was added to each tube. Then the appropriate volume of each RNA sample was added to the labeled PCR tube.
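  • The volume calculation described above can be sketched as follows. This is a minimal illustration, not part of the original protocol: the function name `rt_sample_volumes` and its parameters are hypothetical, and the RNA concentration is assumed to have been measured beforehand in µg/µl.

```python
def rt_sample_volumes(rna_conc_ug_per_ul, target_ug=20.0, total_ul=14.0):
    """Return (rna_ul, water_ul) for one RT reaction.

    rna_conc_ug_per_ul: measured RNA concentration (hypothetical input).
    target_ug: 20 ug for total RNA or 2 ug for mRNA, as stated in the text.
    total_ul: combined RNA + DEPC water volume (14 ul in the text).
    """
    if rna_conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    rna_ul = target_ug / rna_conc_ug_per_ul
    if rna_ul > total_ul:
        # Too dilute: the text concentrates such samples in a speedvac first.
        raise ValueError("sample too dilute; concentrate it before use")
    return rna_ul, total_ul - rna_ul


# Example: total RNA at 2 ug/ul
rna_ul, water_ul = rt_sample_volumes(2.0)
```

  • A sample that would need more than 14 µl raises an error here, mirroring the speedvac concentration step in the text.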
  • the samples were mixed by pipetting.
  • the tubes were kept on ice until samples were ready for the next step. It is preferable for the tubes to be kept on ice until the next step is ready to proceed.
  • the samples were incubated in a PCR machine for 10 minutes at 70°C followed by 4°C incubation period until the sample tubes were ready to be retrieved.
  • the sample tubes were left at 4°C for at least 2 minutes.
  • the Cy dyes are light sensitive, so any solutions or samples containing Cy-dyes should be kept out of light as much as possible (e.g., cover with foil) after this point in the process. Sufficient amounts of Cy3 and Cy5 reverse transcription mix were prepared for one to two more reactions than would actually be run by scaling up the following:
  • the RT reaction contained impurities that must be removed. These impurities included excess primers, nucleotides, and dyes. The primary method of removing the impurities was by following the instructions in the QIAquick PCR purification kit (Qiagen cat#120016).
  • the RT reactions were cleaned of impurities by ethanol precipitation and resin bead binding.
  • the samples from DNA engine were transferred to Eppendorf tubes containing 600 µl of ethanol precipitation mixture and placed in a -80°C freezer for at least 20-30 minutes. These samples were centrifuged for 15 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C) and the supernatant was carefully decanted. A visible pellet was seen (pink/red for Cy3, blue for Cy5). Ice cold 70% EtOH (about 1 ml per tube) was used to wash the tubes and the tubes were subsequently inverted to clean tube and pellet.
  • the tubes were centrifuged for 10 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C), then the supernatant was carefully decanted. The tubes were air dried for about 5 to 10 minutes, protected from light. When the pellets were dried, they were resuspended in 80 µl nanopure water. The cDNA/mRNA hybrid was denatured by heating for 5 minutes at 95°C in a heat block and flash spun. Then the lid of a "Millipore MAHV N45" 96 well plate was labeled with the appropriate sample numbers. A blue gasket and waste plate (v-bottom 96 well) was attached.
  • the filter plate was placed on a clean collection plate (v-bottom 96 well) and 80 µl of Nanopure water, pH 8.0-8.5 was added. The pH was adjusted with NaOH. The filter plate was secured to the collection plate and after 5 minutes was centrifuged for 7 minutes at 2500 rpm.
  • Cy-Dye Labeled cDNA. To purify fluorescence-labeled first strand cDNA probes, the following materials were used: Millipore MAHV N45 96 well plate, v-bottom 96 well plate (Costar), Wizard DNA binding Resin, wide orifice pipette tips for 200 to 300 µl volumes, isopropanol, nanopure water. It is preferable to keep the plates aligned during centrifugation. Misaligned plates lead to sample cross contamination and/or sample loss. It is also important that plate carriers are seated properly in the centrifuge rotor.
  • Probes were added to the appropriate wells (80 µl cDNA samples) containing the Binding Resin.
  • the reaction was mixed by pipetting up and down ~10 times. It is preferable to use regular, unfiltered pipette tips for this step.
  • the plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent) and then the filtrate was decanted. About 200 µl of 80% isopropanol was added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. Then the 80% isopropanol wash and spin step was repeated.
  • the filter plate was placed on a clean collection plate (v-bottom 96 well) and 80 µl of Nanopure water, pH 8.0-8.5 was added.
  • the pH was adjusted with NaOH.
  • the filter plate was secured to the collection plate with tape to ensure that the plate did not slide during the final spin.
  • the plate sat for 5 minutes and was centrifuged for 7 minutes at 2500 rpm. Replicates of samples should be pooled.
  • Microarray Hybridization. To hybridize labeled cDNA probes to single stranded, covalently bound DNA target genes on glass slide microarrays, the following materials were used: formamide, SSC, SDS, 2 µm syringe filter, salmon sperm DNA (Sigma, cat # D-7656), human Cot-1 DNA (Life Technologies, cat # 15279-011), poly A (40 mer: Life Technologies, custom synthesized), yeast tRNA (Life Technologies, cat # 15401-04), hybridization chambers, incubator, coverslips, parafilm, heat blocks. In one embodiment, the array is covered to ensure proper hybridization.
  • Hybridization buffer: 50% formamide (50 µl formamide); 5X SSC (25 µl 20X SSC); and 0.1% SDS (25 µl 0.4% SDS). The solution was filtered through a 0.2 µm syringe filter, then the volume was measured. About 1 µl of salmon sperm DNA (10 mg/ml) was added per 100 µl of buffer.
  • the hybridization buffer was made up as: 50% formamide (50 µl formamide); 10X SSC (50 µl 20X SSC); 0.2% SDS (1 µl 20% SDS). The solution was filtered through a 0.2 µm syringe filter, then the volume was measured. One microliter of salmon sperm DNA (9.7 mg/ml), 0.5 µl human Cot-1 DNA (5 µg/µl), 0.5 µl poly A (5 µg/µl), and 0.25 µl yeast tRNA (10 µg/µl) were added per 100 µl of buffer. The hybridization buffers were compared in validation studies and there was no change in differential gene expression data between the two buffers.
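  • As a check on the two recipes above, the final concentration of each component follows from its stock concentration and the volumes mixed. The sketch below is illustrative only; the helper `final_concentrations` is hypothetical, and the formamide stock is assumed to be 100%.

```python
def final_concentrations(components):
    """components: list of (name, volume_ul, stock_conc) tuples.

    Returns {name: final concentration}, expressed in the same unit as
    the stock (percent for formamide and SDS, X for SSC).
    """
    total_ul = sum(volume for _, volume, _ in components)
    return {name: stock * volume / total_ul
            for name, volume, stock in components}


# Volumes and stock strengths taken from the two recipes in the text;
# formamide stock = 100% is an assumption.
buffer_1 = [("formamide_pct", 50, 100.0), ("SSC_x", 25, 20.0), ("SDS_pct", 25, 0.4)]
buffer_2 = [("formamide_pct", 50, 100.0), ("SSC_x", 50, 20.0), ("SDS_pct", 1, 20.0)]

conc_1 = final_concentrations(buffer_1)
conc_2 = final_concentrations(buffer_2)
```

  • Under these assumptions the first recipe totals 100 µl and reproduces the stated 50% / 5X / 0.1% values, while the second totals 101 µl, so its stated 50% / 10X / 0.2% values are close approximations.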
  • Non-specifically bound cDNA probe should be removed from the array. Removal of nonspecifically bound cDNA probe was accomplished by washing the array and using the following materials: slide holder, glass washing dish, SSC, SDS, and nanopure water. Six glass buffer chambers and glass slide holders were set up with 2X SSC buffer heated to 30-34°C and used to fill up glass dish to 3/4th of volume or enough to submerge the microarrays. The slides were placed in 2X SSC buffer for 2 to 4 minutes while the cover slips fell off.
  • the slides were then moved to 2X SSC, 0.1% SDS and soaked for 5 minutes.
  • the slides were transferred into 0.1X SSC and 0.1% SDS for 5 minutes.
  • the slides were transferred to 0.1X SSC for 5 minutes.
  • the slides, still in the slide carrier, were transferred into nanopure water (18 megaohms) for 1 second.
  • the stainless steel slide carriers were placed on micro-carrier plates and spun in a centrifuge (Beckman GS-6 or equivalent) for 5 minutes at 1000 rpm.
  • GeneSpring™ software (Version 4.1, Silicon Genetics) was used for array normalization and transformation, and for statistical analyses that identified genes whose expression correlated with histopathology scores.
  • Microarray data were loaded into GeneSpring™ software for correlation analysis as GenePix files as above.
  • Specific data loaded into GeneSpring™ software included gene name, GenBank ID, control channel mean fluorescence and signal channel mean fluorescence.
  • Expression ratio data (ratio of signal to control fluorescence) were normalized using the 50th percentile of the distribution of genes and control channel. Ratio data were excluded from analysis if the control channel value was ≤ 0. For analysis of correlations, gene expression ratios were transformed as the log of the ratio.
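  • The normalization and transformation described above might be sketched as follows. This is not the GeneSpring implementation: the function name is hypothetical, the 50th percentile is taken as the per-array median of ratios, a base-10 logarithm is assumed (the text does not state the base), and control values of zero or below are excluded.

```python
import math
import statistics


def normalize_log_ratios(signal, control):
    """signal, control: dicts mapping gene -> mean fluorescence for one array.

    Returns gene -> log10(ratio / median ratio) for usable genes.
    """
    # Exclude genes whose control channel value is not positive.
    ratios = {g: signal[g] / control[g]
              for g in signal if control.get(g, 0) > 0}
    if not ratios:
        return {}
    median_ratio = statistics.median(ratios.values())
    # Ratios of zero cannot be log transformed and are dropped.
    return {g: math.log10(r / median_ratio)
            for g, r in ratios.items() if r > 0}


example = normalize_log_ratios(
    {"a": 200.0, "b": 100.0, "c": 50.0, "d": 10.0},
    {"a": 100.0, "b": 100.0, "c": 100.0, "d": 0.0},
)
```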
  • GeneSpring™ software was used for spleen necrosis class prediction.
  • the Predict Parameter Values tool relies on standard statistical procedures that can be implemented in a variety of statistical software packages.
  • the Predictive Model is implemented in S-PLUS® software.
  • other statistical software programs may be used. The following is a summary of the procedure as described in GeneSpring's Advanced Analysis Techniques Manual (Release Date March 13, 2001, Silicon Genetics) with additional information supplied by Silicon Genetics and a statistical expert.
  • the first step is variable selection of genes to be used for prediction. This entails taking a single gene and a single class (e.g., spleen necrosis) and creating a contingency table.
  • columns 1 through N of the table each represent one possible cutoff point based on the gene expression level (ratio of signal/control) for that class.
  • the number of possible cutoffs is less than or equal to the total number of samples for the class (e.g., A). It is possibly less than the total number, since there may be ties in gene expression level.
  • N, M and may or may not be distinct.
  • n-class problem is illustrated, where x and y entries are the class counts at that gene expression cutoff level, for that specific gene and class, either above (“a") or below (“b") the cutoff.
  • Class1 is the set of all samples (above or below) the cutoff for Class1
  • !Class1 are all those not in Class1 (above or below) the cutoff, and similarly for the other classes.
  • the class totals in the training set are the total class marginals used to compute Fisher's exact test.
  • the genes per class are rank ordered by the most discriminating (highest) score.
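  • The per-gene scoring step can be illustrated with a small sketch. It is an interpretation rather than the GeneSpring code: the function names are hypothetical, "above" is taken to mean at or above the cutoff, and the score is assumed to be the negative log10 of the smallest two-sided Fisher's exact p-value over all cutoffs, so that a higher score means a more discriminating gene.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def best_cutoff_score(values, labels, target):
    """Score one gene for one class over every candidate cutoff."""
    cutoffs = sorted({v for v, l in zip(values, labels) if l == target})
    best = 0.0
    for cutoff in cutoffs:
        a = sum(1 for v, l in zip(values, labels) if l == target and v >= cutoff)
        b = sum(1 for v, l in zip(values, labels) if l == target and v < cutoff)
        c = sum(1 for v, l in zip(values, labels) if l != target and v >= cutoff)
        d = sum(1 for v, l in zip(values, labels) if l != target and v < cutoff)
        best = max(best, -log10(fisher_exact_two_sided(a, b, c, d)))
    return best
```

  • Genes would then be sorted per class by this score, highest first, before the rotation step.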
  • the predictivity list is composed of the most discriminating genes per class. Namely, genes are combined that best discriminate class 1 with those that best discriminate class 2 and so on. The genes are selected in rotation of the highest score per class. Duplicate genes are ignored in the rotation and not added to the list, the gene with the next highest score is taken.
  • each sample is a vector of 60 normalized expression ratios. Since the selection of genes is done in rotation, for 2 classes, the list contains 30 genes for class one, and 30 genes for class two. For 3 classes the list contains 20 genes for class one, 20 for class two, and 20 for class three, etc.
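  • The rotation described above, including the handling of duplicate genes, can be sketched as follows. The function name and the input layout (one ranked gene list per class, best score first) are assumptions made for illustration.

```python
def select_in_rotation(ranked_by_class, list_size=60):
    """Build the predictivity list by taking the best remaining gene
    from each class in turn.

    ranked_by_class: dict mapping class -> list of genes, best first.
    """
    selected, seen = [], set()
    positions = {cls: 0 for cls in ranked_by_class}
    while len(selected) < list_size:
        added = False
        for cls, ranked in ranked_by_class.items():
            i = positions[cls]
            # A gene already on the list is skipped and the gene with
            # the next highest score for this class is taken instead.
            while i < len(ranked) and ranked[i] in seen:
                i += 1
            if i < len(ranked):
                selected.append(ranked[i])
                seen.add(ranked[i])
                i += 1
                added = True
            positions[cls] = i
            if len(selected) == list_size:
                break
        if not added:
            break  # every class list is exhausted
    return selected


ranked = {"necrosis": ["g1", "g2", "g3"], "no_necrosis": ["g1", "g4", "g5"]}
picked = select_in_rotation(ranked, list_size=4)
```

  • With two classes and a list size of 60, each class contributes 30 genes, matching the counts given in the text.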
  • the matrix below illustrates the basic features of this gene selection process.
  • the test set is classified based on the k-nearest neighbor (knn) voting procedure. Using just those genes in the gene list, for each sample in the test set of samples, the k-nearest neighbors in the training set are found with the Euclidean distance. The class of each of the k-nearest neighbors is determined, and the test set sample is assigned to the class with the largest representation in the k nearest neighbors after adjusting for the proportion of classes in the training set.
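  • The neighbor search itself can be sketched in a few lines. This simplified illustration only counts the classes of the k nearest training samples by Euclidean distance; it does not perform the adjustment for class proportions in the training set, and the function name and data layout are hypothetical.

```python
from collections import Counter
from math import dist


def knn_votes(test_vector, training, k):
    """training: list of (expression_vector, class_label) pairs.

    Returns a Counter of class labels among the k nearest neighbors.
    """
    nearest = sorted(training, key=lambda item: dist(test_vector, item[0]))[:k]
    return Counter(label for _, label in nearest)


training = [((0.0, 0.0), "no"), ((0.0, 1.0), "no"),
            ((5.0, 5.0), "yes"), ((6.0, 5.0), "yes"), ((5.0, 6.0), "yes")]
votes = knn_votes((5.0, 5.5), training, k=3)
```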
  • the decision threshold is a mechanism to help clearly define the class into which the sample will fall, and can be set to reject classification if the voting is very close or tied. (Thus, k can be even for two-class problems without worrying about the tie problem.)
  • a p-value is calculated for the proportion of neighbors in each class against the proportions found in the training set, again using Fisher's exact test, but now a one-sided test.
  • a p-value ratio cutoff is set as a way of setting the level of confidence in individual sample predictions based on the ratio of p-values for the best class (lowest p-value) versus the second best class (second lowest p-value). For example, if the p-value ratio cutoff is set at 0.5 and the ratio of p-values for a particular sample is 0.6, then the predictive model will not make a call for that sample.
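  • One way to read the p-value ratio rule is sketched below. It is an interpretation under stated assumptions, not the GeneSpring or S-PLUS code: the one-sided Fisher's exact p-value is computed as the hypergeometric probability of seeing at least the observed number of neighbors from a class when k samples are drawn from the training set, and the names `one_sided_p` and `call_class` are hypothetical.

```python
from math import comb


def one_sided_p(k_in_class, k, n_in_class, n):
    """P(at least k_in_class of k neighbors belong to the class) when
    k samples are drawn from n training samples, n_in_class of which
    belong to the class."""
    upper = min(k, n_in_class)
    favorable = sum(comb(n_in_class, x) * comb(n - n_in_class, k - x)
                    for x in range(k_in_class, upper + 1))
    return favorable / comb(n, k)


def call_class(votes, class_totals, ratio_cutoff=0.5):
    """votes: class -> neighbor count; class_totals: class -> training count.

    Returns the predicted class, or None for a non-call.
    """
    k, n = sum(votes.values()), sum(class_totals.values())
    pvals = sorted((one_sided_p(votes.get(c, 0), k, class_totals[c], n), c)
                   for c in class_totals)
    (p_best, best), (p_second, _) = pvals[0], pvals[1]
    # No call is made when the ratio exceeds the cutoff.
    return best if p_best / p_second <= ratio_cutoff else None


totals = {"yes": 5, "no": 15}
clear_call = call_class({"yes": 3}, totals)
close_vote = call_class({"yes": 1, "no": 2}, totals)
```

  • In this example three of three neighbors from the rarer class give a small ratio and a call, whereas a 1-to-2 split gives a ratio above 0.5 and no call.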
  • Spleen necrosis classifications were entered for the exported data set as a parameter column. Toxicity, as defined by observation of spleen necrosis at 72 hours after treatment, was entered as spleen necrosis (yes) or no spleen necrosis (no) for each animal in a compound-dose group. Additionally, random histopathology classification was designated for the training and test sets. This was done by randomly assigning the same number of spleen necrosis calls to the individual animals.
  • An S-PLUS® batch program was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets.
  • the program performs functions similar to those used in the Predictive Model as embodied in GeneSpring's Predict Parameter Values tool, such as Fisher discriminant analysis, k-nearest neighbor voting, and P-value ratio calculation.
  • a P-value ratio cutoff of 0.5 was used. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff.
  • the number of genes used to predict was varied starting with one gene and increasing incrementally until all genes in the input gene list were used. For each number of genes the geometric mean for the combination was displayed in the batch program's output file. For each input list and optimal number of predictive genes (lowest number of genes giving a maximum overall percent of correct calls) additional information was recorded that included the list of specific genes in the optimum predictive set.
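  • The rule for the optimal number of predictive genes (the lowest number of genes giving the maximum overall percent of correct calls) can be expressed compactly. The function name and the input mapping are hypothetical, and the figures in the example are invented purely to show the tie-breaking behavior.

```python
def optimal_gene_count(percent_correct_by_count):
    """percent_correct_by_count: dict mapping number of genes used ->
    overall percent of correct calls for that number of genes."""
    best = max(percent_correct_by_count.values())
    return min(n for n, pct in percent_correct_by_count.items() if pct == best)


# Illustrative values only, not data from the examples.
chosen = optimal_gene_count({1: 60.0, 2: 75.0, 3: 75.0, 4: 70.0})
```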
  • the correlating gene lists as well as the entire array gene list were provided as input lists to the S-PLUS® batch program. These lists as well as the entire array gene list were used for each of the five training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets.
  • Input genes for the S-PLUS® batch program included all 700 genes in the GenePix file (the rat CT Array) which were disclosed in a currently pending application (serial number 10/060,893) filed on January 29, 2002, as well as smaller lists of genes whose expressions correlated with histopathology by the correlation measures described previously. To obtain an optimum number of predictive genes, the number of genes used to predict was varied incrementally, starting with 1 gene.
  • each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set.
  • the aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 5 training and test sets were used, genes that were predictive in all 5 training and test sets were designated as Combo (combination) 5. Genes that were predictive in only 4 of 5 training and test sets were designated as Combo 4, etc.
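  • The Combo grouping can be sketched as a simple count of how many training/test sets each gene was predictive in. The function name and the gene identifiers in the example are hypothetical.

```python
from collections import Counter, defaultdict


def combo_lists(optimal_sets):
    """optimal_sets: one collection of predictive genes per training/test set.

    Returns {n: [genes predictive in exactly n sets]}, i.e. Combo n.
    """
    counts = Counter(gene for genes in optimal_sets for gene in set(genes))
    combos = defaultdict(list)
    for gene, n in sorted(counts.items()):
        combos[n].append(gene)
    return dict(combos)


combos = combo_lists([["a", "b"], ["a", "c"], ["a", "b"]])
```

  • With five training and test sets, the key 5 would hold the Combo 5 genes, the key 4 the Combo 4 genes, and so on.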
  • a list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 3.
  • Array Data, Normalization and Transformation. Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 20 presents 24 hour gene expression data for the predictive genes. These data can be used with a Predictive Model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
  • Training and Test Data Sets. The training and test data sets used are those described in Table 2 of Example 1.
  • Spleen necrosis classifications used are described in Table 1 of Example 1. In this analysis randomized classifications (same number of positive and negative spleen necrosis classifications distributed randomly among the samples) were also used.
  • Prediction Measures. Measures of prediction used for these analyses are generally accepted prediction measures for information about actual and predicted classifications done by a classification system (Modern Applied Statistics with S-Plus, W. N. Venables and B. D. Ripley, Springer, 1994, 3rd edition; Proc. 14th International Conference on Machine Learning, Miroslav Kubat, Stan Matwin, 1997). Results from predictions of a 2-class case can be described as a 2-class matrix:
  • Class I is defined as "Spleen Necrosis."
  • Class II is defined as "No Spleen Necrosis."
  • Randomly Selected Gene Sets. Subsets of randomly selected genes were prepared from the predictive gene sets to test whether such subsets would have predictive value. Assignments of genes to these subsets are presented in Tables 4-5. Genes were also randomly selected from the list of all genes excluding the 154 twenty-four hour predictive genes (also known as non-predictive genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes. Assignments of genes to these subsets are presented in Table 6. Genes were randomly selected from the entire array list of genes excluding the Combo All 154 predictive genes by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
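  • The random selection procedure (assign a random number to each gene, sort by that number, take the required number of genes) can be sketched as follows. The function name, the `seed` parameter and the example gene names are hypothetical additions for reproducibility of the illustration.

```python
import random


def random_gene_subset(genes, excluded, size, seed=None):
    """Pick `size` genes at random from `genes`, skipping `excluded`."""
    rng = random.Random(seed)
    excluded = set(excluded)
    pool = [g for g in genes if g not in excluded]
    # Assign a random number to each gene, then sort by that number.
    keyed = sorted((rng.random(), g) for g in pool)
    return [g for _, g in keyed[:size]]


all_genes = ["gene%03d" % i for i in range(20)]
predictive = all_genes[:5]
subset = random_gene_subset(all_genes, predictive, size=4, seed=1)
```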
  • Prediction results for 24 hour expression data using genes identified as predictive are presented in Table 7.
  • Prediction measures are given as means and range of values (in parentheses) for five training/test sets using 24 hour array data and gene lists as presented in Table 5.
  • Unit of prediction was the animal and the predictive classification was for spleen necrosis observed at 72 hours after treatment. ** Standard prediction measures were used as defined in Materials and Methods.
  • the Geometric Mean was used as an indication of predictive performance that includes consideration of the proportion of positive and negative cases for spleen necrosis. All gene sets gave GMM measures > 0.50 (50%), and the Combo 4, Combo 3, and Combo 2 gene sets had GMM measures of approximately 0.70 or greater (69.8% to 72.2%). The GMM measures indicate that the 24 hour gene sets can accurately predict samples with spleen necrosis.
  • the table also shows that while many of the individual genes of the Combo groups were predictive (e.g., geometric means as high as 78.5% for individual genes of Combo 4 and 75.8% for Combo 3, with a majority of genes exceeding 60% for the three Combo groups), the geometric mean of individual genes rarely exceeded the geometric mean of the whole combination.
  • Example 1 Spleen Necrosis Predictive Genes from 6 hour expression data are identified. Compounds and treatments list used to construct the spleen necrosis database are given in Table 1 of Example 1. This table also provides the evaluation of spleen toxicity as observed as necrosis in samples collected 72 hours after treatment. The database is described in detail in Example 1. This Example analyzes expression data from samples collected 6 hours after treatment.
  • Spleen necrosis classifications were entered for the exported data set as a parameter column. Toxicity, as defined by observation of spleen necrosis at 72 hours after treatment, was entered as spleen necrosis (yes) or no spleen necrosis (no) for each animal in a compound-dose group. Additionally, random histopathology classification was designated for the training and test sets. This was done by randomly assigning the same number of spleen necrosis calls to the individual animals.
  • the correlating gene lists as well as the entire array gene list were provided as input lists to the S-PLUS® batch program. These lists as well as the entire array gene list were used for each of the five training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets.
  • Input genes for the S-PLUS® batch program included all 700 genes in the GenePix file (the rat CT Array) which were disclosed in a currently pending application (serial number 10/060,893) filed on January 29, 2002, as well as smaller lists of genes whose expressions correlated with histopathology by the correlation measures described previously.
  • the number of genes used to predict was varied, with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes used.
  • the specified number of predictive genes was varied to obtain an optimum number of predictive genes.
  • each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set.
  • the aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 5 training and test sets were used, genes that were predictive in all 5 training and test sets were designated as Combo (combination) 5. Genes that were predictive in only 4 of 5 training and test sets were designated as Combo 4, etc.
  • Array Data, Normalization and Transformation. Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 19 lists 6 hour gene expression data for the predictive genes. These data can be used with a Predictive Model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
  • Spleen necrosis classifications used are described in Table 1 of Example 1. In this analysis randomized classifications (same number of spleen necrosis and no spleen necrosis classifications distributed randomly among the samples) were also used.
  • Spleen necrosis classifications were entered for the exported data set as a parameter column. Toxicity, as defined by observation of spleen necrosis at 72 hours after treatment, was entered as spleen necrosis (yes) or no spleen necrosis (no) for each animal in a compound-dose group. Additionally, random histopathology classification was designated for the training and test sets. This was done by randomly assigning the same number of spleen necrosis calls to the individual animals.
  • An S-PLUS® batch program was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets.
  • the program performs functions similar to those used in GeneSpring's Predict Parameter Values tool such as Fisher discriminant analysis, k-nearest neighbor, and P-value ratio calculation.
  • a P-value ratio cutoff of 0.5 was used. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff.
  • the number of genes used to predict was varied starting with one gene and increasing incrementally until all genes in the input gene list were used. For each number of genes the geometric mean for the combination was displayed in the batch program's output file. For each input list and optimal number of predictive genes (lowest number of genes giving a maximum overall percent of correct calls) additional information was recorded that included the list of specific genes in the optimum predictive set.
  • Results. Expression array data were first examined for the existence of genes whose expression correlated with histopathology scores.
  • Table 1 in Materials and Methods of Example 1 presents a list of the compounds and dose levels along with the spleen necrosis histopathology classification and histopathology severity scores used for this analysis. For each distance measure the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Lists of correlating genes were obtained using the distance measures described in Materials and Methods.
  • the correlating gene lists as well as the entire array gene list were provided as input lists to the S-PLUS® batch program (described in Materials and Methods) that employs a k-nearest neighbor (knn) predictive model. These lists as well as the entire array gene list were used for each of the five training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets.
  • Input genes for the S-PLUS® batch program included all 700 genes in the GenePix file (the Rat CT Array) as well as smaller lists of genes whose expressions correlated with histopathology by the correlation measures described previously. The number of predictive genes in the gene list was varied to obtain an optimum number of predictive genes.
  • each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set.
  • the aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 5 training and test sets were used, genes that were predictive in all 5 training and test sets were designated as Combo (combination) 5. Genes that were predictive in only 4 of 5 training and test sets were designated as Combo 4, etc.
  • Array Data, Normalization and Transformation Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 21 presents 72 hour gene expression data for the predictive genes. These data can be used with a Prediction Model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
  • Training and Test Data Sets The training and test data sets used are those described in the table of Example 5.
  • Spleen necrosis classifications used are described in Table 1 of Example 1. In this analysis randomized classifications (same number of spleen necrosis classifications distributed randomly among the samples) were also used.
  • Prediction Measures Prediction measures such as accuracy and geometric mean were calculated as described in Example 2.
  • Predictive Modeling The predictive task with the spleen necrosis gene expression data is a 2-class classification problem, where the 2 classes of possible responses are defined as spleen necrosis or no spleen necrosis. This is an uneven class problem in that the class of negative responses is roughly 75 percent of the data or more in the database tested.
  • a discrimination function can be used to classify a training set. This function can be cross-validated with a testing set, often repeatedly, to quantify the mean and variation of the classification error. There are numerous common discrimination functions, and a comparative study of the performance of these functions is useful in determining the best classifier. Additional measures can then be used to compare the performance of the classifiers. Since the classes are of significantly uneven sizes, a geometric mean measure (GMM) can be used to compare models, namely, the square root of the product of the true positives and the true negatives.
  • knn is also database dependent in that a database containing the training set is needed to perform the nearest neighbor search and classification.
  • Classifier Models A variety of common classification techniques are available. A simple hybrid classifier could be designed and tested, using the knn results, to transform the knn model into a database independent model. This model is termed a centroid model. The centroid model uses the correctly identified test data results from knn and locates a centroid of the subset of k samples that are of the same class for each correctly identified test sample. The centroid is assigned the correct class, and with new test data, a sample is assigned the class of its nearest centroid.
  • the neural network is a simple, feed-forward network, allowing skip layers, and with an entropy fitting criterion.
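The gene-number sweep described in the excerpts above can be illustrated with a minimal sketch. It is not the S-PLUS® batch program itself: the function names, the choice of Euclidean distance, the value of k, and the reading of the GMM as the square root of the true positive rate times the true negative rate are all assumptions made for illustration. Classes are coded 1 (spleen necrosis) and 0 (no spleen necrosis).

```python
import math

def knn_predict(train_x, train_y, sample, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance).
    Use an odd k with two classes so the vote cannot tie."""
    nearest = sorted((math.dist(x, sample), y) for x, y in zip(train_x, train_y))
    votes = [y for _, y in nearest[:k]]
    return max(set(votes), key=votes.count)

def geometric_mean_measure(y_true, y_pred):
    """Assumed GMM: sqrt(true positive rate * true negative rate)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        return 0.0  # undefined when a class is absent; reported as 0 here
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return math.sqrt(tpr * tnr)

def sweep_gene_counts(train_x, train_y, test_x, test_y, k=3):
    """Score knn using the first 1, 2, ..., n genes of the input gene list.
    Returns a list of (number_of_genes, fraction_correct, gmm) tuples."""
    results = []
    for n in range(1, len(train_x[0]) + 1):
        train_n = [row[:n] for row in train_x]
        preds = [knn_predict(train_n, train_y, row[:n], k) for row in test_x]
        correct = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
        results.append((n, correct, geometric_mean_measure(test_y, preds)))
    return results

def optimal_gene_count(results):
    """Lowest number of genes giving the maximum fraction of correct calls."""
    best = max(correct for _, correct, _ in results)
    return min(n for n, correct, _ in results if correct == best)
```

Each row of `train_x` and `test_x` is one sample, with columns ordered as in the input gene list, so slicing the first n columns corresponds to using the first n genes.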

Abstract

The present invention provides a system and method for identifying spleen necrosis predictive genes which can be used to predict spleen necrosis in response to one or more agents. Also provided is a composition comprising a plurality of cDNAs for use in detecting the altered expression of genes in a toxic response of the spleen, wherein said plurality of cDNAs comprises sequences disclosed in Table 17 or the complete complements thereof.

Description

SPLEEN NECROSIS PREDICTIVE GENES
CROSS REFERENCE TO OTHER APPLICATIONS
[01] This application claims benefit under 35 U.S.C. § 119(e) of provisional application no. 60/455,443, filed March 17, 2003, which is incorporated by reference herein, in its entirety, for all purposes.
REFERENCE TO SEQUENCE DATA ON CD
[02] Description of Accompanying CD-ROM (37 C.F.R. §§ 1.52 & 1.58): Tables 17, 19, 20, and 21 referred to herein are filed herewith on CD-ROM in accordance with 37 C.F.R. §§ 1.52 and 1.58. Two identical copies (marked "Copy 1" and "Copy 2") of said CD-ROM, both of which contain Tables 17, 19, 20, and 21, are submitted herewith, for a total of two CD-ROM discs submitted. Table 17 is recorded on said CD-ROM discs as "Table17.txt" created on March 11, 2003, size 213,626 bytes. Table 19 is recorded on said CD-ROM discs as "Table19.txt" created on March 11, 2003, size 654,120 bytes. Table 20 is recorded on said CD-ROM discs as "Table20.txt" created on March 11, 2003, size 603,657 bytes. Table 21 is recorded on said CD-ROM discs as "Table21.txt" created on March 11, 2003, size 653,883 bytes. The sequence listing required by 37 CFR §1.821(c) is recorded on said CD-ROM discs as "2874-021 P.genesequence.txt" created on March 15, 2004.
[03] The contents of the files contained on the CD-ROM discs submitted with this application are hereby incorporated by reference into the specification.
BACKGROUND
[04] This invention is in the field of toxicology. More specifically, it relates to spleen necrosis predictive genes and the methods of using such genes to predict spleen necrosis.
[05] Molecular biology and genomics technologies have the potential to create dramatic advances and improvements for the science of toxicology as for other biological sciences. See, for example, MacGregor, et al. Fund. Appl. Tox. 26:156-173, 1995; Rodi et al., Tox. Pathology 27:107-110, 1999; Cunningham et al., Ann. N.Y. Acad. Sci. 919:52-67, 2000; Pritchard et al., Proc. Natl. Acad. Sci. USA 98:13266-13271, 2001; and Fielden and Zacharewski, Tox. Sciences 60:6-10, 2001. The advantage of these technologies is that they can provide massive amounts of parallel information and that this information concerns processes and events occurring at the molecular level. This level of information is in dramatic contrast to conventional safety assessment toxicology that, to a large extent, currently relies on subjective evaluation (e.g., in-life observations of behavior, observations of gross abnormalities at necropsy and histopathological examination of stained tissue slides using a microscope). These current methodologies are largely subjective and in some cases, such as histopathological evaluation, they require someone with a high degree of training, experience and skill to make competent evaluations. Furthermore, many of the methodologies require access to organs and tissues that necessitates either killing laboratory animals or surgery to obtain tissue specimens.
[06] Recently, there have been some initial efforts to apply molecular biology and genomics technologies to toxicology. Some efforts have involved application of gene expression measurements. See, for example, U.S. 20030044790, serial number 876249, titled "method of diagnosing of exposure to toxic agents by measuring distinct pattern in the levels of expression of specific genes" and published 03/06/03, which discloses a method of diagnosing exposure to a toxic agent comprising the steps of detecting the amount of protein/gene expression present in a sample of mammalian tissue or mammalian body fluids that has not been exposed to a toxic agent, comparing the sample to the amount of protein/gene expression in an exposed sample, comparing the difference in expression to a library of expected protein/gene expression for predetermined toxic agents, and evaluating whether the difference indicates exposure to a particular toxic agent. See also U.S. Patent 6,228,589 and WO 01/05804. Analysis of the data has yielded interesting observations of gene expressions that appear to correlate with some toxic effects or mechanisms. See, for example, Mueller et al. Environmental Health Perspectives 106(5): 277-230 (1998). However, there has been very little published work in toxicology so far that applies rigorous analytical and statistical techniques to the massive amounts of data available from genomics technologies. The observations, so far, have tended to be phenomenological and focused on individual gene responses rather than determining the generally applicable capabilities of patterns of gene expression to predict toxic effects (see, for example, studies of gene expression altered by exposure to liver toxicants in Bartosiewicz et al., Environ. Health Perspectives 109:71-74, 2001; Huang et al., Tox. Sciences 63:196-207, 2001). Even in the larger field of biological sciences, these types of analyses are just beginning to be evidenced in the literature (e.g., Golub et al., Science 286:531-537, 1999).
[07] Recently, some work has been published that attempts to correlate gene expression profiles with the mechanism of toxicity of various hepatotoxins. See, for example, Waring et al. Tox. and Appl. Pharm. 175:28-42 (2001). However, there has been limited success thus far in attempts to predict the toxicity of compounds based on the gene expression profiles elicited upon treatment.
[08] What is needed are genes and predictive models that are capable of predicting toxic responses.
BRIEF SUMMARY OF THE INVENTION
[09] The present invention provides spleen necrosis predictive genes and predictive models which are useful to predict toxic responses to one or more agents.
[10] One embodiment of the present invention provides methods of predicting whether an agent induces spleen necrosis in an individual. One method includes the steps of: (a) obtaining a biological sample from an individual treated with the agent or treating a biological sample obtained from an individual with the agent or treating in vitro cultured cells or explants with the agent; (b) obtaining a gene expression profile on one or more of the spleen necrosis predictive genes disclosed herein from the biological sample or in vitro cultured cells or explants; and (c) using the gene expression profile from the biological sample or cells treated with the agent in a predictive model to predict whether the agent induces spleen necrosis in the individual or in vitro.
[11] Another embodiment of the present invention provides for a composition comprising a plurality of cDNAs for use in detecting the altered expression of genes in a toxic response of the spleen, wherein said plurality of cDNAs comprises SEQ ID NOs: 1-306 or the complete complements thereof.
[12] Another embodiment of the present invention provides methods of predicting whether an agent induces toxicity to other lymphoid organs such as bone marrow, thymus, or lymph nodes, in an individual. One method includes the steps of: (a) obtaining a biological sample from an individual treated with the agent or treating a biological sample obtained from an individual with the agent or treating in vitro cultured cells or explants with the agent; (b) obtaining a gene expression profile on one or more of the spleen necrosis predictive genes disclosed herein from the biological sample or in vitro cultured cells or explants; and (c) using the gene expression profile from the biological sample or cells treated with the agent in a predictive model to predict whether the agent induces toxicity to other lymphoid organs such as bone marrow, thymus, or lymph nodes, in an individual.
[13] In one embodiment, the predictive model utilizes gene expression profiles from sets of spleen necrosis predictive gene(s) selected from one of the various spleen necrosis predictive gene sets disclosed herein (i.e., Combination 5, 4, 3, 2, or 1), wherein the sets comprise one or more genes therefrom. In another embodiment, the predictive model utilizes sets of spleen necrosis predictive gene(s) selected from one of the various spleen necrosis predictive gene sets disclosed herein (i.e., Combination 5, 4, 3, 2, or 1), wherein the sets comprise one or more genes therefrom.
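The Combination (Combo) designations can be illustrated with a short sketch. It assumes, as described elsewhere in this document, that a gene's Combo number is the number of training and test sets in which the gene was part of the optimum predictive set. The function name and the gene identifiers in the usage note are hypothetical.

```python
from collections import Counter

def combo_designations(predictive_sets):
    """Group genes by the number of training/test sets in which they were
    predictive. `predictive_sets` holds one collection of gene names per
    training/test set. Returns {combo_number: sorted list of genes}."""
    counts = Counter()
    for genes in predictive_sets:
        counts.update(set(genes))  # count each gene at most once per set
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}
```

With five training and test sets, a hypothetical gene appearing in all five optimum sets would fall under key 5 (Combo 5), and one appearing in a single set under key 1 (Combo 1).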
[14] In another embodiment, the predictive genes and models are used to identify and evaluate various in vitro systems that can be used to accurately predict in vivo toxicity and to use the identified in vitro systems to accurately predict in vivo toxicity.
[15] Yet another embodiment of the present invention provides methods of identifying a spleen necrosis predictive gene including the steps of: (a) providing a set of candidate toxicity predictive genes; (b) evaluating said genes for their predictive performance with at least one training and test set of data in a Predictive Model to identify genes which are predictive of spleen necrosis; and (c) testing the performance of predictive genes for their ability to predict spleen necrosis for: (i) different test sets of data, (ii) comparison of prediction for accurate versus random classification, and (iii) prediction using test data external to the data used to derive the predictive genes. In one embodiment, the candidate toxicity predictive genes are rat toxicity genes.
[16] Another embodiment of the present invention provides a computer-based method for mining genes predictive for spleen necrosis by: (a) collecting expression levels of a plurality of candidate toxicity predictive genes in a multiplicity of samples; (b) optionally storing the expression levels as a database on an electronic medium; (c) defining a group of samples to be a training set; (d) defining another group of samples to be a test set; (e) optionally generating additional training and test sets; and (f) selecting a set of genes which are predictive of spleen necrosis based on evaluating the training set and the test set in a Predictive Model.
[17] A further embodiment of the present invention provides a computer program product for predicting spleen necrosis, which includes a set of spleen necrosis predictive genes derived from mining a database having a plurality of gene expression profiles indicative of toxicity. In one embodiment, the set of spleen necrosis predictive genes includes at least one predictive gene from the Combination 5, 4, 3, 2, or 1 list.
[18] Yet a further embodiment of the present invention provides a library of expression profiles of spleen necrosis predictive genes produced by the methods disclosed herein.
[19] Another embodiment of the present invention provides an integrated system for predicting spleen necrosis including equipment capable of measuring gene expression profiles of spleen necrosis predictive genes from biological samples exposed to a test agent, operably linked to a computer system capable of implementing a predictive model.
BRIEF DESCRIPTION OF THE DRAWINGS
[20] Figure 1 is a flow diagram illustrating a method of how spleen necrosis predictive genes are discovered.
[21] Figure 2 is a flow diagram illustrating a method of how spleen necrosis predictive genes are evaluated for performance.
[22] Figure 3 is a flow diagram illustrating a method of how spleen necrosis predictive genes are used to predict toxicity.
BRIEF DESCRIPTION OF THE TABLES
[23] Table 1 lists compounds, dose levels, spleen necrosis pathology and abbreviations in the database.
[24] Table 2 lists the distribution of compounds in individual training and test sets for 24 hour spleen necrosis data.
[25] Table 3 lists the predictive genes for 24 hour expression data.
[26] Table 4 lists the randomly selected gene subsets from the 24 hour Combo All gene set. Genes were randomly selected from the Combo All list of predictive genes (154 genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
[27] Table 5 lists the randomly selected gene subsets from the 24 hour Combos 5, 4, and 3 combined. Genes were randomly selected from the combined Combo 543 list of predictive genes (23 genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
[28] Table 6 lists the randomly selected gene subsets from all 24 hour genes excluding predictive genes (i.e., excluding Combo All genes).
[29] Table 7 lists the spleen necrosis individual sample prediction values for 24 hour data predictive genes (combined list and subsets).
[30] Table 8 lists the individual gene predictions for Combos 5, 4, and 3.
[31] Table 9 lists the comparison of predictivity for correct spleen necrosis classification and random classification using Combo gene sets and random subsets and 24 hour data.
[32] Table 10 lists the distribution of compounds in individual training and test sets for 6 hour spleen necrosis data.
[33] Table 11 lists genes whose expression at 6 hours is predictive of spleen necrosis at 72 hours.
[34] Table 12 lists the comparison of predictivity for correct spleen necrosis classification and random classification using combo gene sets and 6 hour data.
[35] Table 13 lists the distribution of compounds in individual training and test sets for 72 hour spleen necrosis data.
[36] Table 14 lists genes whose expression at 72 hours predicts spleen necrosis at 72 hours.
[37] Table 15 lists the comparison of predictivity for correct spleen necrosis classification and random classification using combo gene sets and 72 hour data.
[38] Table 16 lists the RCT genes (ESTs) predictive for spleen necrosis at 72 hours: best homology matches.
[39] Table 17 lists the genes predictive for spleen necrosis, sequences, and accession numbers.
[40] Table 18 lists the spleen necrosis predictive genes whose protein products are known to be secreted. The genes are from the table listing the spleen necrosis predictive genes at the three time points 6, 24, and 72 hours.
[41] Table 19 lists the expression data for the 6 hour timepoint.
[42] Table 20 lists the expression data for the 24 hour timepoint.
[43] Table 21 lists the expression data for the 72 hour timepoint.
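The random subset selection described for Tables 4 and 5 (assign a random number to each gene, sort by that number, take the required number of genes) can be sketched as follows. The function name and the optional seed parameter are illustrative assumptions, not part of the original procedure.

```python
import random

def random_gene_subset(genes, size, seed=None):
    """Select `size` genes at random by tagging each gene with a random
    number, sorting on that number, and keeping the first `size` genes."""
    rng = random.Random(seed)  # a seed only makes the sketch reproducible
    keyed = [(rng.random(), gene) for gene in genes]
    keyed.sort()
    return [gene for _, gene in keyed[:size]]
```

Because every gene receives an independent random key, each subset of a given size is equally likely, which is the property the sort-by-random-number procedure relies on.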
DETAILED DESCRIPTION
[44] This invention relates to methods of predicting whether an agent or other stimulus induces spleen toxicity using predictive molecular toxicology analysis. In particular, the present invention provides methods of predicting spleen necrosis which comprise analyzing gene and/or protein expression across a number of spleen necrosis biomarkers disclosed herein for patterns of expression that are predictive of spleen necrosis in the recipient organism. Many chemical agents induce damage to the spleen and other lymphoid organs, and immunotoxicity is a significant component of adverse reactions to pharmaceuticals and drugs. Adverse drug reactions are very often unpredictable, and may occur through acute exposure to the chemical agent or drug or through chronic exposures. One embodiment of the present invention provides, in part, that modulated transcriptional regulation of relatively small sets of certain genes in response to a test agent can accurately predict the occurrence of spleen necrosis observed at later time points.
[45] Provided herein are multiple sets of spleen necrosis biomarkers which are useful in the practice of the spleen toxicity prediction methods of the present invention. In particular, applicants have identified 306 spleen necrosis biomarkers which demonstrate utility in predicting spleen necrosis. These biomarkers have been thoroughly characterized for their predictive performance, individually as well as in various combinations or subsets thereof. In addition, various optimized subsets of the spleen necrosis biomarkers of the present invention are disclosed. These sets have also been thoroughly characterized for predictive performance using the methods of the present invention. Among the subsets of spleen necrosis genes provided herein are several which demonstrate prediction accuracies in the vicinity of 80%.
[46] The present invention is further described by way of the experimental examples provided herein. These examples demonstrate that small sets of genes are used to predict spleen necrosis. As further described in the Examples, analysis of mRNA expression of a few genes indicates whether a test agent will induce spleen necrosis.
[47] The predictive capacity of the methods of the present invention has been verified by comparisons with random classifications.
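A randomized classification of the kind used for such comparisons keeps the same number of spleen necrosis calls but distributes them randomly among the samples. The sketch below shows one way to build it, together with a percent-correct helper for comparing predictions against either labeling. Both function names and the seed parameter are assumptions for illustration.

```python
import random

def randomize_classifications(labels, seed=None):
    """Return a shuffled copy of `labels`: the number of samples in each
    class is preserved, but the assignment to samples is random."""
    rng = random.Random(seed)
    shuffled = list(labels)  # copy so the true classifications are untouched
    rng.shuffle(shuffled)
    return shuffled

def percent_correct(true_labels, predicted_labels):
    """Overall percent of correct calls."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)
```

If a gene set predicts the true classifications clearly better than it predicts the randomized ones, its performance is unlikely to be an artifact of the class proportions alone.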
[48] I. General Techniques: The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry, nucleic acid chemistry, and immunology, which are well known to those skilled in the art. Such techniques are explained fully in the literature, such as, Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) and Molecular Cloning: A Laboratory Manual, third edition (Sambrook and Russell, 2001), (jointly referred to herein as "Sambrook"); Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987, including supplements through 2001); PCR: The Polymerase Chain Reaction, (Mullis et al., eds., 1994); Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York; Harlow and Lane (1999) Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (jointly referred to herein as "Harlow and Lane"); Beaucage et al. eds., Current Protocols in Nucleic Acid Chemistry (John Wiley & Sons, Inc., New York, 2000); and Casarett and Doull's Toxicology: The Basic Science of Poisons, C. Klaassen, ed., 6th edition (2001).
[49] Unless otherwise defined, all terms of art, notations and other scientific terminology used herein are intended to have the meanings commonly understood by those of skill in the art to which this invention pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art. The techniques and procedures described or referenced herein are generally well understood and commonly employed using conventional methodology by those skilled in the art, such as, for example, the widely utilized molecular cloning methodologies described in Sambrook et al., Molecular Cloning: A Laboratory Manual 2nd edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. As appropriate, procedures involving the use of commercially available kits and reagents are generally carried out in accordance with manufacturer defined protocols and/or parameters unless otherwise noted.
[50] "Toxic" or "toxicity" refers to the result of an agent causing adverse effects, usually by a xenobiotic agent administered at a sufficiently high dose level to cause the adverse effects.
[51] The term "spleen necrosis" refers to depletion of lymphoid tissue in the splenic follicles.
[52] As used herein, the terms "spleen necrosis biomarker" and "spleen necrosis predictive gene" are used interchangeably and refer to a gene whose expression, measured at the RNA or protein level can predict the likelihood of a spleen necrosis response.
[53] A "toxicological response" refers to a cellular, tissue, organ or system level response to exposure to an agent. At the molecular level, this can include, but is not limited to, the differential expression of genes encompassing both the up- and down-regulation of expression of such genes at the RNA and/or protein level; the up- or down-regulation of expression of genes which encode proteins associated with response to and mitigation of damage, the repair or regulation of cell damage; or changes in gene expression due to changes in populations of cells in the tissue or organ affected in response to toxic damage.
[54] An "agent" or "compound" is any element to which an individual can be exposed and can include, without limitation, drugs, pharmaceutical compounds, household chemicals, industrial chemicals, environmental chemicals, other chemicals, and physical elements such as electromagnetic radiation.
[55] The term "biological sample" as used herein refers to substances obtained from an individual. The samples may comprise cells, tissue, parts of tissues, organs, parts of organs, or fluids (e.g., blood, urine or serum). Biological samples include, but are not limited to, those of eukaryotic, mammalian or human origin.
[56] "Sample" is defined for the purposes of prediction as a biological sample and the gene expression data for that sample. Each sample may come from an individual animal. A toxicity classification may also be associated with the sample.
[57] "Gene expression" as used herein refers to the relative levels of expression and/or pattern of expression of a gene.
[58] "Gene expression profile" refers to the levels of expression of multiple different genes measured for the same sample. Gene expression profiles may be measured in a sample, such as samples comprising a variety of cell types, different tissues, different organs, or fluids (e.g., blood, urine, spinal fluid, sweat, saliva or serum) by various methods including but not limited to microarray technologies and quantitative and semi-quantitative RT-PCR (e.g., Taqman™) techniques, as well as techniques for measuring expression of proteins.
[59] "Individual" refers to a vertebrate, including, but not limited to, a human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog.
[60] As used herein, the terms "hybridize", "hybridizing", "hybridizes" and the like, used in the context of polynucleotides, are meant to refer to conventional hybridization conditions, such as hybridization in 50% formamide, 6X SSC, 0.1% SDS, 100 μg/ml ssDNA, in which temperatures for hybridization are above 37 degrees Celsius and temperatures for washing in 0.1X SSC/0.1% SDS are above 55 degrees Celsius, and preferably to stringent hybridization conditions. The hybridization of nucleic acids can depend upon various factors such as their degree of complementarity as well as the stringency of the hybridization reaction conditions. Stringent conditions can be used to identify nucleic acid duplexes with a high degree of complementarity. Means for adjusting the stringency of a hybridization reaction are well-known to those of skill in the art. See, for example, Sambrook, et al., "Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor Laboratory Press, 1989; Ausubel, et al., "Current Protocols In Molecular Biology," John Wiley & Sons, 1996 and periodic updates; and Hames et al., "Nucleic Acid Hybridization: A Practical Approach," IRL Press, Ltd., 1985. In general, conditions that increase stringency (i.e., select for the formation of more closely matched duplexes) include higher temperature, lower ionic strength and presence or absence of solvents; lower stringency is favored by lower temperature, higher ionic strength, and lower or higher concentrations of solvents.
[61] In the context of amino acid sequence comparisons, the term "identity" is used to express the percentage of amino acid residues at the same relative position which are the same. Also in this context, the term "homology" is used to express the percentage of amino acid residues at the same relative positions which are either identical or are similar, using the conserved amino acid criteria of BLAST analysis, as is generally understood in the art. Further details regarding amino acid substitutions, which are considered conservative under such criteria, are provided below.
[62] The generation of Toxicology Gene Expression Databases or spleen necrosis biomarkers is described. The spleen necrosis biomarkers described herein were initially identified utilizing a database generated from large numbers of in vivo experiments, wherein the differential expression of approximately 700 rat genes, measured at various time points, in response to multiple toxic compounds inducing various specific toxic responses, as visualized through microscopic histopathological analysis, was quantified, as described in pending United States Patent Application filed January 29, 2002 (serial number 10/060,893), which for purposes of U.S. patent prosecution is incorporated by reference. This quantitative gene expression data, as well as corresponding histopathological information, was then subjected to an analytical approach specifically designed to identify genes which not only correlated with the observed histopathology, but also demonstrated an ability to be used in a model capable of accurately predicting the occurrence of the toxic response associated with the observed histopathology. A detailed description of the identification process is provided in the Examples. A flow diagram illustrating a method of how the spleen necrosis biomarkers described in one embodiment of the present invention were identified is shown in Figure 1.
[63] In addition to the database described and utilized herein, other toxicology gene expression databases may be generated and used to identify additional toxicity biomarkers, which may also be employed in the practice of the spleen necrosis prediction methods as described in other embodiments of the invention. Such databases may be generated with test compounds capable of inducing various pathologies indicative of a toxic response in the spleen and/or other lymphoid organs or systems (including, without limitation, bone marrow), over different time periods and under different administration and/or dosing conditions. An example of compounds, dose levels, spleen necrosis classifications and histopathology scores used in the Examples which follow is provided in Table 1. The notations used in Table 1 are as follows: * Compound and Dose Level abbreviations; ** Histopathology Spleen Necrosis scores: 1 = not remarkable; 2 and higher indicate histopathology of increasing severity.
[64] Such databases may also be generated using organisms other than the rat, including without limitation, animals of canine, murine, or non-human primate species. In addition, such databases may incorporate data derived from human clinical trials and post-approval human clinical experiences. Various methods for detecting and quantitating the expression of genes and/or proteins in response to toxic stimuli may be employed in the generation of such databases, as are generally known in the art. For example, microarrays comprising multiple cDNAs or oligonucleotide probes capable of hybridizing to corresponding transcripts of genes of interest may be used to generate gene expression profiles. Additionally, a number of other methods for detecting and quantitating the expression of gene transcripts are known in the art and may be employed, including without limitation, RT-PCR techniques such as TaqMan®, RNAse protection, branched chain, etc.
[65] Databases comprising quantitative gene expression information preferably include qualitative and quantitative and/or semi-quantitative information respecting the observed toxicological responses and other conventional toxicology endpoints, such as for example, body and organ weights, serum chemistry and histopathology observations, histopathology scores and/or similar parameters.
[66] Identification of Correlating Genes: For the purpose of identifying candidate predictive genes, the database preferably includes histopathology scores for each animal which has been exposed to one or more agent(s). These scores can be assigned based on actual histopathology observations for the tissue and animal or on the basis of effects observed for other animals treated with the same agent and dose level. The scores are numerical scores that reflect the occurrence and severity of histopathological changes. These scores can be adjusted to have similar range to gene expression changes. For example, a score of 1 could be assigned to samples with no changes and scores of 2-8 assigned to increasingly severe changes. Because the scores are numerical, they are suitable for use with a variety of statistical correlation and similarity measures.
[67] An example of a histopathology scoring system is provided in Example 1. Referring to Figure 1, histopathology scores may be utilized to identify genes which correlate with the observed toxicological response, using any number of statistical correlation and similarity analysis techniques, including without limitation those correlation or similarity measures described or employed in Example 1 (e.g., Pearson, Spearman, change, smooth, distance etc.). Such correlating genes may be used as predictive gene candidates. In one embodiment, the correlating gene lists as well as the entire array gene list are used as input gene lists in the GeneSpring™ (Version 4.1, Silicon Genetics, Redwood City, CA) Predict Parameter Values tool (otherwise known hereafter as "Predictive Model"). In a preferred embodiment, the correlating gene lists as well as the entire array gene list are used as input gene lists in the Predictive Model as implemented in S-PLUS® statistical software (Insightful Corporation, Seattle, WA).
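As a hedged illustration of the correlation step, the following minimal Python sketch selects genes whose expression across samples correlates with the numerical histopathology scores using a Pearson coefficient. The function names, the gene names, and the 0.8 cutoff are assumptions made for illustration only; they are not taken from GeneSpring™, S-PLUS®, or Example 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def correlating_genes(expression, scores, threshold=0.8):
    """Return genes whose |r| with the histopathology scores meets the cutoff.

    expression: dict mapping gene name -> list of expression values,
                one value per sample, in the same order as `scores`.
    threshold:  hypothetical cutoff chosen for this sketch.
    """
    return sorted(
        gene for gene, values in expression.items()
        if abs(pearson(values, scores)) >= threshold
    )


# Illustrative data only: five samples scored 1 (no change) to 8 (severe).
scores = [1, 1, 2, 4, 8]
expression = {
    "geneA": [1.0, 1.0, 2.0, 4.0, 8.0],  # tracks the scores
    "geneB": [5.0, 5.0, 5.0, 5.0, 5.0],  # unchanged across samples
    "geneC": [8.0, 4.0, 2.0, 1.0, 1.0],  # weaker inverse relationship
}
candidates = correlating_genes(expression, scores)
```

A rank-based measure such as Spearman could be substituted by ranking both sequences before calling `pearson`.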
[68] Class Prediction and Classification: Statistical analysis of the database of gene expression profiles can be effected by utilizing commercially available software programs. One embodiment of the present invention provides for the utilization of S-PLUS® software. Other software programs which can be used for statistical analysis are GeneSpring™ and SAS software packages (SAS Institute Inc., Cary, NC). Using S-PLUS® software, class predictions can be made from the genes in the database, as detailed in Example 1, using one or more training and test sets. In one embodiment, five training sets and five test sets are obtained, as shown in Example 1 (Table 2). Spleen necrosis classifications are entered for the samples in each training and test set. Toxicological classifications can be defined by the presence or the absence of various pathologies. In another embodiment, toxicity observed as spleen necrosis is defined as 2 classifications (i.e. spleen necrosis vs. no spleen necrosis) observed 72 hours after treatment with an agent. However, toxicity can manifest in other spleen pathologies such as congestion or hyperplasia. In other embodiments of the present invention, more complex (three or more) classifications can be used in defining multiple pathologies. Referring to Table 2, the notations appearing are as follows: + Low= low dose, High= high dose, * For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.), ** Negative= Compounds that did not elicit histopathology (score=1), Positive= Compounds that did elicit histopathology (score of 2 or greater)
[69] Once the training sets have been selected, predicted classifications of the test set samples are obtained by using a k-nearest neighbor (or knn) voting procedure. The class of each of the k nearest neighbors is determined, and the test sample is assigned to the class with the largest representation after adjusting for the proportion of classifications in the training set. In one embodiment, adjustments are made to account for different proportions of classes in the training set.
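The voting procedure can be sketched as follows. This is a minimal illustration, not the S-PLUS® implementation: the Euclidean distance, the value of k, and the particular adjustment (dividing each class's vote count by that class's share of the training set) are assumptions chosen to make the idea concrete.

```python
from collections import Counter
from math import dist


def knn_predict(train, labels, sample, k=3):
    """Classify `sample` by a proportion-adjusted k-nearest-neighbor vote.

    train:  list of expression profiles (tuples of numbers), one per sample.
    labels: class label for each training profile.
    """
    # Rank training samples by Euclidean distance to the test sample.
    order = sorted(range(len(train)), key=lambda i: dist(train[i], sample))
    votes = Counter(labels[i] for i in order[:k])

    # Adjust for class imbalance: divide each class's votes by the
    # fraction of the training set belonging to that class.
    totals = Counter(labels)
    adjusted = {
        cls: count / (totals[cls] / len(labels))
        for cls, count in votes.items()
    }
    return max(adjusted, key=adjusted.get)


# Illustrative two-gene profiles: four "negative" and two "positive" samples.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0), (1.1, 1.0)]
labels = ["negative"] * 4 + ["positive"] * 2

near_negative = knn_predict(train, labels, (0.05, 0.05), k=3)
near_positive = knn_predict(train, labels, (1.0, 1.05), k=5)
```

In the second call the five nearest neighbors contain three negatives and two positives, so an unadjusted majority vote would return "negative"; the adjustment compensates for positives making up only a third of the training set.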
[70] Toxicity can also be observed at various early time points after exposure to an agent up to and including 72 hours after treatment. A skilled toxicologist can determine the optimal time after exposure to an agent to observe pathology by either what has been disclosed in the art or a stepwise experimentation with time increments, for example 2, 4, 6, 12, 18, 24, 36, 48 hours post-exposure or even longer time increments, for example, days, weeks, or months after exposure to the agent.
[71] Identification of Predictive Genes: Figure 1 illustrates the overall process used to identify spleen necrosis predictive genes. In one embodiment, this process runs independently for each time point.
[72] The number of input genes that are to be used in the Predictive Model can be varied, for example 50, 40, 30, 20, 10, 5, 2, or 1 gene(s) can be used. One embodiment of the present invention provides that at least 50 genes are used. In another embodiment, all genes in the input list are used.
[73] Optimal gene lists (i.e., the list which results in the best predictive accuracy with the lowest number of genes used) are generated. One embodiment of the present invention provides that optimum gene lists for all input gene lists are combined for each training and test set and then these combined lists for all five training and test sets are merged to create an aggregate list of predictive genes. The aggregate list can then be subdivided to smaller lists of genes based on the number of times that the genes occurred on the predictive gene lists for an individual training or test set. The resulting gene lists are designated herein as Combo 5, 4, 3, 2, or 1 lists. The genes that were predictive in all 5 training and test sets are designated as Combo 5 and the genes that were predictive in 4 of 5 training and test sets are designated as Combo 4 and so forth.
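The derivation of the Combo lists amounts to counting how many of the training and test sets each gene was predictive in. A minimal sketch, assuming the optimal gene lists have already been combined per training/test set, is given below; the gene names and the `combo_lists` helper are hypothetical.

```python
from collections import Counter


def combo_lists(per_set_gene_lists):
    """Group genes by the number of training/test sets they were predictive in.

    per_set_gene_lists: one collection of predictive genes per training/test
                        set (five in the embodiment described above).
    Returns a dict mapping n -> set of genes predictive in exactly n sets,
    so that result[5] corresponds to Combo 5, result[4] to Combo 4, etc.
    """
    counts = Counter(
        gene for genes in per_set_gene_lists for gene in set(genes)
    )
    combos = {n: set() for n in range(1, len(per_set_gene_lists) + 1)}
    for gene, n in counts.items():
        combos[n].add(gene)
    return combos


# Illustrative input: predictive genes found for each of five sets.
per_set = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneA", "geneB"],
    ["geneA", "geneB"],
    ["geneA"],
]
combos = combo_lists(per_set)
aggregate = set().union(*combos.values())  # the merged "Combo All" list
```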
[74] Table 17 lists gene names, accession numbers and sequence information for the spleen necrosis predictive genes found by analysis of the database in the manner described above. The present invention demonstrates that each of these genes contributes to predictive performance for at least one input gene list and training/test set and one time point. Table 16 lists homologous genes for the RCT sequences that were identified by BLAST search using the GenBank NR database as the target database. Homologies are given from BLAST searches using the Phase 1 RCT sequence as the query sequence and GenBank NR database as the target sequence database. The best BLAST homology sequence observed is given. In general, no significant homology indicates that no BLAST match was observed with a BIT score >100.
[75] The predictive genes are evaluated for predictive performance as shown in Figure 2 for spleen necrosis. Expression data that can be used with the k-nearest neighbor model and predictive genes to enable one skilled in the art to make predictions are given in Tables 19-21.
[76] The combined lists of predictive genes or alternatively, Combo 5, 4, 3, 2, or 1 list or subsets thereof are used as input into the Predictive Model. As an external verification of the predictive abilities of the genes found to be predictive for spleen necrosis, random lists of genes may be generated and also used as input into the Predictive Model. Example 2 describes the evaluation of the predictive performance of the spleen necrosis predictive genes.
[77] Predictive performance may also be assessed using data from different time points after exposure to the agent. In one embodiment, 24 hour expression data are used. In another embodiment, 6 hour expression data are used, as described in Examples 3 and 4. In another embodiment, 72 hour expression data are used, as described in Examples 5 and 6. As shown in Table 7, the predictive accuracy using 24 hour expression data and the largest predictive gene list is 81%.
[78] Somewhat lower predictive accuracies were observed for the 6h and 72 h data. All of the combo lists as well as Combo All list had significantly higher accuracy than using random classifications.
[79] Predictive performance was assessed using subsets of genes from the different Combo lists. As indicated in Example 2, most randomly selected subsets of the Combo gene lists yielded predictive accuracies of 70% or greater and even individual genes had geometric means that were often greater than 60%. In one embodiment, using 10 genes from Combos 5, 4, and 3 yields an accuracy in the range of 81-84%. Using different Combo lists may require a greater number of genes to reach the same accuracy level.
[80] The spleen necrosis predictive genes disclosed herein and spleen necrosis predictive genes identified by using methods disclosed herein are useful for predicting spleen necrosis in response to exposure to one or more agents.
[81] The discovery that relatively small sets of different genes have predictive value permits flexible applications. The choice of how many and which genes to use can be tailored to a variety of different purposes. Predictivities are observed for sets of a few genes. These small sets may be particularly advantageous in applications where measurement of only a few RNA species has considerable advantages in terms of sample processing logistics, speed and cost. These applications would include relatively high throughput screens for predictive capability. An example of this would be an early screen using small samples of primary cells or cultured cell lines that can be processed with automated robotic equipment for treatment and isolation of RNA followed by efficient technologies for measuring expression of a few RNA species such as branched chain technology or RT-PCR.
[82] The use of larger numbers of predictive genes provides redundancy which may improve accuracy and precision. Applications using larger numbers of predictive genes may include, for example, tests of drug candidates at later stages of commercial development. In this regard, larger numbers of predictive genes may be desirable at later stages of preclinical development of a therapeutic candidate, where in vivo samples can be obtained and more comprehensive methods such as microarray measurement of gene expression are appropriate. The larger gene sets can also include different subsets of genes which may offer more insight into potential mechanisms of toxicity, providing the potential to predict long term toxic consequences such as chronic, irreversible toxicity or carcinogenicity.
[83] Some genes within the spleen necrosis predictive gene sets provided herein are also suitable for prediction of toxicity in other organs or may be preferable for predicting toxicity for wider ranges of timepoints or treatment routes or regimens. As an example of the latter, some of the predictive genes are observed at three different timepoints after treatment. These genes are useful for prediction in cases where the samples come from treatment protocols that have different measurement timepoints or routes of administration than those employed for the database used in the discovery of the predictive genes disclosed herein or where the toxicokinetics for a particular agent are known or suspected to be different from those in the database.
[84] One embodiment of the present invention provides that the agent is an agent for which no expression profile has been assessed or stored in the database or library. Animals, e.g., rats, are dosed with such an agent and the gene expression profile(s) are the test set for the Predictive Model. The training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database. The prediction can be made with accuracy without the use of histopathology scores as part of the input into the Predictive Model.
[85] In another embodiment the agent is an agent present in the database but is used at a different dose level or with a different treatment protocol than used in the database. The training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database. Again, the prediction can be made with accuracy without the use of histopathology scores as part of the input into the Predictive Model.
[86] In another embodiment, the exposure time of the agent is other than 6, 24, or 72 hours, or repeat dosing protocols are used. In this case, the skilled artisan can use the predictive toxicity genes from surrounding time points to extrapolate the predicted toxicity without undue experimentation. For example, if the individual has been exposed to the agent for 12 hours, then predictive genes from 6 and 24 hours timepoints are used as guidelines for extrapolating toxicity predictions.
[87] In another embodiment, the spleen necrosis predictive genes are used to detect toxic effects that are manifested as long lasting or chronic consequences such as irreversible toxicity or carcinogenesis. The predictive genes and model can be applied to databases where classifications of training and test set samples are made with respect to actual or putative endpoints such as irreversible toxicity or carcinogenicity.
[88] In another embodiment, the predictive genes are used in a variety of alternative models to predict spleen necrosis. Some of these models do not require the direct use of data in a database but use functions or coefficients derived from the database. In another embodiment, the predictive genes and models are used to evaluate in vitro systems for their ability to reflect in vivo toxic events and to use such in vitro systems for predicting in vivo toxicity. Expression profiles for predictive genes are created from candidate in vitro assays using treatments with agents of known in vivo toxicity and for which in vivo data on gene expression are available. The expression data and predictive models of this invention are used to determine whether the in vitro assay system has predictive gene expression responses that accurately reflect the in vivo situation. Large sets of predictive genes as described in the present invention are tested in such models for their suitability and performance with the candidate in vitro systems. The present invention provides for a superior and novel tool for evaluating and optimizing in vitro systems for their ability to reflect and accurately predict in vivo responses.
[89] Another embodiment of the present invention provides that the predictive genes and models are used with an in vitro system to accurately predict in vivo toxicity. In vitro systems that have been evaluated and optimized as described above are treated with test agents and expression profiles are measured for predictive genes. The expression profiles are used in conjunction with a predictive model to predict in vivo toxicity. In this embodiment, there can be considerable reduction in the use of laboratory animals. Additionally the application of this embodiment to in vitro human systems can provide a unique capability to accurately predict human toxic responses without human in vivo exposure or treatment.
[90] Another embodiment of the present invention provides that measurement of the expression levels of the proteins encoded by the predictive genes can be used in conjunction with predictive models to predict toxicity. Among the full set of spleen necrosis predictive genes are various genes known to encode cell surface, secreted and/or shed proteins. This enables the development of methods for predicting toxicity using protein biomarkers. For example, as disclosed in Table 18, there are 28 genes in the master predictive set which are known to encode secreted proteins. The protein products are easier to access since they are secreted into body fluids and are thus more amenable to quantification. Thus, another embodiment of the present invention provides a spleen necrosis predictive assay which detects the expression of one or more of the predictive proteins. Such assays have several advantages, such as:
[91] Ability to use archived tissue specimens such as preserved or embedded tissues which are not suitable for measurement of RNA expression
[92] Ability to examine predictive protein expression in tissue slides using in situ labeling and microscopic observation. This is useful for detecting predictive toxicity signals occurring in very small sub-populations of cells.
[93] Ability to detect protein markers in specimens that can be readily obtained with little or no invasiveness (e.g., blood, urine, sweat, saliva).
[94] Reduction in animal use in laboratory studies, such that no sacrifice of animals is necessary to obtain tissue specimens when toxicity prediction can be made with specimens that can be obtained without animal sacrifice or surgery.
[95] Application for human use where tissue specimens cannot be obtained or are only obtained with great difficulty.
[96] In another embodiment, the identified predictive genes are potential therapeutic targets when the genes are involved in toxic damage or repair responses whose expression or functional modification may attenuate, ameliorate or eliminate disease conditions or adverse symptoms of disease conditions.
[97] In another embodiment the predictive genes are organized into clusters of genes that exhibit similar patterns of expression by a variety of statistical procedures commonly used to identify such coordinate expression patterns. Common functional properties of these clustered genes are used to provide insight into the functional relationship of the response of these genes to toxic effects. Common genetic properties of these genes (e.g., common regulatory sequences) provide insight into functional aspects by revealing known or novel similarities in the coding region of the genes. The presence of common known or novel signal transduction systems that regulate expression of the genes can also provide functional insight. The presence of common known or novel regulatory sequences in the identified predictive genes is also used to identify additional spleen necrosis predictive genes.
[98] In another embodiment, the predictive genes are used to predict toxicity responses in other lymphoid tissue types (i.e. bone marrow, thymus, and lymph nodes).
[99] In yet another embodiment, the spleen necrosis predictive genes are used to predict toxicity responses in other species, for example, human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog. Some members of the spleen necrosis predictive genes may be more suitable for prediction of toxicity in species other than the species used to derive the database (rat in the case of the examples provided). One method for identifying such genes involves examining DNA sequence databases to identify and characterize orthologous sequences to the predictive genes in the target species. One of skill in the art can examine the orthologous sequences for similarity in amino acid coding regions and motifs as well as for similarities in regulatory regions and motifs of the gene.
[100] In another embodiment, spleen necrosis predictive genes or gene sequences are used for screening other potential toxicity predictive genes or gene sequences in other species or even within the same species using methods known in the art. See, for example, Sambrook supra. Gene sequences which hybridize under stringent conditions to the spleen necrosis predictive gene sequences disclosed herein are selected as potential toxicity predictive genes. Additionally, genes which demonstrate significant homology with the spleen necrosis predictive genes disclosed herein (preferably at least about 70%) are selected as toxicity predictive gene candidates. It is understood that conservative substitutions of amino acids are possible for gene sequences which have some percentage homology with the spleen necrosis predictive gene sequences of this invention. A conservative substitution in a protein is a substitution of one amino acid with an amino acid with similar size and charge. Groups of amino acids known normally to be equivalent are: (a) Ala, Ser, Thr, Pro, and Gly; (b) Asn, Asp, Glu, and Gln; (c) His, Arg, and Lys; (d) Met, Leu, Ile, and Val; and (e) Phe, Tyr, and Trp.
[101] The predictive spleen necrosis genes can be used as guides to predicting toxicity for agents that have been administered via different routes (intraperitoneal, intravenous, oral, dermal, inhalation, mucosal, etc.) from the routes that were used to generate the database or to identify the spleen necrosis predictive genes. Furthermore, the present invention is not intended to be limiting to agents that have been administered at different dosages than the agents that were used to generate the database or to identify the predictive spleen necrosis genes.
[102] Data described in the examples were generated using the microarray technology disclosed in the Examples. However, the present invention is not dependent on using this particular platform. Other similar gene expression analysis technologies are incorporated in the practice of this invention. These can include, but are not limited to, other arrays containing the predictive genes, RT-PCR (e.g., TaqMan®), branched chain technology, RNAse protection or any other method which quantitatively detects the expression of RNA polynucleotides. The present invention can be practiced using these other technologies by generating a database of expression measurements for the predictive genes using samples such as those used in the database described in Example 1. This database can then be used in a model such as the k-nearest neighbor model or can be used to develop any of a number of other models.
EXAMPLES
[103] The following Examples are provided to illustrate but not to limit the present invention in any manner.
Example 1
[104] Identification of spleen necrosis predictive genes Materials and Methods: (A) Database of Compounds and Spleen Necrosis
[105] Compounds and treatments list used to construct the spleen necrosis database are given in Table 1. This table also provides the evaluation of the spleen necrosis observed in samples collected 72 hours after treatment.
[106] Database of Animal Experiments: Sprague Dawley rats Crl:CD from Charles River, Raleigh, NC were divided into treated rats that received a specific concentration of the compound (see Table 1) and control rats that received only the vehicle in which the compound was mixed (e.g., saline).
[107] At specified timepoints (6h, 24h and 72h) after administration (intraperitoneal route) of the compound, a set number of rats (usually 3 control and 3 treated) were euthanized and tissues collected. Each rat was heavily sedated with an overdose of CO2 by inhalation and a maximum amount of blood drawn. Exsanguination of the rat by this drawing of blood kills the rat. The method of collecting the tissues is very important and ensures preserving the quality of the mRNA in the tissues. The body of the rat was then opened up and prosectors rapidly removed the tissues (including spleen) and immediately placed them into liquid nitrogen. All of the organs/tissues were frozen within 3 minutes of the death of the animal to ensure that mRNA did not degrade. The organs/tissues were then packaged into well-labeled plastic freezer quality bags and stored at -80 degrees until needed for isolation of the mRNA from a portion of the organ/tissue sample.
[108] Isolating DNA/RNA from animal tissues or cells: Total RNA was isolated from spleen tissue samples using the following materials: Qiagen RNeasy midi kits, 2-mercaptoethanol, liquid N2, tissue homogenizer, dry ice. Samples were kept on ice when specified.
[109] If a tissue needed to be broken, then the tissue sample was placed on a double layer of aluminum foil which was then placed within a weigh boat containing a small amount of liquid nitrogen. The aluminum foil was folded around the tissue and then struck by a small foil-wrapped hammer to administer mechanical stress forces.
[110] About 0.10 g of spleen tissue was weighed out and placed in a sterile container. To preserve integrity of the RNA, tissues were kept on dry ice when other samples were being weighed. RLT buffer (Qiagen®) was added to the sample to aid in the homogenization process. The tissue was homogenized using a commercially available homogenizer (IKA Ultra Turrax T25 homogenizer) with the 7 mm microfine sawtooth shaft and generator (195 mm long with a processing range of 0.25 ml to 20 ml, item # 372718). After homogenization, samples were stored on ice until all samples were homogenized. The homogenized tissue sample was spun to remove nuclei, thus reducing DNA contamination. The supernatant of the lysate was then transferred to a clean container containing an equal volume of 70% EtOH in DEPC treated H2O and mixed. RNA was isolated by putting the supernatant through an RNeasy spin column, washed, and subsequently eluted. Small quantities of remaining DNA were removed by use of DNase enzyme during the RNA isolation procedure following the instructions provided by Qiagen and alternatively by lithium chloride (LiCl) precipitation following the RNA isolation. The isolated RNA pellet was stored in RNase-free water or in an RNA storage buffer (10 mM sodium citrate), Ambion Cat #7000. The RNA amount was then quantitated using a spectrophotometer.
[111] Rat 700 CT chip: Gene expression data was generated from a microarray chip that has a set of toxicologically relevant rat genes which are used to predict toxicological responses. The rat 700 CT gene array is disclosed in pending U.S. applications 60/264,933; 60/308,161; and pending application filed on January 29, 2002 (serial number 10/060,893).
[112] Microarray RT reaction: Fluorescence-labeled first strand cDNA probe was made from the total RNA or mRNA isolated from spleens of control and treated rats. This probe was hybridized to microarray slides spotted with DNA specific for toxicologically relevant genes. The materials needed are: total or messenger RNA, primer, Superscript II buffer, dithiothreitol (DTT), nucleotide mix, Cy3 or Cy5 dye, Superscript II (RT), ammonium acetate, 70% EtOH, PCR machine, and ice. The volume of each sample that would contain 20 μg of total RNA (or 2 μg of mRNA) was calculated. The amount of DEPC water needed to bring the total volume of each RNA sample to 14 μl was also calculated. If RNA was too dilute, the samples were concentrated to a volume of less than 14 μl in a speedvac without heat. The speedvac must be capable of generating a vacuum of 0 milliTorr so that samples can freeze dry under these conditions. Sufficient volume of DEPC water was added to bring the total volume of each RNA sample to 14 μl. Each PCR tube was labeled with the name of the sample or control reaction. The appropriate volume of DEPC water and 8 μl of anchored oligo dT mix (stored at -20°C) was added to each tube. Then the appropriate volume of each RNA sample was added to the labeled PCR tube. The samples were mixed by pipetting. The tubes were kept on ice until samples were ready for the next step. It is preferable for the tubes to be kept on ice until the next step is ready to proceed. The samples were incubated in a PCR machine for 10 minutes at 70°C followed by 4°C incubation period until the sample tubes were ready to be retrieved. The sample tubes were left at 4°C for at least 2 minutes. The Cy dyes are light sensitive, so any solutions or samples containing Cy-dyes should be kept out of light as much as possible (e.g., cover with foil) after this point in the process.
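The volume calculation described above (the volume of each sample containing 20 μg of total RNA, and the DEPC water needed to reach 14 μl) can be sketched as follows. The function name and its return convention are assumptions for illustration; the concentration would come from the spectrophotometer reading.

```python
def rt_input_volumes(conc_ug_per_ul, target_ug=20.0, final_ul=14.0):
    """Compute RNA and DEPC water volumes for one RT reaction.

    conc_ug_per_ul: measured RNA concentration in micrograms per microliter.
    target_ug:      RNA mass per reaction (20 ug total RNA, or 2 ug mRNA).
    final_ul:       combined RNA + water volume (14 ul in the text above).

    Returns (rna_ul, water_ul, needs_concentration). When the RNA is too
    dilute, rna_ul exceeds final_ul and the sample must first be
    concentrated (e.g., in a speedvac) before water is added.
    """
    rna_ul = target_ug / conc_ug_per_ul
    needs_concentration = rna_ul > final_ul
    water_ul = 0.0 if needs_concentration else final_ul - rna_ul
    return rna_ul, water_ul, needs_concentration


# Illustrative concentrations only.
ok_sample = rt_input_volumes(2.0)      # 2.0 ug/ul stock
dilute_sample = rt_input_volumes(1.0)  # 1.0 ug/ul stock, too dilute
```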
Sufficient amounts of Cy3 and Cy5 reverse transcription mix were prepared for one to two more reactions than would actually be run by scaling up the following:
[113] For labeling with Cy3: 8 μl 5x First Strand Buffer for Superscript II, 4 μl 0.1 M DTT, 2 μl Nucleotide Mix, 2 μl of 1:8 dilution of Cy3 (e.g., 0.125 mM Cy3dCTP), and 2 μl Superscript II. For labeling with Cy5: 8 μl 5x First Strand Buffer for Superscript II, 4 μl 0.1 M DTT, 2 μl Nucleotide Mix, 2 μl of 1:10 dilution of Cy5 (e.g., 0.1 mM Cy5dCTP), and 2 μl Superscript II. About 18 μl of the pink Cy3 mix was added to each treated sample and 18 μl of the blue Cy5 mix was added to each control sample. Each sample was mixed by pipetting. The samples were placed in a DNA engine (PTC-200 Peltier Thermal Cycler, MJ Research) for 2 hours at 45°C followed by 4°C until the sample tubes were ready to be retrieved.
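Scaling the reverse transcription mix for one or two reactions more than will actually be run is simple multiplication of the per-reaction recipe. The sketch below uses the per-reaction volumes listed above; the dictionary keys and the `master_mix` helper are illustrative names, not part of any kit protocol.

```python
# Per-reaction volumes in microliters, as listed for either dye.
PER_REACTION_UL = {
    "5x First Strand Buffer": 8,
    "0.1 M DTT": 4,
    "Nucleotide Mix": 2,
    "Cy dye dilution": 2,   # 1:8 Cy3 or 1:10 Cy5
    "Superscript II": 2,
}


def master_mix(n_reactions, extra=1):
    """Scale the per-reaction recipe to n_reactions plus `extra` spare reactions."""
    scale = n_reactions + extra
    return {name: volume * scale for name, volume in PER_REACTION_UL.items()}


# Example: three treated samples labeled with Cy3, with one spare reaction.
cy3_mix = master_mix(3, extra=1)
per_reaction_total = sum(PER_REACTION_UL.values())  # volume added per tube
```

The per-reaction total matches the 18 μl of mix added to each sample.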
[114] In addition to the desired cDNA product, the RT reaction contained impurities that must be removed. These impurities included excess primers, nucleotides, and dyes. The primary method of removing the impurities was by following the instructions in the QIAquick PCR purification kit (Qiagen cat#120016).
[115] Alternatively, the RT reactions were cleaned of impurities by ethanol precipitation and resin bead binding. The samples from the DNA engine were transferred to Eppendorf tubes containing 600 μl of ethanol precipitation mixture and placed in a -80°C freezer for at least 20-30 minutes. These samples were centrifuged for 15 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C) and the supernatant was carefully decanted. A visible pellet was seen (pink/red for Cy3, blue for Cy5). Ice cold 70% EtOH (about 1 ml per tube) was used to wash the tubes and the tubes were subsequently inverted to clean tube and pellet. The tubes were centrifuged for 10 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C), then the supernatant was carefully decanted. The tubes were air dried for about 5 to 10 minutes, protected from light. When the pellets were dried, they were resuspended in 80 μl nanopure water. The cDNA/mRNA hybrid was denatured by heating for 5 minutes at 95°C in a heat block and flash spun. Then the lid of a "Millipore MAHV N45" 96 well plate was labeled with the appropriate sample numbers. A blue gasket and waste plate (v-bottom 96 well) was attached. About 160 μl of Wizard DNA Binding Resin (Promega cat#A1151) was added to each well of the filter plate that was used. Probes were added to the appropriate wells (80 μl cDNA samples) containing the Binding Resin. The reaction was mixed by pipetting up and down about 10 times. The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent) and then the filtrate was decanted. About 200 μl of 80% isopropanol was added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. Then the 80% isopropanol wash and spin step was repeated. The filter plate was placed on a clean collection plate (v-bottom 96 well) and 80 μl of Nanopure water, pH 8.0-8.5 was added. The pH was adjusted with NaOH.
The filter plate was secured to the collection plate and after 5 minutes was centrifuged for 7 minutes at 2500 rpm.
[116] Purification of Cy-Dye Labeled cDNA: To purify fluorescence-labeled first strand cDNA probes, the following materials were used: Millipore MAHV N45 96 well plate, v-bottom 96 well plate (Costar), Wizard DNA binding Resin, wide orifice pipette tips for 200 to 300 μl volumes, isopropanol, nanopure water. It is preferable to keep the plates aligned during centrifugation. Misaligned plates lead to sample cross contamination and/or sample loss. It is also important that plate carriers are seated properly in the centrifuge rotor.
[117] The lid of a "Millipore MAHV N45" 96 well plate was labeled with the appropriate sample numbers. A blue gasket and waste plate (v-bottom 96 well) was attached. Wizard DNA Binding Resin (Promega cat#A1151) was shaken immediately prior to use for thorough resuspension. About 160 μl of Wizard DNA Binding Resin was added to each well of the filter plate that was used. When this was done with a multi-channel pipette, wide orifice pipette tips were used to prevent clogging. It is preferable not to touch or puncture the membrane of the filter plate with a pipette tip. Probes were added to the appropriate wells (80 μl cDNA samples) containing the Binding Resin. The reaction was mixed by pipetting up and down about 10 times. It is preferable to use regular, unfiltered pipette tips for this step. The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent) and then the filtrate was decanted. About 200 μl of 80% isopropanol was added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. Then the 80% isopropanol wash and spin step was repeated. The filter plate was placed on a clean collection plate (v-bottom 96 well) and 80 μl of Nanopure water, pH 8.0-8.5 was added. The pH was adjusted with NaOH. The filter plate was secured to the collection plate with tape to ensure that the plate did not slide during the final spin. The plate sat for 5 minutes and was centrifuged for 7 minutes at 2500 rpm. Replicates of samples should be pooled.
[118] Dry-down Process: The cDNA probes were concentrated so that they could be resuspended in hybridization buffer at the appropriate volume. The volume of the control cDNA (Cy-5) was measured and divided by the number of samples to determine the appropriate amount to add to each test cDNA (Cy-3). Eppendorf tubes were labeled for each test sample and the appropriate amount of control cDNA was allocated into each tube. The test samples (Cy-3) were added to the appropriate tubes. These tubes were placed in a speed-vac to dry down, with foil covering any windows on the speed-vac. At this point, heat (45°C) may be used to expedite the drying process. Samples may be stored in dried form at -20°C for up to 14 days.
[119] Microarray Hybridization: To hybridize labeled cDNA probes to single stranded, covalently bound DNA target genes on glass slide microarrays, the following materials were used: formamide, SSC, SDS, 0.2 μm syringe filter, salmon sperm DNA (Sigma, cat # D-7656), human Cot-1 DNA (Life Technologies, cat # 15279-011), poly A (40 mer: Life Technologies, custom synthesized), yeast tRNA (Life Technologies, cat # 15401-04), hybridization chambers, incubator, coverslips, parafilm, and heat blocks. In one embodiment, the array is covered to ensure proper hybridization.
[120] About 30 μl of hybridization buffer was prepared per cDNA sample (control rat cDNA plus treated rat cDNA). Slightly more than is needed should be made, since about 100 μl of the total volume made for hybridizations can be lost during filtration. The composition of the hybridization buffer for any volume is given first, followed by the amount of each ingredient for preparing 100 μl of buffer: 50% formamide, 50 μl formamide; 5X SSC, 25 μl 20X SSC; and 0.1% SDS, 25 μl 0.4% SDS. The solution was filtered through a 0.2 μm syringe filter, then the volume was measured. About 1 μl of salmon sperm DNA (10 mg/ml) was added per 100 μl of buffer.
[121] Alternatively, the hybridization buffer was made up as: 50% formamide, 50 μl formamide; 10X SSC, 50 μl 20X SSC; 0.2% SDS, 1 μl 20% SDS.
[122] The solution was filtered through a 0.2 μm syringe filter, then the volume was measured. One microliter of salmon sperm DNA (9.7 mg/ml), 0.5 μl human Cot-1 DNA (5 μg/μl), 0.5 μl poly A (5 μg/μl), and 0.25 μl yeast tRNA (10 μg/μl) were added per 100 μl of buffer. The hybridization buffers were compared in validation studies and there was no change in differential gene expression data between the two buffers.
[123] Materials used for hybridization were: 2 Eppendorf tube racks, hybridization chambers (2 arrays per chamber), slides, coverslips, and parafilm. About 30 μl of nanopure water was added to each hybridization chamber. Slides and coverslips were cleaned using N2 stream. About 30 μl of hybridization buffer was added to dried probe and vortexed gently for 5 seconds. The probe remained in the dark for 10-15 minutes at room temperature and then was gently vortexed for several seconds and then was flash spun in the microfuge. The probes were boiled or placed in a 95 °C heat block for 5 minutes and centrifuged for 3 min at 20800 x g (14000 rpm, Eppendorf model 5417C). Probes were placed in 70 °C heat block. Each probe remained in this heat block until it was ready for hybridization.
[124] About 25 μl was pipetted onto a coverslip. It is preferable to avoid the material at the bottom of the tube and to avoid generating air bubbles. This may mean leaving about 1 μl remaining in the pipette tip. The slide was gently lowered, face side down, onto the sample so that the coverslip covered that portion of the slide containing the array. Slides were placed in a hybridization chamber (2 per chamber). The lid of the chamber was wrapped with parafilm and the slides were placed in a 42°C humidity chamber in a 42°C incubator. It is preferable not to let probes or slides sit at room temperature for long periods. The slides were incubated for 18-24 hours.
[125] Post-Hybridization Washing: To obtain only single stranded cDNA probes tightly bound to the sense strand of target cDNA on the array, non-specifically bound cDNA probe should be removed from the array. Removal of non-specifically bound cDNA probe was accomplished by washing the array using the following materials: slide holder, glass washing dish, SSC, SDS, and nanopure water. Six glass buffer chambers and glass slide holders were set up with 2X SSC buffer heated to 30-34°C, which was used to fill the glass dish to 3/4 of its volume or enough to submerge the microarrays. The slides were placed in 2X SSC buffer for 2 to 4 minutes while the coverslips fell off. The slides were then moved to 2X SSC, 0.1% SDS and soaked for 5 minutes. The slides were transferred into 0.1X SSC and 0.1% SDS for 5 minutes. Then the slides were transferred to 0.1X SSC for 5 minutes. The slides, still in the slide carrier, were transferred into nanopure water (18 megaohms) for 1 second. To dry the slides, the stainless steel slide carriers were placed on micro-carrier plates and spun in a centrifuge (Beckman GS-6 or equivalent) for 5 minutes at 1000 rpm.
[126] Scanning slides: The washed and dried hybridized slides were scanned on an Axon Instruments Inc. GenePix 4000A MicroArray Scanner, and the fluorescent readings from this scanner were converted into quantitation files (.gpr) on a computer using GenePix software.
[127] GeneSpring™ software (Version 4.1, Silicon Genetics) was used for array normalization and transformation, and for statistical analyses that identified genes whose expression correlated with histopathology scores. Microarray data were loaded into GeneSpring™ software for correlation analysis as GenePix files as above. Specific data loaded into GeneSpring™ software included gene name, GenBank ID, control channel mean fluorescence, and signal channel mean fluorescence. Expression ratio data (ratio of signal to control fluorescence) were normalized using the 50th percentile of the distribution of genes and control channel. Ratio data were excluded from analysis if the control channel value was <0. For analysis of correlations, gene expression ratios were transformed as the log of the ratio.
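The normalization and transformation steps above can be sketched as follows. This is a minimal illustration and not GeneSpring's implementation: it assumes that "50th percentile" normalization means dividing each ratio by the per-array median ratio, that the natural logarithm is used, and that spots with a zero control value or a non-positive signal are also skipped so the ratio and its log are defined. The function name is hypothetical.

```python
import math
from statistics import median

def normalize_and_log(signal, control):
    """Return median-normalized log ratios; None marks an excluded spot."""
    ratios = []
    for s, c in zip(signal, control):
        # The text excludes control values < 0. Skipping c == 0 and s <= 0 too
        # is an added assumption, needed to keep the ratio and log defined.
        if c <= 0 or s <= 0:
            ratios.append(None)
        else:
            ratios.append(s / c)
    valid = [r for r in ratios if r is not None]
    array_median = median(valid)  # assumed meaning of "50th percentile"
    return [None if r is None else math.log(r / array_median) for r in ratios]

# Hypothetical fluorescence values for four spots on one array
result = normalize_and_log([100, 200, 400, 50], [100, 100, 100, -5])
```

With these inputs the median ratio is 2, so the second spot maps to a log ratio of 0 and the fourth spot is excluded.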
[128] Histopathology scores for each animal (assigned on a compound-dose basis as indicated in Table 1) were entered with gene expression data by using the GeneSpring™ 'Drawn Gene' function. Correlations between spleen necrosis scores and gene expression were conducted with the distance measures listed below: Pearson positive and negative correlation; Spearman positive and negative correlation.
[129] These correlation or similarity measures are standard statistical correlation measures that are described in the GeneSpring Advanced Analysis Techniques Manual (Release Date March 13, 2001, Silicon Genetics). Where both positive and negative correlations were obtained, combined positive and negative correlating gene lists were also created.
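For illustration, the two correlation measures named above can be computed as in the sketch below. It is a plain-Python rendering of the textbook Pearson and Spearman definitions, not the GeneSpring code; the function names and the sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical histopathology scores and log expression ratios for five animals
scores = [1, 1, 2, 3, 4]
log_ratios = [0.1, 0.2, 0.5, 0.9, 2.0]
r_pearson = pearson(scores, log_ratios)
r_spearman = spearman(scores, log_ratios)
```

A gene would enter a positive-correlation list when its coefficient is high and a negative-correlation list when it is strongly negative.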
[130] A Predictive Model based on the Predict Parameter Values tool in GeneSpring™ software was used for spleen necrosis class prediction. The Predict Parameter Values tool relies on standard statistical procedures that can be implemented in a variety of statistical software packages. In a preferred embodiment, the Predictive Model is implemented in S-PLUS® software. However, other statistical software programs may be used. The following is a summary of the procedure as described in GeneSpring's Advanced Analysis Techniques Manual (Release Date March 13, 2001, Silicon Genetics) with additional information supplied by Silicon Genetics and a statistical expert.
[131] The first step is variable selection of genes to be used for prediction. This entails taking a single gene and a single class (e.g., spleen necrosis) and creating a contingency table. In the table below, columns 1 through N of the table each represent one possible cutoff point based on the gene expression level (ratio of signal/control) for that class. The number of possible cutoffs is less than or equal to the total number of samples for the class (e.g., A). It is possibly less than the total number, since there may be ties in gene expression level. Hence, N, M, and Q may or may not be distinct. In the example, an n-class problem is illustrated, where x and y entries are the class counts at that gene expression cutoff level, for that specific gene and class, either above ("a") or below ("b") the cutoff. "Class1" is the set of all samples (above or below) the cutoff for Class1, and "!Class1" are all those not in Class1 (above or below) the cutoff, and similarly for the other classes. The class totals in the training set are the total class marginals used to compute Fisher's exact test.
[132] For a specific gene, and for each class, the best p-value as calculated by Fisher's Exact Test for independence between one of the pairs of columns (e.g., 1a and 1b) and the actual class totals (e.g., A) is used to score the gene (-ln(p) = the score) for that class. Thus, there are N (or M, Q, etc.) contingency tables, where the best score of the N tables is used for that class and gene. The wider the disparity between the above and below counts in either the a or b column (this is a two-sided Fisher's Exact Test), the smaller the p-value and the higher the score.
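The scoring step can be sketched as below. This is an illustrative reading of the description and not the GeneSpring or S-PLUS code: it assumes each distinct expression value is tried as a cutoff, that samples strictly above the cutoff count as "above", and that the two-sided p-value sums the probabilities of all tables no more likely than the observed one. The function names are hypothetical.

```python
from math import comb, log

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def gene_score(expr, in_class):
    """Best -ln(p) over all cutoffs for one gene and one class."""
    best = 0.0
    for cutoff in sorted(set(expr)):
        a = sum(1 for e, k in zip(expr, in_class) if e > cutoff and k)
        b = sum(1 for e, k in zip(expr, in_class) if e > cutoff and not k)
        c = sum(1 for e, k in zip(expr, in_class) if e <= cutoff and k)
        d = sum(1 for e, k in zip(expr, in_class) if e <= cutoff and not k)
        best = max(best, -log(fisher_two_sided(a, b, c, d)))
    return best

# Hypothetical gene whose expression separates the class perfectly
expr = [1, 2, 3, 4, 5, 6, 7, 8]
separating = gene_score(expr, [False] * 4 + [True] * 4)
mixed = gene_score(expr, [True, False] * 4)
```

A gene that splits the class cleanly at some cutoff receives a much higher score than one whose class members are spread evenly across expression levels.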
[133] The genes per class are rank ordered by the most discriminating (highest) score. The predictivity list is composed of the most discriminating genes per class. Namely, genes are combined that best discriminate class 1 with those that best discriminate class 2 and so on. The genes are selected in rotation of the highest score per class. Duplicate genes are ignored in the rotation and not added to the list; instead, the gene with the next highest score is taken.
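The rotation described above can be sketched as a round-robin over per-class ranked lists. This is a minimal sketch under the assumption that each class list is already sorted best-first by score; the function name and gene labels are hypothetical.

```python
def select_predictive_genes(ranked_by_class, n_genes):
    """Pick genes in rotation across classes, skipping genes already chosen."""
    positions = {cls: 0 for cls in ranked_by_class}
    chosen, seen = [], set()
    while len(chosen) < n_genes:
        progressed = False
        for cls, ranked in ranked_by_class.items():
            if len(chosen) >= n_genes:
                break
            i = positions[cls]
            # Skip duplicates: take the next-highest gene not yet on the list
            while i < len(ranked) and ranked[i] in seen:
                i += 1
            if i < len(ranked):
                chosen.append(ranked[i])
                seen.add(ranked[i])
                progressed = True
                i += 1
            positions[cls] = i
        if not progressed:  # every class list is exhausted
            break
    return chosen

# Hypothetical ranked lists; g1 is the top gene for both classes
ranked = {
    "necrosis": ["g1", "g2", "g3"],
    "no_necrosis": ["g1", "g4", "g5"],
}
selected = select_predictive_genes(ranked, 4)
```

Here g1 is taken once for the first class, so the second class contributes g4 in the first round.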
[134] The training samples now have only the gene list garnered from the above procedure. As an example, where once the training samples may have had an initial list of 200 genes per sample, they now have only a subset composed of the gene list, say, 60 (the number of predictivity genes specified) that are selected from the initial list by the gene selections procedure. Thus, each sample is a vector of 60 normalized expression ratios. Since the selection of genes is done in rotation, for 2 classes, the list contains 30 genes for class one, and 30 genes for class two. For 3 classes the list contains 20 genes for class one, 20 for class two, and 20 for class three, etc. The matrix below illustrates the basic features of this gene selection process.
Figure imgf000034_0001
Figure imgf000035_0001
[135] After the genes to be used in the training set have been selected, the test set is classified based on the k-nearest neighbor (knn) voting procedure. Using just those genes in the gene list, for each sample in the test set of samples, the k-nearest neighbors in the training set are found with the Euclidean distance. The class of each of the k-nearest neighbors is determined, and the test set sample is assigned to the class with the largest representation in the k nearest neighbors after adjusting for the proportion of classes in the training set.
[136] For example, in a two-class problem, let there be 30 samples of class 1 and 60 samples of class 2 in the training set. With k = 9, suppose it is determined that 7 of the nearest neighbors to a sample from the testing set are in class 1. The sample can then be classified as being a member of class 1. If another sample from the test set has a total of 4 nearest neighbors in class 1, then after adjusting for the proportion, this sample would be assigned to class 1 rather than class 2, even though the majority vote suggests assignment to class 2.
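The voting step can be illustrated with the sketch below. The text does not give the exact adjustment formula, so this example assumes each class's vote count is divided by that class's share of the training set; the function names are hypothetical. Under that assumption, the 4-versus-5 vote from the example above is assigned to class 1.

```python
import math
from collections import Counter

def adjust_votes(votes, class_sizes):
    """Divide each class's vote count by its training-set proportion (assumed rule)."""
    total = sum(class_sizes.values())
    return {cls: votes.get(cls, 0) * total / size
            for cls, size in class_sizes.items()}

def knn_adjusted_vote(sample, training, k):
    """training: list of (expression_vector, label) pairs."""
    class_sizes = Counter(label for _, label in training)
    # k nearest training samples by Euclidean distance over the selected genes
    nearest = sorted(training, key=lambda t: math.dist(sample, t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    adjusted = adjust_votes(votes, class_sizes)
    return max(adjusted, key=adjusted.get)

# The example from the text: 30 vs 60 training samples, k = 9, votes 4 vs 5
adjusted = adjust_votes({"class1": 4, "class2": 5}, {"class1": 30, "class2": 60})
winner = max(adjusted, key=adjusted.get)
```

The adjusted votes are 12.0 for class 1 and 7.5 for class 2, so the minority class wins despite the raw majority.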
[137] The decision threshold is a mechanism to help clearly define the class into which the sample will fall, and can be set to reject classification if the voting is very close or tied. (Thus, k can be even for two-class problems without worrying about the tie problem.) A p-value is calculated for the proportion of neighbors in each class against the proportions found in the training set, again using Fisher's exact test, but now a one-sided test.
[138] For example, let k = 11. If the proportion of neighbors of class 1 in the test set is 6/11, and the proportion of class 1 in a 100 sample training set is 0.4, the p-value calculated is 0.29 (half the two-sided test). If the proportion in the training set is 0.1, the p-value is 0.004. The smaller the p-value, the greater the likelihood that the sample from the testing set belongs to that class.
[139] A p-value ratio (P-value) is set as a way of setting the level of confidence in individual sample predictions based on the ratio of p-values for the best class (lowest p-value) versus the second best class (second lowest p-value). For example, if the P-value is set at 0.5 and the ratio of p-values for a particular sample is 0.6, then the predictive model will not make a call for that sample.
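The decision rule can be sketched as follows, taking the per-class p-values as given inputs. This is a minimal illustration with a hypothetical function name; it assumes the ratio is the best (lowest) p-value divided by the second-best, and that a ratio above the cutoff produces a non-call.

```python
def call_class(p_values, ratio_cutoff=0.5):
    """Return the best class, or None (non-call) if the p-value ratio is too high."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    (best_class, best_p), (_, second_p) = ranked[0], ranked[1]
    if second_p == 0:
        return None  # both p-values are zero; the classes cannot be separated
    ratio = best_p / second_p
    return best_class if ratio <= ratio_cutoff else None

# Hypothetical p-values: the first ratio is about 0.6, above the 0.5 cutoff
no_call = call_class({"necrosis": 0.06, "no_necrosis": 0.10})
confident = call_class({"necrosis": 0.004, "no_necrosis": 0.29})
```

The first sample yields no call, matching the example in the text, while the second is assigned to the necrosis class.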
[140] After array normalization and transformation into GeneSpring™, the data was exported from GeneSpring into an Excel (Microsoft) spreadsheet using GeneSpring's Copy Annotated Gene List feature. Data were each separated into 5 training and test sets by randomly distributing the compounds into the sets. This was accomplished through the use of a computer program that assigns random numbers to lists of compounds that are negative and positive for histopathology and divides the sorted lists into a specific number of training and test sets. The training and test set assignments are presented in Table 2.
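One way to picture the random assignment is sketched below. The program actually used is not described in detail, so this example makes two assumptions: compounds are dealt in random-number order into five groups separately for the positive and negative lists, and each group serves once as the test set with the remaining compounds as the training set. The function name and compound labels are hypothetical.

```python
import random

def split_compounds(positive, negative, n_sets=5, seed=0):
    """Return n_sets (training, test) pairs of compound names."""
    rng = random.Random(seed)
    test_sets = [[] for _ in range(n_sets)]
    for group in (positive, negative):
        # Assign a random number to each compound, sort by it, then deal out
        keyed = sorted((rng.random(), name) for name in group)
        for i, (_, name) in enumerate(keyed):
            test_sets[i % n_sets].append(name)
    everything = list(positive) + list(negative)
    return [([c for c in everything if c not in test], test)
            for test in test_sets]

# Hypothetical compound labels: 5 positive and 10 negative for histopathology
positives = [f"P{i}" for i in range(1, 6)]
negatives = [f"N{i}" for i in range(1, 11)]
splits = split_compounds(positives, negatives)
```

Splitting the positive and negative lists separately keeps the proportion of positive compounds similar across the sets.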
[141] Spleen necrosis classifications were entered for the exported data set as a parameter column. Toxicity, as defined by observation of spleen necrosis at 72 hours after treatment, was entered as spleen necrosis (yes) or no spleen necrosis (no) for each animal in a compound-dose group. Additionally, random histopathology classification was designated for the training and test sets. This was done by randomly assigning the same number of spleen necrosis calls to the individual animals.
[142] An S-PLUS® batch program was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets. The program performs functions similar to those used in the Predictive Model as embodied in GeneSpring's Predict Parameter Values tool, such as Fisher discriminant analysis, k-nearest neighbor voting, and P-value ratio calculation. A P-value ratio cutoff of 0.5 was used. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff. The number of genes used to predict was varied, starting with one gene and increasing incrementally until all genes in the input gene list were used. For each number of genes, the geometric mean for the combination was displayed in the batch program's output file. For each input list and optimal number of predictive genes (lowest number of genes giving a maximum overall percent of correct calls), additional information was recorded that included the list of specific genes in the optimum predictive set.
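The rule for the optimal number of predictive genes (the lowest number giving the maximum percent of correct calls) can be written compactly, as in the sketch below. The function name and accuracy values are hypothetical and serve only to illustrate the tie-breaking.

```python
def optimal_gene_count(accuracy_by_n):
    """accuracy_by_n maps a number of genes to the overall accuracy it achieved."""
    best = max(accuracy_by_n.values())
    # Among gene counts that reach the maximum, keep the smallest
    return min(n for n, acc in accuracy_by_n.items() if acc == best)

# Hypothetical accuracies for 1 to 5 genes; 3 and 4 genes tie for the maximum
optimum = optimal_gene_count({1: 0.60, 2: 0.75, 3: 0.80, 4: 0.80, 5: 0.78})
```

With these values the optimum is 3 genes, since adding a fourth gene does not improve accuracy.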
[143] Expression array data were first examined for the existence of genes whose expression correlated with histopathology scores. Table 1 presents a list of the compounds and dose levels along with the spleen necrosis histopathology classification and histopathology severity scores used for this analysis. For each distance measure the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Lists of correlating genes were obtained using the distance measures described in Materials and Methods.
[144] The correlating gene lists as well as the entire array gene list were provided as input lists to the S-PLUS® batch program. These lists were used for each of the five training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets. Input genes for the S-PLUS® batch program included all 700 genes in the GenePix file (the rat CT Array), which were disclosed in a currently pending application (serial number 10/060,893) filed on January 29, 2002, as well as smaller lists of genes whose expressions correlated with histopathology by the correlation measures described previously. To obtain an optimum number of predictive genes, the number of genes used to predict was varied incrementally, starting with 1 gene.
[145] After this was done for all 5 training and test sets, all gene lists were then merged to create one aggregate list of predictive genes. Each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set. The aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 5 training and test sets were used, genes that were predictive in all 5 training and test sets were designated as Combo (combination) 5. Genes that were predictive in only 4 of 5 training and test sets were designated as Combo 4, etc. A list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 3.
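The Combo grouping can be sketched as a simple occurrence count across the optimum gene sets. The function name and gene labels below are hypothetical.

```python
from collections import Counter

def combo_lists(optimal_sets):
    """Group genes by the number of training/test sets in which they were predictive."""
    counts = Counter(gene for genes in optimal_sets for gene in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}

# Hypothetical optimum predictive sets from five training/test sets
sets = [["g1", "g2"], ["g1", "g3"], ["g1", "g2"], ["g1"], ["g1", "g2"]]
combos = combo_lists(sets)
```

Here g1 falls in Combo 5, g2 in Combo 3, and g3 in Combo 1; the union of all groups corresponds to the aggregate list.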
Example 2
[146] Predictive Properties and Evaluation of Spleen Necrosis Predictive Genes from 24 Hour Expression Data are described. Materials and Methods: The database used was as described in Example 1.
[147] Array Data, Normalization and Transformation: Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 20 presents 24 hour gene expression data for the predictive genes. These data can be used with a Predictive Model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
[148] Class Prediction: The S-PLUS® batch program was used for spleen necrosis class prediction. A description of this tool and the statistical procedures used is provided in Example 1.
[149] Training and Test Data Sets: The training and test data sets used are those described in Table 2 of Example 1.
[150] Spleen Necrosis Toxicology Classification: Spleen necrosis classifications used are described in Table 1 of Example 1. In this analysis randomized classifications (same number of positive and negative spleen necrosis classifications distributed randomly among the samples) were also used.
[151] Prediction Output and Initial Data Processing: Each aggregate list of predictive genes was used for evaluation of predictive performance using the S-PLUS® batch program as described in Example 1. The output of the S-PLUS® batch program included a table of prediction measures described below.
[152] Prediction Measures: Measures of prediction used for these analyses are generally accepted prediction measures for information about actual and predicted classifications done by a classification system (Modern Applied Statistics with S-Plus, W. N. Venables and B. D. Ripley, Springer, 1994, 3rd edition; Proc. 14th International Conference on Machine Learning, Miroslav Kubat, Stan Matwin, 1997). Results from predictions of a 2-class case can be described as a 2-class matrix:
Figure imgf000039_0001
[153] Class I is defined as "Spleen Necrosis."
[154] Class II is defined as "No Spleen Necrosis"
[155] Standard terms used for prediction for the 2-class case are:
[156] Accuracy is the proportion of total number of predictions that are correct = (a+d)/(a+b+c+d)
[157] False positive rate is the proportion of negative cases that are incorrectly classified as positive = b/(a+b)
[158] False negative rate is the proportion of positive cases that are incorrectly classified as negative = c/(c+d)
[159] Geometric-mean is the performance measure that takes into account proportion of positive and negative cases (Kubat et al., ibid) = the square root of TP*TN, where TP = true positive rate, d/(c+d), and TN = true negative rate, a/(a+b). In these analyses, where no prediction was made because the p-value ratio exceeded the cutoff value (generally 0.5), the non-call was considered to be incorrect.
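The four measures can be computed from the matrix entries as in the sketch below. The cell meanings are inferred from the formulas above (a = true negatives, b = false positives, c = false negatives, d = true positives); the function name and counts are hypothetical.

```python
import math

def prediction_measures(a, b, c, d):
    """a, b: actual negatives predicted negative/positive; c, d: actual positives."""
    tp_rate = d / (c + d)  # true positive rate
    tn_rate = a / (a + b)  # true negative rate
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
        "geometric_mean": math.sqrt(tp_rate * tn_rate),
    }

# Hypothetical counts for 50 negative and 50 positive animals
measures = prediction_measures(a=40, b=10, c=5, d=45)
```

To treat non-calls as incorrect, one option (an assumption here) is to add each non-call to b or c according to the animal's actual class before computing the measures.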
[160] Randomly Selected Gene Sets: Subsets of randomly selected genes were prepared from the predictive gene sets to test whether such subsets would have predictive value. Assignments of genes to these subsets are presented in Tables 4-5. Genes were also randomly selected from the list of all genes on the array excluding the 154 twenty-four hour predictive genes of Combo All (the remaining genes also being known as non-predictive genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes. Assignments of genes to these subsets are presented in Table 6.
[161] Prediction results for 24 hour expression data using genes identified as predictive are presented in Table 7. Prediction measures are given as means and range of values (in parentheses) for five training/test sets using 24 hour array data and gene lists as presented in Table 5. Unit of prediction was the animal and the predictive classification was for spleen necrosis observed at 72 hours after treatment. ** Standard prediction measures were used as defined in Materials and Methods. These include: Accuracy is the proportion of total number of predictions that are correct = (a+d)/(a+b+c+d). False positive rate is the proportion of negative cases that are incorrectly classified as positive = b/(a+b). False negative rate is the proportion of positive cases that are incorrectly classified as negative = c/(c+d). Geometric-mean is the performance measure that takes into account proportion of positive and negative cases (Kubat et al., ibid) = the square root of TP*TN, where TP = true positive rate, d/(c+d), and TN = true negative rate, a/(a+b). In these analyses, where no prediction was made because the p-value ratio exceeded the cutoff value (generally 0.5), the non-call was considered to be incorrect, as defined in Materials and Methods.
[162] These data indicate a high accuracy in predicting spleen necrosis. Mean accuracies were approximately 0.80 (79.4% to 81.1% accuracy) for the entire predictive gene list (Combo All) and the Combo 4 and Combo 3 subset gene lists, and greater than 0.75 (75% accuracy) for the Combo 2 and Combo 1 subset gene lists. Because these predictions were conducted with multiple training/test set combinations, it is possible to obtain an indication of the variability in prediction rates and robustness of the prediction capabilities of these gene sets. For the Combo All and Combo 1 - Combo 4 lists, the minimum predictive accuracy value for any one training and test set was greater than 0.65 (65%), with most lists giving 0.70 (70%) or better minimum accuracy.
[163] The Geometric Mean (GMM) was used as an indication of predictive performance that includes consideration of the proportion of positive and negative cases for spleen necrosis. All gene sets gave GMM measures > 0.50 (50%), and the Combo 4, Combo 3, and Combo 2 gene sets had GMM measures of approximately 0.70 or greater (69.8% to 72.2%). The GMM measures indicate that the 24 hour gene sets can accurately predict samples with spleen necrosis.
[164] As described in Materials and Methods, in those cases where no prediction was made because the P-value ratio exceeded the cutoff value (generally 0.5), the non-call was considered to be incorrect. Table 8 shows the level of predictive accuracy of individual genes of Combo 5, Combo 4, and Combo 3 gene lists, for 24 hour spleen necrosis data. * Combo Gene Lists as in Table 3. For Combo lists all genes were used or random subsets as in Tables 4 and 5. All-Pred used genes randomly selected from genes that were present on the array but not in the predictive list. ** Overall Accuracy = proportion of the total number of predictions that are correct. Non-calls are counted as incorrect predictions as defined in Materials and Methods. Accuracy was calculated for correct classifications of Spleen Necrosis assigned to the samples and for randomized classifications in the same proportions as the correct classifications. Values presented are the mean accuracy values for 5 training/test sets with minimum and maximum accuracy values. The tables show that overall, individual genes of the Combo groups did not perform as well as the combination as a whole, as the average geometric mean of individual genes versus the entire combo set was 48.2% vs. 58.3% for Combo 5, 66.1% vs. 72.2% for Combo 4, and 63.9% vs. 70.7% for Combo 3. The table also shows that while many of the individual genes of the Combo groups were predictive (e.g., geometric means as high as 78.5% for individual genes of Combo 4 and 75.8% for Combo 3, with a majority of genes exceeding 60% for the three Combo groups), the geometric mean of individual genes rarely exceeded the geometric mean of the whole combination.
[165] In order to assess the performance of subsets of genes, predictive performance was evaluated for subsets of genes randomly selected from the total combined predictive list (Combo All) and the top Combo sets (as defined in Materials and Methods). Prediction results for 24 hour expression data using randomly selected subsets of genes are presented in Table 9. These data clearly indicate that smaller subsets of the Combo gene lists have predictive power. Table 9 also compares prediction accuracy for correct classification of spleen necrosis and for the same proportion of positive and negative toxicity calls randomly assigned to the samples (random classification). For each gene set or subset predictions were made using the same five training/test sets as for the other prediction analyses. Additionally, sets of genes were randomly chosen from the array which were not identified on the list of 154 predictive genes at 24 hour (Example 1 , Table 3). In Table 3, combination category is the number of training/test set gene list occurrences.
[166] It is clear from these data that the predictions with accurate classification are much better than predictions with randomized classification. This means that the predictive results are not simply due to chance and large data sets but are due to significant, meaningful predictive association between the gene expression of the predictive genes and the spleen necrosis. The accuracy numbers for the gene sets selected from a list of all genes on the array minus the predictive genes tend to be much lower than the Combo predictive lists and the random subsets of these predictive lists. This also verifies the predictive power of the identified predictive genes. The fact that the predictive numbers from these subsets are somewhat higher for accurate than random classification is likely due to some residual predictivity in these genes that is overall not very substantial.
Example 3
[167] Spleen Necrosis Predictive Genes from 6 hour expression data are identified. Compounds and treatments list used to construct the spleen necrosis database are given in Table 1 of Example 1. This table also provides the evaluation of spleen toxicity as observed as necrosis in samples collected 72 hours after treatment. The database is described in detail in Example 1. This Example analyzes expression data from samples collected 6 hours after treatment.
[168] Array Data, Normalization and Transformation: Array data, normalization and transformation procedures used were as described in Example 1.
[169] Correlation with Histopathology Scores: Procedures and methods for obtaining gene lists correlating with histopathology scores were as described in Example 1.
[170] Class Prediction: The S-PLUS® batch program was used for spleen necrosis class prediction. Descriptions of this tool and the statistical procedures used are provided in Example 1.
[171] Training and Test Data Sets: After array normalization and transformation into GeneSpring™, the data were exported out of GeneSpring into an Excel (Microsoft) spreadsheet using GeneSpring's Copy Annotated Gene List feature. Data were each separated into 5 training and test sets by randomly distributing the compounds into the sets. This was accomplished through the use of a computer program that assigns random numbers to lists of compounds that are negative and positive for histopathology and divides the sorted lists into a specific number of training and test sets. The training and test set assignments are presented in Table 10. * For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.). ** Negative = Compounds that did not elicit histopathology (score = 1). Positive = Compounds that did elicit histopathology (score of 2 or greater).
[172] Spleen Necrosis Toxicity Classification: Spleen necrosis classifications were entered for the exported data set as a parameter column. Toxicity, as defined by observation of spleen necrosis at 72 hours after treatment, was entered as spleen necrosis (yes) or no spleen necrosis (no) for each animal in a compound-dose group. Additionally, random histopathology classification was designated for the training and test sets. This was done by randomly assigning the same number of spleen necrosis calls to the individual animals.
[173] Prediction Output and Initial Data Processing: An S-PLUS® batch program was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets. The program performs functions similar to those used in GeneSpring's Predict Parameter Values tool, such as Fisher discriminant analysis, k-nearest neighbor voting, and P-value ratio calculation. A P-value ratio cutoff of 0.5 was used. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff. The number of genes used to predict was varied, starting with one gene and increasing incrementally until all genes in the input gene list were used. For each number of genes, the geometric mean for the combination was displayed in the batch program's output file. For each input list and optimal number of predictive genes (lowest number of genes giving a maximum overall percent of correct calls), additional information was recorded that included the list of specific genes in the optimum predictive set.
[174] Results: Expression array data were first examined for the existence of genes whose expression correlated with histopathology scores. Table 1 in Materials and Methods of Example 1 presents a list of the compounds and dose levels along with the spleen necrosis histopathology classification and histopathology severity scores used for this analysis. For each distance measure the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Lists of correlating genes were obtained using the distance measures described in Materials and Methods.
[175] The correlating gene lists as well as the entire array gene list were provided as input lists to the S-PLUS® batch program. These lists were used for each of the five training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets. Input genes for the S-PLUS® batch program included all 700 genes in the GenePix file (the rat CT Array), which were disclosed in a currently pending application (serial number 10/060,893) filed on January 29, 2002, as well as smaller lists of genes whose expressions correlated with histopathology by the correlation measures described previously. The number of genes used to predict was varied, with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes used. The specified number of predictive genes was varied to obtain an optimum number of predictive genes.
[176] After this was done for all 5 training and test sets, all gene lists were then merged to create one aggregate list of predictive genes. Each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set. The aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 5 training and test sets were used, genes that were predictive in all 5 training and test sets were designated as Combo (combination) 5. Genes that were predictive in only 4 of 5 training and test sets were designated as Combo 4, etc.
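The Combo designation described above amounts to counting, for each gene, the number of training/test sets in which it appeared in the optimum predictive list. A minimal Python sketch follows; the function name and the gene identifiers are hypothetical and are not taken from the tables of this application.

```python
from collections import Counter


def combo_categories(gene_lists):
    """Group genes by the number of training/test sets in which they
    were predictive. Returns a dict: occurrence count -> sorted gene list."""
    # set() guards against a gene being listed twice within one set's list.
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}


# Hypothetical optimum gene lists from three training/test sets.
lists = [["g1", "g2"], ["g1", "g3"], ["g1", "g2"]]
print(combo_categories(lists))  # {3: ['g1'], 2: ['g2'], 1: ['g3']}
```

With five training and test sets, the key 5 would correspond to Combo 5, the key 4 to Combo 4, and so on.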
[177] A list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 11. The symbol "*" denotes the combination category, which is the number of training/test set gene list occurrences.
Example 4
[178] Materials and Methods are disclosed for obtaining predictive properties and evaluation of predictive genes from 6 hour expression data. The database used was as described in Example 1.
[179] Array Data, Normalization and Transformation: Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 19 lists 6 hour gene expression data for the predictive genes. These data can be used with a Predictive Model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
[180] Class Prediction: The S-PLUS® batch program was used for spleen necrosis class prediction. Descriptions of this tool and the statistical procedures used are provided in Example 1.
[181] Training and Test Data Sets: Data were each separated into 5 training and test sets by randomly distributing the compounds into the sets. This was accomplished through the use of a computer program that assigns random numbers to lists of compounds that are negative and positive for histopathology and divides the sorted lists into a specific number of training and test sets. The training and test set assignments are presented in Table 10.
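One way such a random distribution of compounds could be carried out is sketched below in Python. The sketch assigns a random number to each compound in the negative and positive lists, sorts each list by that number, and deals the sorted compounds into the sets. The function name, the fixed seed, and the assumption that each held-out group serves as the test set while the remaining compounds form the training set are illustrative and are not stated in this application.

```python
import random


def split_compounds(negatives, positives, n_sets=5, seed=0):
    """Distribute compounds into n_sets training/test splits, handling the
    negative and positive lists separately so both classes are spread out."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_sets)]
    for compounds in (negatives, positives):
        # Assign a random number to each compound and sort by it.
        keyed = sorted((rng.random(), name) for name in compounds)
        # Deal the sorted compounds into the sets in turn.
        for i, (_, name) in enumerate(keyed):
            folds[i % n_sets].append(name)
    everything = set(negatives) | set(positives)
    return [
        {"test": sorted(fold), "train": sorted(everything - set(fold))}
        for fold in folds
    ]


# Hypothetical compound names (illustrative only).
negatives = [f"neg{i}" for i in range(10)]
positives = [f"pos{i}" for i in range(5)]
for number, split in enumerate(split_compounds(negatives, positives), start=1):
    print(number, split["test"])
```

Because the two lists are dealt separately, each test set in this example receives two negative compounds and one positive compound.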
[182] Spleen Necrosis Toxicology Classification: Spleen necrosis classifications used are described in Table 1 of Example 1. In this analysis randomized classifications (same number of spleen necrosis and no spleen necrosis classifications distributed randomly among the samples) were also used.
[183] Prediction Output and Initial Data Processing: Each aggregate list of predictive genes was used for evaluation of predictive performance using the S-PLUS® batch program as described in Example 1. The output of the S-PLUS® batch program included a table of prediction measures described below.
[184] Prediction Measures: Prediction measures such as accuracy and geometric mean were calculated as described in Example 2. Prediction results for 6 hour expression data using genes identified as predictive are presented in Table 12, where the predictive performance for correct and random classifications is compared. The symbol "*" denotes Combo Gene Lists as in Table 11. The symbol "**" denotes Overall Accuracy = proportion of the total number of predictions that are correct. Non-calls are counted as incorrect predictions as defined in Materials and Methods. Accuracy was calculated for correct classifications of Spleen Necrosis assigned to the samples and for randomized classifications in the same proportions as the correct classifications. Values presented are the mean accuracy values for 5 training/test sets with minimum and maximum accuracy values.
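The overall accuracy measure, with non-calls counted as incorrect, and the mean, minimum and maximum summary over the training/test sets can be sketched in Python as follows. The function names and the use of `None` for a non-call are assumptions made for illustration.

```python
def overall_accuracy(predictions, truths):
    """Proportion of all predictions that are correct.
    A non-call (None) is counted as an incorrect prediction."""
    correct = sum(
        1 for predicted, actual in zip(predictions, truths)
        if predicted is not None and predicted == actual
    )
    return correct / len(truths)


def summarize(accuracies):
    """Mean, minimum and maximum accuracy over the training/test sets."""
    return {
        "mean": sum(accuracies) / len(accuracies),
        "min": min(accuracies),
        "max": max(accuracies),
    }


# Hypothetical calls for four samples; the second is a non-call.
predicted = ["yes", None, "no", "no"]
actual = ["yes", "yes", "no", "yes"]
print(overall_accuracy(predicted, actual))  # prints 0.5
```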
[185] It is clear from these data that the predictions with accurate classification are much better than predictions with randomized classification. This means that the predictive results are not simply due to chance and large data sets but are due to significant, meaningful predictive association between the gene expression of the predictive genes and spleen necrosis.
Example 5
[186] Materials and Methods are disclosed for discovery of Spleen Necrosis Predictive Genes from 72 hour expression data. Database - Compounds and Spleen Necrosis: Compounds and treatments list used to construct the spleen necrosis database are given in Table 1 of Example 1. This table also provides the evaluation of the spleen necrosis observed in samples collected 72 hours after treatment. The Phase-1 Database is described in detail in Example 1. This Example analyzes expression data from samples collected 72 hours after treatment.
[187] Array Data, Normalization and Transformation: Array data, normalization and transformation procedures used were as described in Example 1.
[188] Correlation with Histopathology Scores: Procedures and methods for obtaining gene lists correlating with histopathology scores were as described in Example 1 with scores as in Example 1, Table 1.
[189] Class Prediction: The S-PLUS® batch program was used for spleen necrosis class prediction. Descriptions of this tool and the statistical procedures used are provided in Example 1. Training and Test Data Sets: After array normalization and transformation into GeneSpring™, the data was exported out of GeneSpring into an Excel (Microsoft) spreadsheet using GeneSpring's Copy Annotated Gene List feature. Data were each separated into 5 training and test sets by randomly distributing the compounds into the sets. This was accomplished through the use of a computer program that assigns random numbers to lists of compounds that are negative and positive for histopathology and divides the sorted lists into a specific number of training and test sets. The training and test set assignments are presented in Table 13. * For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.). ** Negative= Compounds that did not elicit histopathology (score=1). Positive= Compounds that did elicit histopathology (score of 2 or greater).
[190] Spleen Necrosis Toxicology Classification: Spleen necrosis classifications were entered for the exported data set as a parameter column. Toxicity, as defined by observation of spleen necrosis at 72 hours after treatment, was entered as spleen necrosis (yes) or no spleen necrosis (no) for each animal in a compound-dose group. Additionally, random histopathology classification was designated for the training and test sets. This was done by randomly assigning the same number of spleen necrosis calls to the individual animals.
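A randomized classification that keeps the same number of spleen necrosis calls can be produced by shuffling the existing classification labels among the animals. The Python sketch below is an assumption about how such a randomization could be done, not a description of the program actually used; the function name and seed are hypothetical.

```python
import random


def randomize_classifications(labels, seed=0):
    """Return the same labels in random order, so the number of
    'yes' and 'no' classifications is unchanged."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical classifications for eight animals (illustrative only).
true_labels = ["yes", "yes", "no", "no", "no", "no", "no", "no"]
random_labels = randomize_classifications(true_labels)
print(random_labels.count("yes"), random_labels.count("no"))  # prints 2 6
```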
Prediction Output and Initial Data Processing.
[191] An S-PLUS® batch program was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets. The program performs functions similar to those used in GeneSpring's Predict Parameter Values tool such as Fisher discriminant analysis, k-nearest neighbor, and P-value ratio calculation. A P-value ratio cutoff of 0.5 was used. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff. The number of genes used to predict was varied starting with one gene and increasing incrementally until all genes in the input gene list were used. For each number of genes the geometric mean for the combination was displayed in the batch program's output file. For each input list and optimal number of predictive genes (lowest number of genes giving a maximum overall percent of correct calls) additional information was recorded that included the list of specific genes in the optimum predictive set.
[192] Results: Expression array data were first examined for the existence of genes whose expression correlated with histopathology scores. Table 1 in Materials and Methods of Example 1 presents a list of the compounds and dose levels along with the spleen necrosis histopathology classification and histopathology severity scores used for this analysis. For each distance measure the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Lists of correlating genes were obtained using the distance measures described in Materials and Methods.
[193] The correlating gene lists as well as the entire array gene list were provided as input lists to the S-PLUS® batch program (described in Materials and Methods) that employs a k-nearest neighbor (knn) predictive model. These lists as well as the entire array gene list were used for each of the five training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets. Input genes for the S-PLUS® batch program included all 700 genes in the GenePix file (the Rat CT Array) as well as smaller lists of genes whose expressions correlated with histopathology by the correlation measures described previously. The number of predictive genes in the gene list was varied to obtain an optimum number of predictive genes.
[194] After this was done for all 5 training and test sets, all gene lists were then merged to create one aggregate list of predictive genes. Each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set. The aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 5 training and test sets were used, genes that were predictive in all 5 training and test sets were designated as Combo (combination) 5. Genes that were predictive in only 4 of 5 training and test sets were designated as Combo 4, etc.
[195] A list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 14. Combination category is the number of training/test set gene list occurrences.
Example 6
[196] Materials and Methods for identifying predictive genes for Spleen necrosis from 72 hour expression data.
[197] Database: The database used was as described in Example 1.
[198] Array Data, Normalization and Transformation: Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 21 presents 72 hour gene expression data for the predictive genes. These data can be used with a Prediction Model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
[199] Class Prediction: The S-PLUS® batch program was used for spleen necrosis class prediction. Descriptions of this tool and the statistical procedures used are provided in Example 1.
[200] Training and Test Data Sets: The training and test data sets used are those described in the table of Example 5.
[201] Spleen Necrosis Toxicology Classification: Spleen necrosis classifications used are described in Table 1 of Example 1. In this analysis randomized classifications (same number of spleen necrosis classifications distributed randomly among the samples) were also used.
[202] Prediction Output and Initial Data Processing: Each aggregate list of predictive genes was used for evaluation of predictive performance using the S-PLUS® batch program as described in Example 1. The output of the S-PLUS® batch program included a table of prediction measures described below.
[203] Prediction Measures: Prediction measures such as accuracy and geometric mean were calculated as described in Example 2.
[204] Results: Prediction results for 72 hour expression data using genes identified as predictive are presented in Table 15, in which the predictive performance for correct and random classifications is compared. The symbol "*" denotes Combo Gene Lists as in Table 14. The symbol "**" denotes Overall Accuracy = proportion of the total number of predictions that are correct. Non-calls are counted as incorrect predictions as defined in Materials and Methods. Accuracy was calculated for correct classifications of Spleen Necrosis assigned to the samples and for randomized classifications in the same proportions as the correct classifications. Values presented are the mean accuracy values for 5 training/test sets with minimum and maximum accuracy values.
[205] It is clear from these data that the predictions with accurate classification are much better than predictions with randomized classification. This means that the predictive results are not simply due to chance and large data sets but are due to significant, meaningful predictive association between the gene expression of the predictive genes and spleen necrosis.
Example 7
[206] Predictive Modeling: The predictive task with the spleen necrosis gene expression data is a 2-class classification problem, where the 2 classes of possible responses are defined as spleen necrosis or no spleen necrosis. This is an uneven class problem in that the class of negative responses is roughly 75 percent of the data or more in the database tested. A discrimination function can be used to classify a training set. This function can be cross-validated with a testing set, often repeatedly, to quantify the mean and variation of the classification error. There are numerous common discrimination functions, and a comparative study of the performance of these functions is useful in determining the best classifier. Additional measures can then be used to compare the performance of the classifiers. Since the classes are of significantly uneven sizes, a geometric mean measure (GMM) can be used to compare models, namely, the square root of the product of the true positives and the true negatives.
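The geometric mean measure can be sketched in Python as shown below. The sketch reads "true positives" and "true negatives" as the true-positive rate and true-negative rate, which is an assumption (the text does not state whether counts or rates are meant). The function name and the class labels are hypothetical.

```python
import math


def geometric_mean_measure(predictions, truths, positive="necrosis"):
    """Square root of (true-positive rate x true-negative rate).
    Any prediction that does not match the true class, including a
    non-call (None), is counted against that class."""
    tp = fn = tn = fp = 0
    for predicted, actual in zip(predictions, truths):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == actual:
                tn += 1
            else:
                fp += 1  # negative sample not called negative
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Hypothetical uneven data set: 2 positive and 6 negative samples.
truths = ["necrosis"] * 2 + ["none"] * 6
always_negative = ["none"] * 8
print(geometric_mean_measure(always_negative, truths))  # prints 0.0
```

A model that always predicts the majority class is correct for 75 percent of these hypothetical samples yet scores 0.0 on this measure, which illustrates why such a measure suits classes of uneven size.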
[207] Common discrimination methods are Fisher's linear discriminant, quadratic discriminant (Mahalanobis distance), k-nearest neighbors (knn), logistic discriminant (MacLachlan, "Discriminant Analysis and Statistical Pattern Recognition", Wiley Series in Probability and Mathematical Statistics, 1992), classification trees (or more generally known as recursive partitioning) (Breiman et al., "Classification and Regression Trees", Chapman & Hall, 1984; Clark and Pregibon in "Tree-Based Models" (J.M. Chambers and T.J. Hastie, eds.) Chp. 9, Chapman & Hall Computer Science Series, 1993; Quinlan and Kaufman, "C4.5: Programs for Machine Learning", 1988), and neural network classifiers (Ripley, "Pattern Recognition and Neural Networks", Cambridge University Press, 1996). Most are formula-based, such as linear and quadratic discriminant, whereas others are rule-based, such as recursive partitioning, or algorithmically based, such as knn. knn is also database dependent in that a database containing the training set is needed to perform the nearest neighbor search and classification.
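The database dependence of knn can be seen in a minimal Python sketch: every prediction requires a search over the stored training samples. The sketch uses Euclidean distance and a simple majority vote, which are assumptions; it does not reproduce the P-value ratio calculation of the batch program. The function name and the two-gene expression values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, sample, k=3):
    """Classify a sample by majority vote among its k nearest
    training samples. training is a list of (vector, class) pairs."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression vectors (illustrative only).
training = [
    ((0.0, 0.0), "no"), ((0.1, 0.2), "no"),
    ((1.0, 1.0), "yes"), ((0.9, 1.1), "yes"), ((1.2, 0.8), "yes"),
]
print(knn_predict(training, (1.0, 0.9)))  # prints yes
```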
[208] Classifier Models: A variety of common classification techniques are available. A simple hybrid classifier could be designed and tested, using the knn results, to transform the knn model into a database independent model. This model is termed a centroid model. The centroid model uses the correctly identified test data results from knn and locates a centroid of the subset of k samples that are of the same class for each correctly identified test sample. The centroid is assigned the correct class, and with new test data, a sample is assigned the class of its nearest centroid.
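The centroid model described above can be sketched in Python under stated assumptions: Euclidean distance, a majority-vote knn step, and an odd k so that two-class votes cannot tie. The function names and the example data are hypothetical. For each correctly identified test sample, the sketch averages those of its k nearest training samples that share its class and stores the result as a labeled centroid; new samples are then classified without the training database.

```python
import math


def build_centroids(training, test, k=3):
    """Return (centroid, class) pairs, one per test sample that knn
    classifies correctly."""
    centroids = []
    for sample, true_label in test:
        nearest = sorted(training, key=lambda item: math.dist(item[0], sample))[:k]
        labels = [label for _, label in nearest]
        predicted = max(set(labels), key=labels.count)
        if predicted != true_label:
            continue  # only correctly identified test samples are used
        same = [vector for vector, label in nearest if label == true_label]
        centroid = tuple(sum(column) / len(same) for column in zip(*same))
        centroids.append((centroid, true_label))
    return centroids


def centroid_predict(centroids, sample):
    """Assign the class of the nearest centroid."""
    return min(centroids, key=lambda c: math.dist(c[0], sample))[1]


# Hypothetical two-gene expression vectors (illustrative only).
training = [
    ((0.0, 0.0), "no"), ((0.1, 0.2), "no"),
    ((1.0, 1.0), "yes"), ((0.9, 1.1), "yes"), ((1.2, 0.8), "yes"),
]
test = [((0.05, 0.05), "no"), ((1.0, 1.0), "yes"), ((0.9, 0.9), "no")]
centroids = build_centroids(training, test)
print(centroid_predict(centroids, (0.2, 0.1)))  # prints no
```

In this example the third test sample is misclassified by knn and therefore contributes no centroid, leaving two centroids in the resulting database-independent model.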
[209] In addition to the knn and centroid models described above, tree, logistic, and neural network models could also be employed. The neural network is a simple, feed-forward network, allowing skip layers, and with an entropy fitting criterion.
[210] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent or patent application were specifically and individually indicated to be so incorporated by reference.
Figure imgf000052_0001
Figure imgf000053_0001
Figure imgf000054_0001
Table 2 Distribution of Compounds* in Individual Training and Test Sets for 24h Spleen Necrosis Data
Training and Test Set 1
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Training and Test Set 2
Figure imgf000057_0002
Figure imgf000058_0001
Figure imgf000059_0001
Training and Test Set 3
Figure imgf000059_0002
Figure imgf000060_0001
Figure imgf000061_0001
Training and Test Set 4
Figure imgf000061_0002
Figure imgf000062_0001
Figure imgf000063_0001
Training and Test Set 5
Figure imgf000063_0002
Figure imgf000064_0001
Figure imgf000065_0001
Table 3 Spleen Necrosis Predictive Genes for 24 Hour Expression Data
Figure imgf000065_0002
Figure imgf000066_0001
Figure imgf000067_0001
Figure imgf000068_0001
Figure imgf000069_0001
Figure imgf000070_0001
Figure imgf000071_0001
Table 4 Randomly Selected Gene Subsets from 24 Hour Combo All (154 Genes)*
Figure imgf000073_0001
Figure imgf000073_0002
Figure imgf000073_0003
Figure imgf000074_0001
Table 5 Randomly Selected Gene Subsets from 24 Hour Combos 5, 4, 3 Combined Gene Set (23 Genes)*
Figure imgf000074_0002
Figure imgf000075_0001
Figure imgf000075_0002
Figure imgf000075_0003
Figure imgf000076_0001
Table 6 Randomly Selected Gene Subsets from Array Genes Excluding Combo All Set*
Figure imgf000077_0001
Figure imgf000077_0002
Figure imgf000078_0001
Table 7 Spleen Necrosis Individual Sample Prediction Values for 24 Hour Data Predictive Genes (Combined List and Subsets)
Figure imgf000079_0001
Table 8 Individual Gene Predictions: Combos 5, 4, and 3
Figure imgf000079_0002
Figure imgf000080_0001
Table 9 Comparison of Predictivity for True Spleen Necrosis Classification and Random Classification Using Combo Gene Sets and Random Subsets and 24h data
Figure imgf000081_0001
Table 10 Distribution of Compounds* in Individual Training and Test Sets for 6 Hour Spleen Necrosis Data
Training and Test Set 1
Figure imgf000082_0001
Figure imgf000083_0001
Figure imgf000084_0001
Training and Test Set 2
Figure imgf000084_0002
Figure imgf000085_0001
Figure imgf000086_0001
ANIT 60mgkg
Training and Test Set 3
Figure imgf000087_0001
Figure imgf000088_0001
Figure imgf000089_0001
Training and Test Set 4
Figure imgf000090_0001
Figure imgf000091_0001
Figure imgf000092_0001
Training and Test Set 5
Figure imgf000092_0002
Figure imgf000093_0001
Figure imgf000094_0001
Table 11 List of genes whose expression at 6 hours is predictive of Spleen Necrosis at 72 hours
Figure imgf000095_0001
Figure imgf000096_0001
Figure imgf000097_0001
Figure imgf000098_0001
Figure imgf000099_0001
Figure imgf000100_0001
Figure imgf000101_0001
Table 12 Comparison of Predictivity for True Spleen Necrosis Classification and
Figure imgf000102_0001
Table 13 Distribution of Compounds* in Individual Training and Test Sets for 72 Hour Spleen Necrosis Data
Training and Test Set 1
Figure imgf000103_0001
Figure imgf000104_0001
Figure imgf000105_0001
Training and Test Set 2
Figure imgf000105_0002
Figure imgf000106_0001
Figure imgf000107_0001
Training and Test Set 3
Figure imgf000107_0002
Figure imgf000108_0001
Figure imgf000109_0001
Training and Test Set 4
Figure imgf000110_0001
Figure imgf000111_0001
Training and Test Set 5
Figure imgf000112_0001
Figure imgf000113_0001
Table 14 List of genes whose expression at 72 hours is predictive of Spleen Necrosis at 72 hours
Figure imgf000115_0001
Figure imgf000116_0001
Figure imgf000117_0001
Figure imgf000118_0001
Figure imgf000119_0001
Table 15 Comparison of Predictivity for True Spleen Necrosis Classification and Random Classification Using Combo Gene Sets and 72h data
Figure imgf000120_0001
Table 16 RCT genes (ESTs) Predictive for Spleen Necrosis: Best Homology Matches
Figure imgf000121_0001
Figure imgf000122_0001
Figure imgf000123_0001
Figure imgf000124_0001
Figure imgf000125_0001
Figure imgf000126_0001
Figure imgf000127_0001
T-cell cyclophilin Thioredoxin-1 (Trx1) Tissue factor

Claims

What is claimed is:
1. A composition comprising a plurality of cDNAs for use in detecting the altered expression of genes in a toxic response of the spleen, wherein said plurality of cDNAs comprises SEQ ID NOs: 1-306 or the complete complements thereof.
2. The composition of claim 1, wherein said cDNAs are immobilized on a substrate.
3. The composition of claim 1, wherein said cDNAs are hybridizable elements on a microarray.
4. A method for monitoring the treatment of compound toxicity in a sample, said method comprising: obtaining nucleic acids from a sample; contacting the nucleic acids of the sample with an array comprising the plurality of cDNAs of claim 1 under conditions to form one or more hybridization complexes; detecting said hybridization complexes; and comparing the levels of the hybridization complexes detected with the level of hybridization complexes detected in a non-toxic sample, wherein the altered level of hybridization complexes detected compared with the level of hybridization complexes of a non-toxic sample correlates with the presence of a spleen toxic condition.
5. The method of claim 4, wherein said cDNAs are immobilized on a substrate.
6. The method of claim 4, wherein said cDNAs are hybridizable elements in a microarray.
7. Method of predicting the spleen toxicity in an individual to an agent, comprising the steps of: obtaining a biological sample from an individual treated with the agent; contacting the nucleic acids of the sample with an array comprising the plurality of cDNAs of claim 1 under conditions to form one or more hybridization complexes; and detecting said hybridization complexes; and comparing the levels of the hybridization complexes detected with the level of hybridization complexes detected in a non-toxic sample, wherein the altered level of hybridization complexes detected compared with the level of hybridization complexes of a non-toxic sample correlates with the presence of a spleen toxicity condition.
8. A method of predicting the spleen toxicity of an agent using an in vitro system, comprising the steps of: obtaining a biological sample from in vitro cultured cells or explants treated with the agent; contacting the nucleic acids of the sample with an array comprising the plurality of cDNAs of claim 1 under conditions to form one or more hybridization complexes; and comparing the test expression profile with a set of reference expression profiles in a Predictive Model to determine whether the agent will induce spleen toxicity.
9. A computer program product for predicting spleen toxicity from an expression profile of nucleic acids from a sample under test, comprising: a computer readable medium bearing: an encrypted training data set; encrypted lists of genes selected from the plurality of cDNAs of claim 1; and a predictive model for causing a general purpose computer to predict the spleen toxicity of the sample based upon the training data set, the list of genes selected from the plurality of cDNAs, and the expression profile of nucleic acids from the sample.
10. An integrated system for predicting spleen toxicity, comprising: means for measuring gene expression profiles of spleen predictive genes from samples exposed to the test agent; and a computer system operably linked to said means that is capable of implementing a predictive model.
PCT/US2004/008371 2003-03-17 2004-03-17 Spleen necrosis predictive genes WO2004083402A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US45544303P 2003-03-17 2003-03-17
US60/455,443 2003-03-17

Publications (2)

Publication Number Publication Date
WO2004083402A2 true WO2004083402A2 (en) 2004-09-30
WO2004083402A3 WO2004083402A3 (en) 2005-05-19

Family

ID=33030000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/008371 WO2004083402A2 (en) 2003-03-17 2004-03-17 Spleen necrosis predictive genes

Country Status (1)

Country Link
WO (1) WO2004083402A2 (en)

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 230106)

DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
122 Ep: pct application non-entry in european phase