EP1697873A4 - Molecular toxicology modeling methods - Google Patents

Molecular toxicology modeling methods

Info

Publication number
EP1697873A4
Authority
EP
European Patent Office
Prior art keywords
gene
score
toxicity
data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP04812167A
Other languages
German (de)
English (en)
Other versions
EP1697873A2 (fr)
Inventor
James C Diggans
Michael Elashoff
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocimum Biosolutions Inc
Original Assignee
Ore Pharmaceuticals Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2003/037556 external-priority patent/WO2004048598A2/fr
Application filed by Ore Pharmaceuticals Inc filed Critical Ore Pharmaceuticals Inc
Publication of EP1697873A2 publication Critical patent/EP1697873A2/fr
Publication of EP1697873A4 publication Critical patent/EP1697873A4/fr
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection

Definitions

  • multicellular screening systems may be preferred or required to detect the toxic effects of compounds.
  • the use of multicellular organisms as toxicology screening tools has been significantly hampered, however, by the lack of convenient screening mechanisms or endpoints, such as those available in yeast or bacterial systems. Additionally, certain previous attempts to produce toxicology prediction systems have failed to provide the necessary modeling data and statistical information to accurately predict toxic responses (e.g., WO 00/12760, WO 00/47761, WO 00/63435, WO 01/32928, and WO 01/38579).
  • the present invention is based, in part, on the elucidation of the global changes in gene expression in animal tissues or cells, such as liver or kidney tissue or cells, exposed to known toxins, in particular hepatotoxins or renal toxins, as compared to unexposed tissues or cells, as well as the identification of individual genes that are differentially expressed upon toxin exposure.
  • the invention includes methods of predicting at least one toxic effect of a test agent by comparing gene expression information from agent-exposed samples to a database of gene expression information from toxin-exposed and control samples (vehicle-exposed samples or samples exposed to a non-toxic compound or low levels of a toxic compound).
  • These methods comprise providing or generating quantitative gene expression information from the samples, converting the gene expression information to matrices of fold-change values by a robust multi-array average (RMA) algorithm, generating a gene regulation score for each gene that is differentially expressed upon exposure to the test agent by a partial least squares (PLS) algorithm, and calculating a sample prediction score for the test agent. This sample prediction score is then compared to a reference prediction score for one or more toxicity models. If the sample prediction score is equal to or greater than the reference prediction score, the test agent can be predicted to have at least one toxic effect or to produce at least one pathology corresponding to the toxicity model to which the test agent's prediction score is compared.
  • PLS: partial least squares
  • the invention includes methods of creating a toxicology model. These methods comprise providing or generating quantitative nucleic acid hybridization data for a plurality of genes from at least one cell or tissue sample exposed to a toxin and at least one cell or tissue sample exposed to the toxin vehicle, converting the hybridization data from at least one gene to a gene expression measure, such as a fold-change value, by a robust multi-array average (RMA) algorithm, generating a gene regulation score from a gene expression measure for at least one gene by a partial least squares (PLS) algorithm, and generating a toxicity reference prediction score for the toxin, thereby creating a toxicology model.
  • RMA: robust multi-array average
  • the invention includes a computer system comprising a computer readable medium containing a toxicity model for predicting the toxicity of a test agent and software that allows a user to predict at least one toxic effect of a test agent by comparing a sample prediction score for the test agent to a toxicity reference prediction score for the toxicity model.
  • the gene expression information from test agent-exposed tissues or cells may be prepared as text or binary files, such as CEL files, and transmitted via the Internet for analysis and comparison to the toxicity models stored on a remote, central server. After processing, the user who sent the files receives a report indicating the toxicity or non-toxicity of the test agent.
  • the user may download one or more toxicity models from the remote, central server, as well as software for manipulating the user's data and the toxicity models, to a local server.
  • Gene expression information from test agent-exposed tissues or cells may then be prepared as text files, such as CEL files, and analyzed and compared at the user's site to the toxicity models stored on the local server.
  • After processing, the software generates a report indicating the toxicity or non-toxicity of the test agent.
  • Table 1 provides the GLGC identifier (fragment names from Table 2) in relation to the SEQ ID NO. and GenBank Accession number for each of the gene fragments listed in Table 2 (all of which are herein incorporated by reference and replication in the attached sequence listing). The gene names and Unigene cluster titles are also included.
  • Table 2 presents the PLS scores (weighted gene index scores) from an exemplary kidney general toxicity model.
  • nucleic acid hybridization data refers to any data derived from the hybridization of a sample of nucleic acids to one or more of a series of reference nucleic acids. Such reference nucleic acids may be in the form of probes on a microarray or set of beads or may be in the form of primers that are used in polymerization reactions, such as PCR amplification, to detect hybridization of the primers to the sample nucleic acids.
  • Nucleic acid hybridization data may be in the form of numerical representations of the hybridization and may be derived from quantitative, semi-quantitative or non-quantitative analysis techniques or technology platforms. Nucleic acid hybridization data includes, but is not limited to, gene expression data.
  • the data may be in any form, including fluorescence data or measurements of fluorescence probe intensities from a microarray or other hybridization technology platform.
  • the nucleic acid hybridization data may be raw data or may be normalized to correct for, or take into account, background or raw noise values, including background generated by microarray high/low intensity spots, scratches, high regional or overall background and raw noise generated by scanner electrical noise and sample quality fluctuation.
  • cell or tissue samples refers to one or more samples comprising cell or tissue from an animal or other organism, including laboratory animals such as rats or mice.
  • the cell or tissue sample may comprise a mixed population of cells or tissues or may be substantially a single cell or tissue type, such as hepatocytes or liver tissue.
  • Cell or tissue samples as used herein may also be in vitro grown cells or tissue, such as primary cell cultures, immortalized cell cultures, cultured hepatocytes, cultured liver tissue, etc. Cells or tissue may be derived from any organ, including but not limited to, liver, kidney, cardiac, muscle (skeletal or cardiac) or brain.
  • test agent refers to an agent, compound or composition that is being tested or analyzed in a method of the invention.
  • a test agent may be a pharmaceutical candidate for which toxicology data is desired.
  • test agent vehicle refers to the diluent or carrier in which the test agent is dissolved or suspended, or in which it is administered to an animal, organism or cells.
  • toxin vehicle refers to the diluent or carrier in which a toxin is dissolved or suspended, or in which it is administered to an animal, organism or cells.
  • a “gene expression measure” refers to any numerical representation of the expression level of a gene or gene fragment in a cell or tissue sample. A “gene expression measure” includes, but is not limited to, a fold-change value.
  • At least one gene refers to a nucleic acid molecule detected by the methods of the invention in a sample.
  • a “gene” includes any species of nucleic acid that is detectable by hybridization to a probe in a microarray, such as the "genes" of Table 1.
  • at least one gene includes a "plurality of genes.”
  • fold-change value refers to a numerical representation of the expression level of a gene, genes or gene fragments between experimental paradigms, such as a test or treated cell or tissue sample, compared to any standard or control.
  • a fold-change value may be presented as microarray-derived fluorescence or probe intensities for a gene or genes from a test cell or tissue sample compared to a control, such as an unexposed cell or tissue sample or a vehicle-exposed cell or tissue sample.
  • An RMA fold-change value as described herein is a non-limiting example of a fold-change value calculated by methods of the invention.
  • sample regulation score refers to a quantitative measure of gene expression for a gene or gene fragment as derived from a weighted index score or PLS score for each gene and the fold-change value from treated vs. control samples.
  • sample prediction score refers to a numerical score produced via methods of the invention as herein described. For instance, a “sample prediction score” may be calculated using the PLS weight or PLS score for at least one gene in a gene expression profile generated from the sample and the RMA fold-change value for that same gene.
  • sample prediction score is derived from summing the individual gene regulation scores calculated for a given sample.
  • toxicity reference prediction score refers to a numerical score generated from a toxicity model that can be used as a cut-off score to predict at least one toxic effect of a test agent. For instance, a sample prediction score can be compared to a toxicity reference prediction score to determine if the sample score is above or below the toxicity reference prediction score. Sample prediction scores falling below the value of a toxicity reference prediction score are scored as not exhibiting at least one toxic effect, and sample prediction scores above the value of a toxicity reference prediction score are scored as exhibiting at least one toxic effect.
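The scoring and cut-off logic defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the patent: the function names, gene identifiers, weights, fold-change values and cut-off are all hypothetical. The comparison treats a score equal to the cut-off as toxic, following the "equal to or greater than" wording used elsewhere in the description.

```python
def gene_regulation_scores(pls_weights, rma_fold_changes):
    """Per-gene regulation score: PLS weight times RMA fold-change value.

    Only genes present in both mappings are scored (an assumption; the
    source does not say how missing genes are handled).
    """
    shared = pls_weights.keys() & rma_fold_changes.keys()
    return {gene: pls_weights[gene] * rma_fold_changes[gene] for gene in shared}


def sample_prediction_score(pls_weights, rma_fold_changes):
    """Sum of the individual gene regulation scores for one sample."""
    return sum(gene_regulation_scores(pls_weights, rma_fold_changes).values())


def predicts_toxic_effect(sample_score, reference_score):
    """True when the sample score reaches the toxicity reference cut-off."""
    return sample_score >= reference_score


# Hypothetical three-gene model and sample, for illustration only.
weights = {"gene_a": 0.5, "gene_b": -0.25, "gene_c": 1.0}
fold_changes = {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 0.5}
score = sample_prediction_score(weights, fold_changes)
print(score, predicts_toxic_effect(score, reference_score=1.0))
```

Keeping the per-gene scores in a dictionary makes it easy to report which genes contribute most to a prediction before they are summed.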
  • a log scale linear additive model includes any log-linear model such as log scale robust multi-array average or RMA (Irizarry et al., Nucleic Acids Research 31(4):e15 (2003)).
  • remote connection refers to a connection to a server by a means other than a direct hard-wired connection. This term includes, but is not limited to, connection to a server through a dial-up line, broadband connection, Wi-Fi connection, or through the Internet.
  • a "CEL file” refers to a file that contains the average probe intensities associated with a coordinate position, cell or feature on a microarray (such information provided by the CDF or 1LQ file). See Affymetrix GeneChip ® Expression Analysis
  • a “gene expression profile” comprises any quantitative representation of the expression of at least one mRNA species in a cell sample or population and includes profiles made by various methods such as differential display, PCR, microarray and other hybridization analysis, etc.
  • Methods of Generating Toxicity Models [0029] To evaluate and identify gene expression changes that are predictive of toxicity, studies using selected compounds with well characterized toxicity may be used to build a model or database of the present invention. Methods of the present invention include an
  • cell and tissue samples are analyzed after exposure to compounds known to exhibit at least one toxic effect. Low doses of these compounds, or the vehicles in which they were prepared, are used as negative controls. Compounds that are known not to exhibit at least one toxic effect may also be used as negative controls.
  • a toxicity study or "tox study” comprises a set of cell or tissue samples that have been exposed to one or more toxins and may include matched samples exposed to the toxin vehicle or a low, non-toxic, dose of the toxin.
  • the cell or tissue samples may be exposed to the toxin and control treatments in vivo or in vitro.
  • toxin and control exposure to the cell or tissue samples may take place by administering an appropriate dose to an animal model, such as a laboratory rat.
  • toxin and control exposure to the cell or tissue samples may take place by administering an appropriate dose to a sample of in vitro grown cells or tissue, such as primary rat or human hepatocytes.
  • samples are typically organized into cohorts by test compound, time (for instance, time from initial test compound dosage to time at which rats are sacrificed), and dose (amount of test compound administered). All cohorts in a tox study typically share the same vehicle control.
  • a cohort may be a set of samples from rats that were treated with acyclovir for 6 hours at a high dosage (100 mg/kg).
  • a time-matched vehicle cohort is a set of samples that serve as controls for treated animals within a tox study, e.g., for 6-hour acyclovir-treated high dose samples the time-matched vehicle cohort would be the 6-hour vehicle-treated samples within that study.
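The cohort organization described above (compound, time, dose, with a time-matched vehicle cohort as control) maps naturally onto a small data structure. The following Python sketch is illustrative only: the `Sample` record, its field names and the `"vehicle"` label are assumptions introduced here, not terms from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    sample_id: str
    compound: str    # "vehicle" marks vehicle-treated controls (assumed label)
    time_hours: int  # time from initial dosing to sacrifice
    dose: str        # e.g. "high", "low", or "none" for vehicle


def group_into_cohorts(samples):
    """Group sample ids by (compound, time, dose)."""
    cohorts = defaultdict(list)
    for s in samples:
        cohorts[(s.compound, s.time_hours, s.dose)].append(s.sample_id)
    return dict(cohorts)


def time_matched_vehicle(cohorts, time_hours, vehicle_label="vehicle"):
    """Return the vehicle-treated sample ids collected at the same time point."""
    matched = []
    for (compound, t, _dose), ids in cohorts.items():
        if compound == vehicle_label and t == time_hours:
            matched.extend(ids)
    return matched


study = [
    Sample("s1", "acyclovir", 6, "high"),
    Sample("s2", "acyclovir", 6, "high"),
    Sample("v1", "vehicle", 6, "none"),
    Sample("v2", "vehicle", 24, "none"),
]
cohorts = group_into_cohorts(study)
print(cohorts[("acyclovir", 6, "high")], time_matched_vehicle(cohorts, 6))
```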
  • a toxicity database or "tox database” is a set of tox studies that alone or in combination comprise a reference database.
  • a reference database may include data from rat tissue and cell samples from rats that were treated with different test compounds at different dosages and exposed to the test compounds for varying lengths of time.
  • RMA, or robust multi-array average, is an algorithm that converts raw fluorescence intensities, such as those derived from hybridization of sample nucleic acids to an Affymetrix GeneChip microarray, into expression values, one value for each gene fragment on a chip (Irizarry et al. (2003), Nucleic Acids Res. 31(4):e15, 8 pp.; and Irizarry et al. (2003) "Exploration, normalization, and summaries of high density oligonucleotide array probe level
  • RMA produces values on a log2 scale, typically between 4 and 12, for genes that are expressed significantly above or below control levels. These RMA values can be positive or negative and are centered around zero for a fold-change of about 1.
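Because the values are on a log2 scale, a fold-change of about 1 corresponds to a log-scale value near zero. The source does not spell out how a fold-change is derived from RMA expression values, so the sketch below assumes the common convention of subtracting the mean log2 expression of the time-matched vehicle samples from the mean log2 expression of the treated samples. The function names and numbers are hypothetical.

```python
from statistics import mean


def log2_fold_change(treated_log2, vehicle_log2):
    """Difference of mean log2 expression values (treated minus vehicle)."""
    return mean(treated_log2) - mean(vehicle_log2)


def to_ratio(log2_fc):
    """Convert a log2 fold-change back to a plain expression ratio."""
    return 2.0 ** log2_fc


# Hypothetical log2 expression values for one gene fragment.
fc = log2_fold_change(treated_log2=[9.0, 9.0], vehicle_log2=[8.0, 8.0])
print(fc, to_ratio(fc))  # one log2 unit up is a two-fold increase
```

Negative values indicate lower expression in treated samples than in controls, which is why values on this scale can be positive or negative.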
  • a matrix of gene expression values generated by RMA can be subjected to PLS to produce a model for prediction of toxic responses, e.g., a model for predicting liver or kidney toxicity.
  • the model is validated by techniques known to those skilled in the art.
  • a cross-validation technique is used. In such a technique, the data is randomly split into training and test sets several times until an acceptable model success rate is determined. Most preferably, such technique uses 2/3 / 1/3 cross-validation, where 1/3 of the data is dropped and the other 2/3 is used to rebuild the model.
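The 2/3 / 1/3 cross-validation described above can be sketched as repeated random splits in which one third of the samples is held out and the model is rebuilt on the rest. This is a generic illustration, not the patent's procedure: the `fit` and `predict` callables, the number of rounds and the seed are assumptions supplied by the caller.

```python
import random


def two_thirds_split(n_samples, rng):
    """Randomly hold out one third of the sample indices for testing."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = n_samples // 3
    return indices[n_test:], indices[:n_test]  # (train, test)


def cross_validate(samples, labels, fit, predict, n_rounds=10, seed=0):
    """Average success rate over repeated 2/3 train, 1/3 test splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_rounds):
        train_idx, test_idx = two_thirds_split(len(samples), rng)
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        rates.append(correct / len(test_idx))
    return sum(rates) / len(rates)
```

A caller would pass a function that rebuilds the toxicity model from the training subset as `fit`, and a function that scores one held-out sample against the cut-off as `predict`.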
  • PLS is a modeling algorithm that takes as inputs a matrix of predictors and a vector of supervised scores to generate a set of prediction weights for each of the input predictors (Nguyen et al. (2002), Bioinformatics 18:39-50). These prediction weights are then used to calculate a gene regulation score to indicate the ability of each analyzed gene to predict a toxic response. As described in the examples, the gene regulation scores may then be used to calculate a toxicity reference prediction score. [0035] From the nucleic acid hybridization data, a gene expression measure is calculated for one or more genes whose level of expression is detected in the nucleic acid hybridization value.
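As a rough illustration of how PLS turns a predictor matrix and a supervised score vector into per-gene weights, the sketch below computes only the first PLS weight vector, which for mean-centered data is proportional to the covariance of each predictor with the response. This is a single-component simplification chosen for brevity; the source does not state how many components or which PLS variant is used, and the function name and data are hypothetical.

```python
import math


def first_pls_weights(X, y):
    """First PLS weight vector: normalized X^T y on mean-centered data.

    X is a list of samples (rows), each a list of per-gene values;
    y holds one supervised score per sample (e.g. 1 = toxic, 0 = control).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cov = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(c * c for c in cov))
    if norm == 0:
        raise ValueError("no predictor covaries with the response")
    return [c / norm for c in cov]


# Two samples, two genes: gene 0 rises with toxicity, gene 1 falls.
print(first_pls_weights([[0.0, 1.0], [1.0, 0.0]], [0.0, 1.0]))
```

Genes whose expression tracks the supervised score receive weights of larger magnitude, and the sign shows the direction of regulation associated with toxicity.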
  • the gene expression measure may comprise an RMA fold-change value.
  • the toxicity reference score $= \sum_i w_i R_{FC_i}$.
  • "i" is the index number for each gene in a gene expression profile to be evaluated, and "$w_i$" is the PLS weight (or PLS score, see Table 2) for each gene.
  • "$R_{FC_i}$" is the RMA fold-change value for the i-th gene, as determined from a normalized RMA matrix of gene expression data from the sample (described above).
  • the PLS weight multiplied by the RMA fold-change value gives a gene regulation score for each gene, and the regulation scores for all the individual genes are added to give a toxicity reference prediction score for a sample or cohort of samples.
  • a toxicity reference prediction score can be calculated from at least one gene regulation score, or at least about 5, 10, 25, 50, 100, 500 or about 1,000 or more gene regulation scores.
  • a toxicology or toxicity model of the invention is prepared or created by the steps of (a) providing nucleic acid hybridization data for a plurality of genes from at least one cell or tissue sample exposed to a toxin and at least one cell or tissue sample exposed to the toxin vehicle; (b) converting the hybridization data from at least
  • a cross-validation technique is used. In such a technique, the data is randomly broken into training and test sets several times until an acceptable model success rate is determined. Most preferably, such technique uses 2/3 / 1/3 cross-validation, where 1/3 of the data is dropped and the other 2/3 is used to rebuild the model.
  • the gene regulation scores and toxicity prediction scores derived from cell or tissue samples exposed to toxins may be used to predict at least one toxic effect, including the hepatotoxicity, renal toxicity or other tissue toxicity of a test or unknown agent or compound.
  • the gene regulation scores and toxicity prediction scores from cell or tissue samples exposed to toxins may also be used to predict the ability of a test agent or compound to induce a tissue pathology, such as liver necrosis, in a sample.
  • the toxicology prediction methods of the invention are limited only by the availability of the appropriate toxicology model and toxicology prediction scores.
  • the prediction methods of a given system can be expanded simply by running new toxicology studies and models of the invention using additional toxins or specific tissue pathology inducing agents and the appropriate cell or tissue samples.
  • at least one toxic effect includes, but is not limited to, a detrimental change in the physiological status of a cell or organism.
  • the response may be, but is not required to be, associated with a particular pathology, such as tissue necrosis. Accordingly, the toxic effect includes effects at the molecular and cellular level.
  • Hepatotoxicity is an effect as used herein and includes but is not limited to the pathologies of: cholestasis, genotoxicity/carcinogenesis, hepatitis, human-specific toxicity, induction of liver enlargement, steatosis, macrovesicular steatosis, microvesicular steatosis, necrosis, non-
  • assays to predict the toxicity of a test agent comprise the steps of exposing a cell or tissue sample or population of cell or tissue samples to the test agent or compound, providing nucleic acid hybridization data for at least one gene from the test agent exposed cell or tissue sample(s), by, for instance, assaying or measuring the level of relative or absolute gene expression of one or more of the genes, such as one or more of the genes in Table 2, calculating a sample prediction score and comparing the sample prediction score to one or more toxicology reference scores (see Example 1).
  • "i" is the index number for each gene in a gene expression profile to be evaluated.
  • "$w_i$" is the PLS weight (or PLS score) for each gene derived from a toxicity model.
  • "$R_{FC_i}$" is the RMA fold-change value for the i-th gene, as determined from a normalized RMA matrix of gene expression data from the sample (described above). The PLS weight from a given model multiplied by the RMA fold-change value gives a gene regulation score for each gene, and the regulation scores for all the individual genes are added to give a prediction score for the sample.
  • Nucleic acid hybridization data may include any measurement of the hybridization, including gene expression levels, of sample nucleic acids to probes corresponding to about (or at least) 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, 75, 100, 200, 500, 1000 or more genes, or ranges of these numbers, such as about 2-10, about 10-20, about 20-50, about 50-100, about 100-200, about 200-500 or about 500-1000 genes.
  • Nucleic acid hybridization data for toxicity prediction may also include the measurement of nearly all the genes in a toxicity model. "Nearly all" the genes may be considered to mean at least 80% of the genes in any one toxicity model.
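The "nearly all" criterion (at least 80% of the genes in a toxicity model) amounts to a simple coverage check before scoring. The sketch below is illustrative; the function name and gene identifiers are hypothetical, and only the 80% threshold comes from the text.

```python
def measures_nearly_all(model_genes, measured_genes, threshold=0.80):
    """True when the measured genes cover at least `threshold` of the model's genes."""
    model = set(model_genes)
    if not model:
        raise ValueError("toxicity model contains no genes")
    covered = len(model & set(measured_genes))
    return covered / len(model) >= threshold


model = [f"gene_{i}" for i in range(10)]
print(measures_nearly_all(model, model[:9]))  # 9 of 10 genes measured
print(measures_nearly_all(model, model[:7]))  # 7 of 10 genes measured
```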
  • the methods of the invention to predict at least one toxic effect of a test agent or compound may be practiced by one individual or at one location, or may be practiced by more than one individual or at more than one location.
  • methods of the invention include steps wherein the exposure of a test agent or compound to a cell or tissue sample(s) is accomplished in one location, nucleic acid processing and the generation of nucleic acid hybridization data take place at another location, and gene regulation and sample prediction scores are calculated or generated at another location.
  • cell or tissue samples are exposed to a test agent or compound by administering the agent to laboratory rats and nucleic acids are processed from selected tissues and hybridized to a microarray to produce nucleic acid hybridization data.
  • the nucleic acid hybridization data is then sent to a remote server comprising a toxicology reference database and software that enables generation of individual gene regulation scores and one or more sample prediction scores from the nucleic acid hybridization data.
  • the software may also enable a user to pre-select specific toxicology models and to compare the generated sample prediction scores to one or more toxicology reference scores contained within a database of such scores. The user may then generate or order an appropriate output product(s) that presents or represents the results of the data analysis, generation of gene regulation scores, sample prediction scores and/or comparisons to one or more toxicology reference scores.
  • Data, including nucleic acid hybridization data, may be transmitted to a server via any means available, including a secure direct dial-up or a secure or unsecured Internet connection. Toxicology prediction reports or any result of the methods herein may also be transmitted via these same mechanisms. For instance, a first user may transmit nucleic acid hybridization data to a remote server via a secure password protected Internet link and then request transmission of a toxicology report from the server via that same Internet link.
  • Data transmitted by a remote user of a toxicity database or model may be raw, un-normalized data or may be normalized for various background parameters before transmission. For instance, data from a microarray may be normalized for various chip and background parameters such as those described above, before transmission.
  • the data may be in any form, as long as the data can be recognized and properly formatted by available software or the software provided as part of a database or computer system.
  • microarray data may be provided and transmitted in a .cel file or any other common data file produced from the analysis of microarray-based hybridization on commercially available technology platforms (see, for instance, the Affymetrix GeneChip® Expression Analysis Technical Manual available at www.affymetrix.com).
  • Such files may or may not be annotated with various information, for instance, but not limited to, information related to the
  • nucleic acid hybridization data may be screened for database compatibility by any available means.
  • commonly available data quality control metrics can be applied. For instance, outlier analysis methods or techniques may be utilized to identify samples incompatible with the database, for instance, samples exhibiting erroneous fluorescence values from control probes which are common between the data and the database or toxicity model.
  • various data QC metrics can be applied, including one or more disclosed in PCT/US03/24160, filed August 1, 2003, which claims priority to U.S. provisional application 60/399,727.
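One way to picture the outlier analysis mentioned above is a robust comparison of a control probe's fluorescence in incoming samples against the same probe's values in the reference database. The source does not name a specific metric, so the sketch below swaps in a conventional choice, the modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cut-off of 3.5. All names and numbers are hypothetical.

```python
from statistics import median


def modified_z_scores(reference_values, candidate_values):
    """Modified z-scores of candidates relative to the reference distribution."""
    center = median(reference_values)
    mad = median([abs(v - center) for v in reference_values])
    if mad == 0:
        raise ValueError("reference values have zero spread; MAD cannot scale")
    return [0.6745 * (v - center) / mad for v in candidate_values]


def flag_incompatible(reference_values, samples, cutoff=3.5):
    """Return ids of samples whose control-probe value is an outlier."""
    ids = list(samples)
    scores = modified_z_scores(reference_values, [samples[s] for s in ids])
    return [s for s, z in zip(ids, scores) if abs(z) > cutoff]


# Hypothetical log2 intensities of one control probe.
database_values = [10.0, 10.2, 9.8, 10.1, 9.9]
incoming = {"s1": 10.05, "s2": 14.0}
print(flag_incompatible(database_values, incoming))
```

Median-based statistics are used here because a few corrupted arrays in the reference set would distort a mean and standard deviation far more than a median and MAD.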
  • the cell population that is exposed to the test agent, compound or composition may be exposed in vitro or in vivo.
  • cultured or freshly isolated liver cells, in particular rat hepatocytes, may be exposed to the agent under standard laboratory and cell culture conditions.
  • in vivo exposure may be accomplished by administration of the agent to a living animal, for instance a laboratory rat.
  • In in vitro toxicity testing, two groups of test organisms are usually employed. One group serves as a control, and the other group receives the test compound in a single dose (for acute toxicity tests) or a regimen of doses (for prolonged or chronic toxicity tests). Because, in some cases, the extraction of tissue as called for in the methods of the invention requires sacrificing the test animal, both the control group and the group receiving compound must be large enough to permit removal of animals for sampling tissues, if it is desired to observe the dynamics of gene expression through the duration of an experiment.
  • [0051] In setting up a toxicity study, extensive guidance is provided in the literature for selecting the appropriate test organism for the compound being tested, route of administration, dose ranges, and the like. Water or physiological saline (0.9% NaCl in water) is the solvent of choice for the test compound since these solvents permit administration by a variety of routes. When this is not possible because of solubility limitations, vegetable oils such as corn oil or organic solvents such as propylene glycol may be used.
  • [0052] Regardless of the route of administration, the volume required to administer a given dose is limited by the size of the animal that is used. It is desirable to keep the volume of each dose uniform within and between groups of animals. When rats or mice are used, the volume administered by the oral route generally should not exceed about 0.005 ml per gram of animal. Even when aqueous or physiological saline solutions are used for parenteral injection, the volumes that are tolerated are limited, although such solutions are ordinarily thought of as being innocuous.
  • the intravenous LD50 of distilled water in the mouse is approximately 0.044 ml per gram and that of isotonic saline is 0.068 ml per gram of mouse.
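The oral volume limit of about 0.005 ml per gram translates into simple arithmetic when planning a dose. The sketch below works one case: the 250 g body weight is a hypothetical figure chosen for illustration, and the 100 mg/kg dose is borrowed from the high-dose acyclovir cohort example given earlier. The function names are introduced here and are not from the patent.

```python
MAX_ORAL_ML_PER_GRAM = 0.005  # oral volume limit stated in the text


def max_oral_volume_ml(body_weight_g):
    """Largest oral dosing volume for an animal of the given weight."""
    return MAX_ORAL_ML_PER_GRAM * body_weight_g


def min_concentration_mg_per_ml(dose_mg_per_kg, body_weight_g):
    """Lowest formulation concentration that fits the dose in the volume limit."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / max_oral_volume_ml(body_weight_g)


# Hypothetical 250 g rat dosed at 100 mg/kg.
print(max_oral_volume_ml(250))                # about 1.25 ml
print(min_concentration_mg_per_ml(100, 250))  # about 20 mg/ml
```

Because both the dose and the volume limit scale with body weight, the minimum concentration depends only on the mg/kg dose, which helps keep dosing volumes uniform within and between groups.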
  • the route of administration to the test animal should be the same as, or as similar as possible to, the route of administration of the compound to man for therapeutic purposes.
  • When a compound is to be administered by inhalation, special techniques for generating test atmospheres are necessary. The methods usually involve aerosolization or nebulization of fluids containing the compound. If the agent to be tested is a fluid that has an appreciable vapor pressure, it may be administered by passing air through the solution under controlled temperature conditions. Under these conditions, dose is estimated from the volume of air inhaled per unit time, the temperature of the solution, and the vapor pressure of the agent involved. Gases are metered from reservoirs. When particles of a solution are to be administered, unless the particle size is less than about 2 μm the particles will not reach the terminal alveolar sacs in the lungs.
  • the cell population to be exposed to the agent may be divided into two or more subpopulations, for instance, by dividing the population into two or more identical aliquots.
  • the cells to be exposed to the agent are derived from liver tissue. For instance, cultured or freshly isolated rat hepatocytes may be used.
  • the methods of the invention may be used generally to predict at least one toxic response, and, as described in the Examples, may be used to predict the likelihood that a compound or test agent will induce various specific pathologies, such as liver cholestasis, genotoxicity/carcinogenesis, hepatitis, human-specific toxicity, induction of liver enlargement, steatosis, macrovesicular steatosis, microvesicular steatosis, necrosis, non-genotoxic/non-carcinogenic toxicity, peroxisome proliferation, rat non-genotoxic toxicity, general hepatotoxicity, or other pathologies associated with at least one known toxin.
  • the methods of the invention may also be used to determine the similarity of a toxic response to one or more individual compounds.
  • the methods of the invention may be used to predict or elucidate the potential cellular pathways influenced, induced or modulated by the compound or test agent.
  • Databases and computer systems of the present invention typically comprise one or more data structures comprising toxicity or toxicology models as described herein, including models comprising individual gene or toxicology marker weighted index scores or PLS scores (See Table 2), gene regulation scores, sample prediction scores and/or toxicity reference prediction scores.
  • Such databases and computer systems may also comprise software that allows a user to manipulate the database content or to calculate or generate scores as described herein, including individual gene regulation scores and sample prediction scores from nucleic acid hybridization data.
  • Software may also allow a user to predict, assay for or screen for at least one toxic response, including toxicity, hepatotoxicity, renal toxicity, etc, to include gene or protein pathway information and/or to include information related to the mechanism of toxicity, including possible cellular and molecular mechanisms.
  • software may include at least one element from the Gene Logic ToxShield™ Predictive Modeling System such as software comprising at least one algorithm to convert hybridization data from varying platforms, for instance from one microarray platform to a second microarray platform (see U.S. Provisional Application 60/613,831, filed September 29, 2004, which is herein incorporated by reference in its entirety for all purposes).
  • the databases and computer systems of the invention may comprise equipment and software that allow access directly or through a remote link, such as direct dial-up access or access via a password protected Internet link.
  • Any available hardware may be used to create computer systems of the invention. Any appropriate computer platform, user interface, etc. may be used to perform the necessary comparisons between sequence information, gene or toxicology marker information and any other information in the database or information provided as an input. For example, a large number of computer workstations are available from a variety of manufacturers. Client/server environments, database servers and networks are also widely available and appropriate platforms for the databases of the invention.
  • the databases may be designed to include different parts, for instance a sequence database and a toxicology reference database. Methods for the configuration and construction of such databases and computer-readable media containing such databases are widely available, for instance, see U.S. Publication No. 2003/0171876 (Serial No. 10/090,144), filed March 5, 2002, PCT Publication No. WO 02/095659, published November 23, 2002, and U.S. Patent No. 5,953,727, which are herein incorporated by reference in their entirety.
  • the database is a ToxExpress ® or BioExpress ® database marketed by Gene Logic Inc., Gaithersburg, MD.
  • the databases of the invention may be linked to an outside or external database such as GenBank (www.ncbi.nlm.nih.gov/entrez.index.html); KEGG (www.genome.ad.jp/kegg); SPAD (www.grt.kyushu-u.ac.jp/spad/index.html); HUGO (www.gene.ucl.ac.uk/hugo); Swiss-Prot (www.expasy.ch.sprot); Prosite (www.expasy.ch/tools/scnpsitl.html); OMIM (www.ncbi.nlm.nih.gov/omim); and GDB (www.gdb.org).
  • the external database is GenBank and the associated databases maintained by the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov).
  • the methods, databases and computer systems of the invention can be used to produce, deliver and/or send a toxicity or toxicology report.
  • Consistent with the use of the terms "toxicity" and "toxicology" herein, a "toxicity report" and a
  • the toxicity report of the invention typically comprises information or data related to the results of the practice of a method of the invention. For instance, the practice of a method of identifying at least one toxic effect of a test agent or compound as herein described may result in the preparation or production of a report describing the results of the method
  • the report may comprise information related to the toxic effects predicted by the comparison of at least one sample prediction score to at least one toxicity reference prediction score from the database as well as other related information such as a literature review or citation list and/or information regarding potential toxicity mechanism(s) of action, etc.
  • the report may also present information concerning the nucleic acid hybridization data, such as the integrity of the data as well as information input by the user of the database and methods of the invention, such as information used to annotate the nucleic acid hybridization data.
  • a toxicity report of the invention may be in a form such as the reports disclosed in PCT US02/22701, filed July 18, 2002, and U.S. Provisional Application 60/613,831, filed September 29, 2004, both of which are herein incorporated by reference in their entirety for all purposes.
  • the report may be generated by a server or computer system to which is loaded nucleic acid hybridization data by a user.
  • the report related to that nucleic acid data may be generated and delivered to the user via remote means such as a password secured environment available over the Internet or via available computer communication means such as email.
  • Any assay format to detect gene expression may be used to produce nucleic acid hybridization data.
  • traditional Northern blotting, dot or slot blot, nuclease protection, primer directed amplification, RT-PCR, semi- or quantitative PCR, branched-chain DNA and differential display methods may be used for detecting gene expression levels or producing nucleic acid hybridization data.
  • Those methods are useful for some embodiments of the invention.
  • amplification based assays may be most efficient.
  • Methods and assays of the invention may be most efficiently designed with high-throughput hybridization-based methods for detecting the expression of a large number of genes.
  • any hybridization assay format may be used, including solution-based and solid support-based assay formats.
  • Solid supports containing oligonucleotide probes for differentially expressed genes of the invention can be filters, polyvinyl chloride dishes, particles, beads, microparticles or silicon or glass based chips, etc.
  • Such chips, wafers and hybridization methods are widely available, for example, those disclosed by Beattie (WO 95/11755).
  • a solid surface to which oligonucleotides can be bound, either directly or indirectly, either covalently or non-covalently, can be used.
  • a preferred solid support is a high density array or DNA chip. These contain a particular oligonucleotide probe in a predetermined location on the array. Each predetermined location may contain more than one molecule of the probe, but each molecule within the predetermined location has an identical sequence. Such predetermined locations are termed features. There may be, for example, from 2, 10, 100, 1000 to 10,000, 100,000 or 400,000 or more of such features on a single solid support. The solid support, or the area within which the probes are attached, may be on the order of about a square centimeter. Probes corresponding to the genes of Tables 1-2 or from the related applications described above may be attached to single or multiple solid support structures, e.g., the probes may be attached to a single chip or to multiple chips to comprise a chip set.
  • Oligonucleotide probe arrays, including bead assays or collections of beads, for expression monitoring can be made and used according to any techniques known in the art (see for example, Lockhart et al. (1996), Nat Biotechnol 14:1675-1680; McGall et al. (1996), Proc Nat Acad Sci USA 93:13555-13560).
  • Such probe arrays may contain at least two or more oligonucleotides that are complementary to or hybridize to two or more of the genes described in Table 2.
  • Such arrays may contain oligonucleotides that are complementary to or hybridize to at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 70, 100, 500 or 1,000 or more of the genes described herein.
  • the sequences of the toxicity expression marker genes of Table 2 are in the public databases.
  • Table 1 provides the SEQ ID NO: and GenBank Accession Number (NCBI RefSeq ID) for each of the sequences (see www.ncbi.nlm.nih.gov/), as well as the title for the cluster of which the gene is part.
  • the sequences of the genes in GenBank are expressly herein incorporated by reference in their entirety as of the filing date of this application, as are related sequences, for instance, sequences from the same gene of different lengths, variant sequences, polymorphic sequences, genomic sequences of the genes and related sequences from different species, including the human counterparts, where appropriate.
  • "Background" or "background signal intensity" refers to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid.
  • background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5% to 10% of the probes for each gene.
  • background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g. probes directed to nucleic acids of the opposite sense or to genes not found in the sample such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
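The first background estimate described above (the mean signal of the lowest 5% to 10% of probes) can be sketched as follows. This is a minimal illustration, not the procedure used by any particular array software; the function name `background_intensity`, the `fraction` parameter and the sample intensities are assumptions made for the example.

```python
def background_intensity(intensities, fraction=0.05):
    """Average signal of the lowest `fraction` of probe intensities.

    `fraction` is an illustrative parameter; the text gives a range of
    5% to 10% (0.05 to 0.10).
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    ranked = sorted(intensities)
    # Keep at least one probe so the mean is always defined.
    n_lowest = max(1, int(len(ranked) * fraction))
    lowest = ranked[:n_lowest]
    return sum(lowest) / len(lowest)


# Hypothetical probe intensities for one array (20 probes).
probes = [12.0, 8.0, 150.0, 9.0, 300.0, 11.0, 75.0, 10.0, 220.0, 40.0,
          13.0, 95.0, 60.0, 180.0, 14.0, 25.0, 33.0, 110.0, 7.0, 500.0]
background = background_intensity(probes, fraction=0.10)
```

The same function can be applied per gene by passing only that gene's probes, which corresponds to the per-target variant mentioned in the text.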
  • "Hybridizing specifically to" or "specifically hybridizes" refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA.
  • a "probe" is defined as a nucleic acid, capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation.
  • a probe may include natural (i.e., A, G, U, C, or T) or modified bases (7- deazaguanosine, inosine, etc.).
  • the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization.
  • probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.
  • Cell or tissue samples may be exposed to the test agent in vitro or in vivo.
  • appropriate mammalian cell extracts such as liver extracts, may also be added with the test agent to evaluate agents that may require biotransformation to exhibit toxicity.
  • primary isolates or cultured cell lines of animal or human renal cells may be used.
  • the genes which are assayed according to the present invention are typically in the form of mRNA or reverse transcribed mRNA.
  • the genes may or may not be cloned.
  • the genes may or may not be amplified. The cloning and/or amplification do not appear to bias the representation of genes within a population. In some assays, it may be preferable, however, to use polyA+ RNA as a source, as it can be used with fewer processing steps.
  • nucleic acid samples used in the methods and assays of the invention may be prepared by any available method or process. Methods of isolating total mRNA are well known to those of skill in the art.
  • Such samples include RNA samples, but also include cDNA synthesized from an mRNA sample isolated from a cell or tissue of interest. Such samples also include DNA amplified from the cDNA, and RNA transcribed from the amplified DNA.
  • Biological samples may be of any biological tissue or fluid or cells from any organism as well as cells raised in vitro, such as cell lines and tissue culture cells. Frequently the sample will be a tissue or cell sample that has been exposed to a compound, agent, drug, pharmaceutical composition, potential environmental pollutant or other composition. In some formats, the sample will be a "clinical sample" which is a sample derived from a patient. Typical clinical samples include, but are not limited to, sputum, blood, blood-cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues, such as frozen sections or formalin fixed sections taken for histological purposes.
  • Nucleic acid hybridization simply involves contacting a probe and target nucleic acid under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing. See WO 99/32660. The nucleic acids that do not form hybrid duplexes are then washed away leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly complementary.
  • hybridization conditions may be selected to provide any degree of stringency.
  • hybridization is performed at low stringency, in this case in 6x SSPET at 37°C (0.005% Triton X-100), to ensure hybridization and then subsequent washes are performed at higher stringency (e.g., 1x SSPET at 37°C) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25x SSPET at 37°C to 50°C) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).
  • the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than the background intensity.
  • the hybridized array may be washed at successively higher stringency solutions and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.
  • the invention further includes kits combining, in different combinations, high-density oligonucleotide arrays, reagents for use with the arrays, signal detection and array-processing instruments, toxicology databases and analysis and database management software described above.
  • the kits may be used, for example, to predict or model the toxic response of a test compound.
  • the database software and packaged information may contain the databases saved to a computer-readable medium, or transferred to a user's local server.
  • database and software information may be provided in a remote electronic format, such as a website, the address of which may be packaged in the kit.
  • kidney toxins are administered to male Sprague-Dawley rats at various timepoints using administration diluents, protocols and dosing regimes as previously described in the art and previously described in the priority application discussed above.
  • As an illustration of the protocols used, the toxins are administered and the animals are sacrificed and kidney samples harvested at the time points indicated below.
  • Potential signs of toxicity including tremors, convulsions, salivation, diarrhea, lethargy, coma or other atypical behavior or appearance, are recorded as they occur and include a time of onset, degree, and duration.
  • EDTA tubes for evaluation of hematology parameters. Approximately 1 mL of blood is collected into serum separator tubes for clinical chemistry analysis. Approximately 200 µL of plasma is obtained and frozen at about -80°C for test compound/metabolite estimation. An additional volume of about 2 mL of blood is collected into a 15 mL conical polypropylene vial to which about 3 mL of Trizol is immediately added. The contents are immediately mixed with a vortex and by repeated inversion. The tubes are frozen in liquid nitrogen and stored at -80°C.
  • rats are weighed, physically examined, sacrificed by decapitation, and exsanguinated. The animals are necropsied within approximately five
  • Necropsies are conducted on each animal following procedures approved by board-certified pathologists.
  • a sagittal cross-section containing portions of the two atria and of the two ventricles is preserved in 10% NBF.
  • the remaining heart is frozen in liquid nitrogen and stored at ⁇ -
  • Testes (both): A sagittal cross-section of each testis is preserved in 10% NBF. The remaining testes are frozen together in liquid nitrogen and stored at -80°C.
  • Brain (whole): A cross-section of the cerebral hemispheres and of the diencephalon is preserved in 10% NBF, and the rest of the brain is frozen in liquid nitrogen and stored at about -80°C.
  • Microarray sample preparation is conducted with minor modifications, following the protocols set forth in the Affymetrix GeneChip ® Expression Technical Analysis Manual (Affymetrix, Inc. Santa Clara, CA). Frozen tissue is ground to a powder using a Spex Certiprep 6800 Freezer Mill.
  • Total RNA is extracted with Trizol (Invitrogen, Carlsbad CA) utilizing the manufacturer's protocol. mRNA is isolated using the Oligotex mRNA Midi kit (Qiagen) followed by ethanol precipitation. Double stranded cDNA is generated from mRNA using the Superscript Choice system (Invitrogen, Carlsbad CA). First strand cDNA synthesis is primed with a T7-(dT24) oligonucleotide. The cDNA is phenol-chloroform extracted and ethanol precipitated to a final concentration of 1 µg/ml. From 2 µg of cDNA, cRNA is synthesized using Ambion's T7 MegaScript in vitro Transcription Kit.
  • cRNA is fragmented (fragmentation buffer consisting of 200 mM Tris-acetate, pH 8.1, 500 mM KOAc, 150 mM MgOAc) for thirty-five minutes at 94°C.
  • Following the Affymetrix protocol, 55 µg of fragmented cRNA is hybridized on the Affymetrix rat array set for twenty-four hours at 60 rpm in a 45°C hybridization oven.
  • the chips are washed and stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in Affymetrix fluidics stations.
  • SAPE solution is added twice with an anti-streptavidin biotinylated antibody (Vector Laboratories) staining step in between.
  • Hybridization to the probe arrays is detected by fluorometric scanning (Hewlett Packard Gene Array Scanner). Data is analyzed using Affymetrix GeneChip ® and Expression Data Mining (EDMT) software, the GeneExpress ® database, and S-Plus ® statistical analysis software (Insightful Corp.).
  • in the RMA fold-change matrices, the rows represent individual fragments, and the columns are individual samples.
  • a vehicle cohort median matrix is then calculated, in which the rows represent fragments and the columns represent vehicle cohorts, one cohort for each study/time-point combination.
  • the values in this matrix are the median RMA expression values across the samples within those cohorts.
  • a matrix of normalized RMA expression values is generated, in which the rows represent individual fragments and the columns are individual samples.
  • the normalized RMA values are the RMA values minus the value from the vehicle cohort median matrix corresponding to the time-matched vehicle cohort.
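The vehicle-median normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the nested-dictionary layout, the function names `vehicle_cohort_medians` and `normalize_rma`, and the sample values are all hypothetical, and each cohort is identified here by a (study, time point) pair.

```python
from statistics import median


def vehicle_cohort_medians(rma, sample_cohort, vehicle_samples):
    """Per-fragment median RMA value for each vehicle cohort.

    rma: {fragment: {sample: RMA value}}
    sample_cohort: {sample: (study, time_point)}
    vehicle_samples: set of sample names that are vehicle controls
    """
    medians = {}
    for fragment, values in rma.items():
        by_cohort = {}
        for sample, value in values.items():
            if sample in vehicle_samples:
                by_cohort.setdefault(sample_cohort[sample], []).append(value)
        medians[fragment] = {c: median(v) for c, v in by_cohort.items()}
    return medians


def normalize_rma(rma, sample_cohort, vehicle_samples):
    """Subtract the time-matched vehicle cohort median from every sample."""
    medians = vehicle_cohort_medians(rma, sample_cohort, vehicle_samples)
    return {
        fragment: {
            sample: value - medians[fragment][sample_cohort[sample]]
            for sample, value in values.items()
        }
        for fragment, values in rma.items()
    }


# Hypothetical data: one fragment, three vehicle samples, one treated sample.
rma = {"frag_A": {"veh1": 7.0, "veh2": 7.4, "veh3": 7.2, "trt1": 8.2}}
sample_cohort = {s: ("study1", "24h") for s in ("veh1", "veh2", "veh3", "trt1")}
normalized = normalize_rma(rma, sample_cohort, {"veh1", "veh2", "veh3"})
```

Because RMA values are on a log scale, the subtraction yields a log fold-change relative to the time-matched vehicle median.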
  • PLS works by computing a series of PLS components, where each component is a weighted linear combination of fragment values. We use the nonlinear iterative partial least squares method to compute the PLS components.
  • a vehicle cohort mean matrix is generated, in which the rows represent fragments and the columns represent vehicle cohorts, one cohort for each study/time-point combination.
  • the values in this matrix are the mean RMA expression values across the samples within those cohorts.
  • a treated cohort mean matrix is then generated, in which the rows represent fragments and the columns represent treated (non-vehicle) cohorts, one cohort for each study/time-point/compound/dose combination.
  • the values in this matrix are the mean RMA expression values across the samples within those cohorts.
  • a treated cohort fold-change matrix is generated, in which the rows represent fragments and the columns represent treated cohorts, one cohort for each study/time-point/compound/dose combination.
  • the values in this matrix are the values in the treated cohort mean matrix minus the values in the vehicle cohort mean matrix corresponding to appropriate time-matched vehicle cohorts. Subsequently, a treated cohort p-value matrix is generated, in which the rows represent fragments and the columns represent treated cohorts, one cohort for each study/time-point/compound/dose combination. The values in this matrix are p-values based on two-sample t-tests comparing the treated cohort mean values to the vehicle cohort mean values corresponding to appropriate time-matched vehicle cohorts. This matrix is converted to a binary coding based on the p-values being less than 0.05 (coded as 1) or greater than 0.05 (coded as 0).
  • the row sums of the binary treated cohort p-value matrix are computed, where that row sum represents a "gene regulation score" for each fragment, representing the total number of treated cohorts where the fragment showed differential regulation (up- or down-regulation) compared to its time-matched vehicle cohort.
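The binary coding and row-sum step can be sketched as follows. The sketch assumes the two-sample t-test p-values have already been computed elsewhere; the function name `gene_regulation_scores`, the `alpha` parameter and the example p-values are illustrative assumptions.

```python
def gene_regulation_scores(p_values, alpha=0.05):
    """Count, per fragment, the treated cohorts with p < alpha.

    p_values: {fragment: [p-value for each treated cohort]}
    Each p-value below alpha is coded as 1, otherwise 0, and the codes
    are summed across cohorts (the row sum of the binary matrix).
    """
    return {
        fragment: sum(1 for p in cohort_ps if p < alpha)
        for fragment, cohort_ps in p_values.items()
    }


# Hypothetical p-values for three fragments across four treated cohorts.
p_values = {
    "frag_A": [0.001, 0.20, 0.03, 0.04],
    "frag_B": [0.50, 0.60, 0.049, 0.90],
    "frag_C": [0.01, 0.02, 0.03, 0.004],
}
scores = gene_regulation_scores(p_values)
```

A fragment that is differentially regulated in many treated cohorts receives a high score, which makes it a stronger candidate for inclusion in a model.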
  • PLS modeling and 2/3:1/3 cross-validation are then performed based on taking the top N fragments according to the regulation score, varying N and the number of PLS components, and recording the model success rate for each combination.
  • N is chosen to be the point at which the cross-validated error rate is minimized.
  • each of those N fragments receives a PLS weight (PLS score) corresponding to the fragment's utility, or predictive ability, in the model (see Table 2 for an exemplary list of PLS scores for a kidney general toxicity model).
  • i is the index number for each gene in a gene expression profile to be evaluated
  • w_i is the PLS weight (or PLS score, see Table 2 for an exemplary list of PLS scores for a general kidney toxicity model) for each gene.
  • R_i is the RMA fold-change value for the i-th gene, as determined from a normalized RMA matrix of gene expression data from the sample (described above). The PLS weight multiplied by the RMA fold-change value gives a gene regulation score for each gene, and the regulation scores for all the individual genes are added to give a prediction score for the sample, that is, $\sum_i w_i R_i$.
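The sample prediction score described above, a sum over genes of PLS weight times RMA fold-change, can be sketched as follows. The function name, the gene identifiers and the numeric values are hypothetical; treating a gene that is absent from the sample as a fold-change of 0 is an assumption made for the example.

```python
def sample_prediction_score(pls_weights, fold_changes):
    """Sum of (PLS weight x RMA fold-change) over the genes in the model.

    pls_weights: {gene: PLS weight from the trained model}
    fold_changes: {gene: normalized RMA fold-change in the sample}
    """
    total = 0.0
    for gene, weight in pls_weights.items():
        # Assumption: a gene missing from the sample contributes nothing.
        total += weight * fold_changes.get(gene, 0.0)
    return total


# Hypothetical weights and fold-changes, not values from Table 2.
pls_weights = {"gene_1": 0.5, "gene_2": -0.25, "gene_3": 0.1}
fold_changes = {"gene_1": 1.0, "gene_2": -2.0}
score = sample_prediction_score(pls_weights, fold_changes)
```

In this example both measured genes move in the direction their weights favor, so each adds 0.5 to the score.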
  • average correlation for that sample. If the average correlation is less than a threshold (for instance 0.90), the sample is flagged as a potential outlier. This process is repeated for each row (sample) in the study. Outliers flagged by the average correlation QC check are dropped from any downstream normalization, prediction or compound similarity steps in the process. To establish a toxicity prediction score cut-off value for a toxicity model, the true-positive and false-positive rates for each possible score cut-off value are computed, using the scores from all tox and non-tox samples in the training set. This generates an ROC curve, which is used to set the cut-off score at the point on the ROC curve corresponding to a false positive rate of approximately 5%.
  • a cut-off prediction score is about 0.318. If the sample score is about 0.318 or above, it can be predicted that the sample shows a toxic response after exposure to the test compound. If the sample score is below 0.318, it can be predicted that the sample does not show a toxic response.
  • the model can be trained by setting a score of -1 for each gene that cannot predict a toxic response and by setting a score of +1 for each gene that can predict a toxic response. Cross-validation of RMA/PLS models may be performed by the compound-drop method and by the 2/3:1/3 method.
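Choosing a score cut-off from training-set scores can be sketched as follows. This is one possible reading, not the inventors' implementation: it scans every observed score as a candidate cut-off and returns the lowest one whose false positive rate does not exceed a target, with the 5% target taken as "at most 5%". The function name and all scores are invented for the example.

```python
def cutoff_for_false_positive_rate(tox_scores, non_tox_scores, max_fpr=0.05):
    """Lowest candidate cut-off whose false positive rate is <= max_fpr.

    A sample is called toxic when its score is >= the cut-off.
    Returns (cutoff, true_positive_rate, false_positive_rate), or None
    if no observed score meets the target.
    """
    candidates = sorted(set(tox_scores) | set(non_tox_scores))
    for cutoff in candidates:  # ascending: FPR falls as the cut-off rises
        fpr = sum(s >= cutoff for s in non_tox_scores) / len(non_tox_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= cutoff for s in tox_scores) / len(tox_scores)
            return cutoff, tpr, fpr
    return None


# Hypothetical training-set prediction scores.
non_tox = [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1,
           0.0, 0.05, 0.1, 0.12, 0.15, 0.18, 0.2, 0.22, 0.25, 0.40]
tox = [0.2, 0.30, 0.35, 0.5, 0.6, 0.9]
cutoff, tpr, fpr = cutoff_for_false_positive_rate(tox, non_tox)
is_toxic = 0.33 >= cutoff  # classify a new sample score against the cut-off
```

Taking the lowest qualifying cut-off keeps the true positive rate as high as possible for the chosen false positive rate.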
  • sample data from animals treated with one particular test compound are removed from a model, and the ability of this model to predict toxicity is compared to that of a model containing a full data set.
  • gene expression information from a random third of the genes in the model is removed, and the ability of this subset model to predict toxicity is compared to that of a model containing a full data set.
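The compound-drop scheme can be sketched as a generator of training and held-out sample sets, one split per compound. The function name `compound_drop_splits` and the sample-to-compound mapping are illustrative assumptions; model fitting and scoring are left out.

```python
def compound_drop_splits(sample_compound):
    """Yield (held_out_compound, training_samples, held_out_samples).

    sample_compound: {sample: compound the animal was treated with}
    Each split removes every sample from one compound, so a model can be
    rebuilt without it and compared with the full-data model.
    """
    for compound in sorted(set(sample_compound.values())):
        train = [s for s, c in sample_compound.items() if c != compound]
        held_out = [s for s, c in sample_compound.items() if c == compound]
        yield compound, train, held_out


# Hypothetical sample annotations.
sample_compound = {
    "s1": "compound_A", "s2": "compound_A",
    "s3": "compound_B", "s4": "compound_B",
    "s5": "compound_C",
}
splits = list(compound_drop_splits(sample_compound))
```

Dropping all samples of a compound at once avoids scoring a compound with a model that has already seen other samples of the same compound.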
  • a report may be generated comprising information or data related to the results of the methods of predicting at least one toxic effect.
  • the report may comprise information related to the toxic effects predicted by the comparison of at least one sample prediction score to at least one toxicity reference prediction score from the database.
  • the report may also present information concerning the nucleic acid hybridization data, such as the integrity of the data as well as information input by the user of the database and methods of the invention, such as information used to annotate the nucleic acid hybridization data. See PCT US02/22701 for a non-limiting example of a toxicity report that may be generated.
  • An algorithm was developed to convert probe intensity data from a first type of microarray to RMA data of a second type of microarray. This is beneficial to the customer because it provides the customer with the freedom to select the type of microarray it wishes to use with an RMA/PLS predictive model. Frequently this is the newest microarray on the market.
  • the algorithm is beneficial for the company which builds RMA/PLS statistical models on microarray data because money and resources do not have to be expended to rebuild statistical models built on discontinued microarrays.
  • the conversion algorithm developed can be used to convert data from the Affymetrix GeneChip® rat RAE 2.0 microarray to Affymetrix GeneChip® rat RGU34 A microarray data. This conversion also allows the use of RMA/PLS toxicogenomics models built on the Affymetrix RGU34 A microarray platform to predict customer data generated on the RAE2.0 microarray platform. The conversion algorithm was tested using the liver toxicity model described in U.S. Provisional Application Serial No. 60/559,949, which is herein incorporated by reference.
  • the first step to using a conversion algorithm is to map microarray fragments.
  • the RGU34 A microarray fragments which comprise the liver toxicity model were mapped to the RAE2.0 microarray.
  • the liver toxicity model is based on 1,100 Affymetrix GeneChip® RGU34 A microarray fragments. Of the 1,100 fragments in the model, 907 were suggested by Affymetrix as matching to fragments on the RAE2.0 microarray. See Affymetrix's "User's Guide to Product Comparison Spreadsheets" which is herein incorporated by reference.
  • the 1067 mapping fragments were reduced to 1053.
  • the 1053 mapped fragments represented 16 RGU34 A and 11 RAE 2.0 probes.
  • were assigned an RMA fold-change value of 0 for all samples and did not contribute to the prediction.
  • training samples are selected to calculate the conversion model weights.
  • the inventors searched Gene Logic's ToxExpress® reference database, a database which is built on the Affymetrix RGU34A platform, for samples that covered a large amount of interquartile range with respect to signal intensity. Samples that covered the largest amount of variable space were selected because this method of sample selection had previously been determined by the inventors to be reliable in the development of a human sample conversion algorithm.
  • the samples maximized $\sum_i \left( \max_j X_{ij} - \min_j X_{ij} \right)$, where i indexes genes and j indexes samples.
  • sample size calculations were stable at a sampling of approximately 100 microarrays. For this reason, a training set consisting of 100 compounds and vehicles from rat liver tissue was selected.
  • the 100 training samples were used to train the weights in the conversion algorithm. This step is important because it provides for the quantitative aspect of the conversion.
  • the weight training was performed based on a multiple regression analysis with probe values as the independent variables and RMA expression as the sum of the dependent variables.
  • Test samples were evaluated using the trained conversion algorithm.
  • the multiple regression model was built on the 11 perfect match probe intensities and generated a predicted RGU34 expression value from a weighted sum of RAE 2.0 probe values. Each test array was scaled to an average probe intensity of 10 (log scale).
  • the conversion algorithm used is given as:
  • $Y_i^{RGU34} = \beta_{i,0} + \sum_j \beta_{ij} \log\left( X_{ij}^{RAE2.0} / S \right)$
  • $Y_i^{RGU34}$ is the RGU34 RMA expression value for fragment i
  • S is a chip scale factor computed from the RAE2.0 probe intensities $X_{ij}^{RAE2.0}$. Probe intensities were first floored to the minimum intensity value of 30.
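The conversion for a single fragment, an intercept plus a weighted sum of log-transformed, scaled probe intensities, can be sketched as follows. Several details are assumptions: the logarithm base (base 2 is used here because RMA values are conventionally log2), the way the chip scale factor is obtained (passed in as a parameter), and all names and numbers. The example uses three probes for brevity rather than the 11 perfect match probes mentioned in the text.

```python
import math

MIN_INTENSITY = 30.0  # probe intensities are floored to this value


def convert_fragment(probe_intensities, intercept, weights, scale_factor):
    """Predict an RGU34 RMA value from RAE2.0 probe intensities.

    probe_intensities: raw perfect-match intensities for one fragment
    intercept, weights: trained regression coefficients (hypothetical here)
    scale_factor: chip scale factor S, supplied by the caller
    """
    if len(probe_intensities) != len(weights):
        raise ValueError("one weight is required per probe")
    value = intercept
    for intensity, weight in zip(probe_intensities, weights):
        floored = max(intensity, MIN_INTENSITY)
        # Assumption: base-2 logarithm of the scaled intensity.
        value += weight * math.log2(floored / scale_factor)
    return value


# Hypothetical probes and coefficients; 10.0 is floored to 30.0.
predicted = convert_fragment(
    probe_intensities=[120.0, 10.0, 240.0],
    intercept=1.0,
    weights=[0.5, 0.25, 1.0],
    scale_factor=30.0,
)
```

Flooring before the logarithm keeps very low or zero intensities from producing large negative or undefined terms.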
  • Other approaches could be used to convert RAE2.0 data to RGU34 RMA data. Non-linear regression on probe values as well as canonical correlation of RAE2.0 probes to RGU34 A probes could be used.
  • a RAE2.0 microarray could be computed and then scaled or quantile-normalized to RGU34 A RMA values.
  • Although the multiple regression analysis used in this example does not take into account mismatched probes, an analysis could be used which takes into account mismatched probes.
  • the liver predictive model was used to compare the predictive results of test data from the RGU34 microarray to test data derived from converted RAE2.0 array data. The consistency between the RGU34 array results and the converted RAE2.0 array results was quite high. Table 3 provides the number of test samples per compound which were predicted as toxic out of the total number of samples for that compound using RGU34 RMA data and RAE2.0 converted RMA data.
  • Amitriptyline, estradiol, amiodarone, diflunisal, phenobarbital, dioxin, ethionine, and LPS were selected as test toxicants.
  • Clofibrate was selected because it is a rat-specific toxicant.
  • Metformin, rosiglitazone, chlorpheniramine, and streptomycin were selected as test negative controls. The rat-specific toxicant and all of the tested negative controls were correctly predicted to show no toxicity.
  • a web-based software predictive modeling system called the ToxShield™ Suite was created which is composed of a collection of RMA/PLS toxicity predictive models. Liver RMA/PLS predictive models were built to allow a user to identify and classify various toxic and mechanistic responses to unknown or test compounds.
  • the models represent a wide variety of endpoint pathologies and indications, including general toxicity, necrosis, steatosis, macrovesicular steatosis, microvesicular steatosis, cholestasis, hepatitis, carcinogenicity, genotoxic carcinogenicity, non-genotoxic carcinogenicity, rat specific non-genotoxic carcinogenicity, peroxisome proliferation, and inducer/liver enlargement.
  • the outcome of toxicity models represents a detailed categorization of test or unknown compounds from which mechanistic information can be inferred.
  • the current models available as part of this software system are related to liver toxicity, models relating to specific toxicities of other organs including, but not limited to, liver primary cell culture, kidney, heart, spleen, bone marrow, and brain could be used.
  • The conversion algorithm described in Example 3 can be implemented in a software product such as the ToxShield™ Suite.
  • The customer inputs his or her data that has been generated on a microarray platform such as the Affymetrix RAE2.0 GeneChip® microarray.
  • The software uses the algorithm to convert the customer's gene expression data to RMA data that is compatible with the software's toxicogenomics model, which was built exclusively on a second microarray platform such as the Affymetrix RGU34 A GeneChip® microarray. Visualizations and predictions can then be generated from the customer's data using the predictive model.
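The probe-level conversion described in this example (floor each RAE2.0 probe intensity at 30, divide by a chip scale factor, take the log, then form a weighted sum with per-fragment regression coefficients) can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the patent's stated method: the log base (2 is used, as is conventional for RMA-scale data), the exact form of the scale factor (the hypothetical helper `chip_scale_factor` uses the mean floored intensity), and all function names and numeric values, which are made up for illustration.

```python
import math

FLOOR = 30.0  # minimum probe intensity stated in the text


def chip_scale_factor(probe_intensities, floor=FLOOR):
    """Hypothetical scale factor S: mean floored intensity over the chip.

    Assumption: the exact normalisation of S is not taken from the source.
    """
    floored = [max(x, floor) for x in probe_intensities]
    return sum(floored) / len(floored)


def convert_fragment(probes, beta0, betas, scale, floor=FLOOR):
    """Predict one RGU34-scale expression value for a fragment.

    probes : RAE2.0 perfect-match probe intensities for the fragment
    beta0  : regression intercept for the fragment
    betas  : one regression weight per probe
    scale  : chip scale factor S
    """
    if len(probes) != len(betas):
        raise ValueError("need exactly one coefficient per probe")
    total = beta0
    for x, b in zip(probes, betas):
        # Floor the intensity, scale it, then take log2 (base is an assumption).
        total += b * math.log2(max(x, floor) / scale)
    return total


# Illustrative numbers only; a real fragment would have 11 probes.
probes = [10.0, 60.0, 120.0]   # 10.0 is floored to 30.0
betas = [0.5, 0.25, 0.125]
value = convert_fragment(probes, beta0=1.0, betas=betas, scale=30.0)
```

In a workflow like the one described above, a value of this kind would be computed for every fragment on the array, and the resulting vector would then be passed to the predictive model built on RGU34 data.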

Abstract

The invention relates to methods for predicting the toxicity of test agents, and to methods for producing toxicity prediction models using algorithms for analyzing quantitative gene expression data. The invention also relates to computer systems comprising these toxicity prediction models, as well as methods enabling remote users to use these computer systems to determine the toxicity of test agents.
EP04812167A 2003-11-24 2004-11-24 Methods for molecular toxicology modeling Withdrawn EP1697873A4 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
PCT/US2003/037556 WO2004048598A2 (fr) 2002-11-22 2003-11-24 Molecular nephrotoxicology modeling
US55498104P 2004-03-22 2004-03-22
US61383104P 2004-09-29 2004-09-29
PCT/US2004/039593 WO2005052181A2 (fr) 2003-11-24 2004-11-24 Methods for molecular toxicology modeling

Publications (2)

Publication Number Publication Date
EP1697873A2 EP1697873A2 (fr) 2006-09-06
EP1697873A4 true EP1697873A4 (fr) 2008-01-23

Family

ID=34637018

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04812167A Withdrawn EP1697873A4 (fr) 2003-11-24 2004-11-24 Methods for molecular toxicology modeling

Country Status (4)

Country Link
EP (1) EP1697873A4 (fr)
JP (1) JP2007535305A (fr)
CA (1) CA2546391A1 (fr)
WO (1) WO2005052181A2 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2891278B1 (fr) * 2005-09-23 2008-07-04 Vigilent Technologies Sarl Procede pour determiner l'etat d'un ensemble de cellules et systeme pour la mise en oeuvre dudit procede
WO2009083030A1 (fr) * 2007-12-27 2009-07-09 Vereniging Voor Christelijk Hoger Onderwijs Méthode de prédiction de l'effet toxique d'un composé
CN107391961B (zh) * 2011-09-09 2020-11-17 菲利普莫里斯生产公司 用于基于网络的生物活性评估的系统与方法
CA2877429C (fr) 2012-06-21 2020-11-03 Philip Morris Products S.A. Systemes et procedes pour generer des signatures de biomarqueurs avec correction de biais et prediction de classe integrees
JP6313757B2 (ja) 2012-06-21 2018-04-18 フィリップ モリス プロダクツ エス アー 統合デュアルアンサンブルおよび一般化シミュレーテッドアニーリング技法を用いてバイオマーカシグネチャを生成するためのシステムおよび方法
EP3775931A1 (fr) * 2018-04-06 2021-02-17 Boehringer Ingelheim Vetmedica GmbH Procédé pour déterminer un analyte et système d'analyse
CN113496072A (zh) * 2020-03-22 2021-10-12 杭州环特生物科技股份有限公司 用于安全性评价的斑马鱼转换人用剂量的换算方法

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001096866A1 (fr) * 2000-06-14 2001-12-20 Vistagen, Inc. Typage de toxicite grace a des cellules embryonnaires de foie
WO2002010453A2 (fr) * 2000-07-31 2002-02-07 Gene Logic, Inc. Modelisation en toxicologie moleculaire

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6132969A (en) * 1998-06-19 2000-10-17 Rosetta Inpharmatics, Inc. Methods for testing biological network models
AU2002230997A1 (en) * 2000-12-15 2002-06-24 Genetics Institute, Llc Methods and compositions for diagnosing and treating rheumatoid arthritis
US7993907B2 (en) * 2001-05-08 2011-08-09 Mowycal Lending, Llc Biochips and method of screening using drug induced gene and protein expression profiling
WO2002093318A2 (fr) * 2001-05-15 2002-11-21 Psychogenics Inc. Systemes et procedes de controle informatique du comportement
CA2478640A1 (fr) * 2002-03-13 2003-09-18 F. Hoffmann-La Roche Ag Procede pour selectionner des facteurs determinant la sensibilite a des medicaments et procede pour predire la sensibilite a des medicaments a partir des facteurs ainsi selectionnes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BULERA S J ET AL: "RNA EXPRESSION IN THE EARLY CHARACTERIZATION OF HEPATOTOXICANTS IN WISTAR RATS BY HIGH-DENSITY DNA MICROARRAYS", HEPATOLOGY, WILLIAMS AND WILKINS, BALTIMORE, MD, US, vol. 33, no. 5, May 2001 (2001-05-01), pages 1239 - 1258, XP009033090, ISSN: 0270-9139 *
IRIZARRY RAFAEL A ET AL: "Summaries of Affymetrix GeneChip probe level data.", NUCLEIC ACIDS RESEARCH 15 FEB 2003, vol. 31, no. 4, 15 February 2003 (2003-02-15), pages e15, XP002460628, ISSN: 1362-4962 *

Also Published As

Publication number Publication date
WO2005052181A2 (fr) 2005-06-09
WO2005052181A3 (fr) 2006-04-27
CA2546391A1 (fr) 2005-06-09
EP1697873A2 (fr) 2006-09-06
JP2007535305A (ja) 2007-12-06

Similar Documents

Publication Publication Date Title
De et al. Bioinformatics challenges in genome-wide association studies (GWAS)
WO2019169049A1 (fr) Systèmes et procédés de modélisation multimodale pour prédire et gérer un risque de démence pour des individus
Miller et al. Management of high-throughput DNA sequencing projects: Alpheus
Garrett-Mayer et al. Cross-study validation and combined analysis of gene expression microarray data
Charney The “Golden Age” of behavior genetics?
Trajkovski et al. SEGS: Search for enriched gene sets in microarray data
US20230063506A1 (en) Small rna disease classifiers
WO2007084187A2 (fr) Modélisation de cardiotoxicologie moléculaire
US20110071767A1 (en) Hepatotoxicity Molecular Models
KR102085169B1 (ko) 개인 유전체 맵 기반 맞춤의학 분석 시스템 및 이를 이용한 분석 방법
WO2004063334A2 (fr) Modelage cardiotoxicologique moleculaire
EP1697873A2 (fr) Procedes de modelisation de toxicologie moleculaire
WO2007022419A2 (fr) Modeles de toxicite moleculaire developpes a partir d'hepatocytes isoles
KR102041504B1 (ko) 환자 계층화를 위한 맞춤의학 분석 플랫폼
CN114207727A (zh) 用于从变体识别数据确定起源细胞的系统和方法
Schaid et al. Discovery of cancer susceptibility genes: study designs, analytic approaches, and trends in technology
KR102041497B1 (ko) 개인 유전체 맵 기반 맞춤의학 분석 플랫폼 및 이를 이용한 분석 방법
WO2003068908A2 (fr) Modelisation toxicologique moleculaire de la cardiotoxine
Kim et al. Genetic differences according to onset age and lung function in asthma: A cluster analysis
US20080281526A1 (en) Methods For Molecular Toxicology Modeling
US20200135300A1 (en) Applying low coverage whole genome sequencing for intelligent genomic routing
US20060240418A1 (en) Canine gene microarrays
CN101743320A (zh) 来自基因转录产物检测的具有广泛基础的疾病结合
WO2006037025A2 (fr) Modeles de toxicite moleculaire obtenus a partir d'hepatocytes isoles
US20070054269A1 (en) Molecular cardiotoxicology modeling

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060616

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LU MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL HR LT LV MK YU

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20080102

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: OCIMUM BIOSOLUTIONS, INC.

17Q First examination report despatched

Effective date: 20081031

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20090512