EP1454282A1 - Methods and systems for identifying components of mammalian biochemical networks as targets for therapeutic agents - Google Patents

Methods and systems for identifying components of mammalian biochemical networks as targets for therapeutic agents

Info

Publication number
EP1454282A1
Authority
EP
European Patent Office
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02773968A
Other languages
German (de)
English (en)
Other versions
EP1454282A4 (fr)
Inventor
Colin Hill
Iya Khalil
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gene Network Sciences Inc
Original Assignee
Gene Network Sciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/286,372 (US20030144823A1)
Application filed by Gene Network Sciences Inc
Publication of EP1454282A1
Publication of EP1454282A4
Withdrawn (current legal status)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
    • G01N33/5091 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing the pathological state of an organism
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/30 Dynamic-time models
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/46 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans from vertebrates
    • G01N2333/47 Assays involving proteins of known structure or function as defined in the subgroups
    • G01N2333/4701 Details
    • G01N2333/4739 Cyclin; Prad 1
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/10 Boolean models
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/20 Probabilistic models

Definitions

  • The present invention relates to drug discovery. More particularly, the invention relates to in silico methods for identifying one or more components of a cell as a target for interaction with one or more therapeutic agents. Even more specifically, the invention relates to methods for simulating or analyzing the dynamic interrelationships of genes and proteins with one another, and to the use of those methods to identify one or more cellular components as putative targets for a therapeutic agent.
  • Screening methods have been used to find compounds that have a sought-after effect on a cell, e.g., the up-regulation or down-regulation of a gene. Screening assays identify compounds, and the compounds so identified may be used in further drug development. Using such methods, for example, antibodies that bind to receptors on animal tumor cells may be assayed and identified. In further drug development efforts, these antibodies or their epitopes can be analyzed and their therapeutic activity enhanced by methods known in the art.
  • A difficulty with such methods is that they are basically brute-force empirical methods that reveal little or nothing about the particular phenomena that take place within the cell when it is contacted with the compound identified in the screen.
  • Because the actual cellular dynamics may not be understood, candidate drugs may be developed that deleteriously affect other components in the cell and cause undesirable side effects.
  • This brute-force screening method is also limited by the speed at which assays can be conducted.
  • Another empirical approach used in drug development is to screen compounds against a particular component of a cell that has been identified as being involved in a disease condition. Assays are conducted to determine the binding, chemical interaction or other modification of certain molecules within the cell, such as genes or proteins. While the art has developed powerful high-throughput screening techniques by which tens of thousands of compounds are routinely screened for their interactive effect with one or more targets, such methodologies are still inherently empirical and leave the researcher with no fundamental information about the mechanisms by which a compound identified in this way interacts with the cell.
  • Consequently, the compound so identified may have detrimental interactions with one or more other components of a cell and may cause more harm than good.
  • To determine whether the compound so identified may ultimately be useful as a therapeutic, it must be tested in in vitro studies on cells containing the particular gene or protein with which it interacts, or in in vivo animal studies, to determine both its beneficial and its possible detrimental effects. These additional tests are extremely time-consuming and expensive.
  • U.S. Patent No. 6,165,709 describes methods for identifying the cellular targets of a drug by comparing (i) the effects of the drug on a wild-type cell, (ii) the effects on a wild-type cell of modifications to a putative target of the drug, and (iii) the effects of the drug on a wild-type cell which has had the putative target modified.
  • The effect of the drug on the cell can be determined by measuring various aspects of the cell state, including gene expression, protein concentrations, etc.
  • It is a further and related object of the invention to reduce the labor and equipment costs of empirical drug discovery processes.
  • Broadly, the invention lies in modeling the interactions of the genes, proteins and other components of a cell, using mathematical techniques to represent the interrelationships between those components, and manipulating the dynamics of the cell to determine which components may be targets for interaction with therapeutic agents.
  • Exemplary methods of the invention for identifying components of a cell as putative targets for interaction with one or more therapeutic agents, based on a cell simulation approach, comprise the steps of:
  • Exemplary methods of the invention for identifying components of a cell as putative targets for interaction with one or more therapeutic agents based upon an analytical approach comprise the steps of:
  • mathematical equations representing interrelationships between one or more of the components
  • FIG. 1 is a schematic representation of a typical gene expression network.
  • FIG. 2 is a schematic representation of a typical signal transduction/signal translation pathway.
  • FIG. 3 is an exemplary schematic representation, using the Diagrammatic Cell Language, of a portion of the signal transduction pathway and gene expression network that initiates the mammalian cell cycle.
  • FIGS. 3(a) through 3(o) collectively depict the key to reading diagrams in the format of Fig. 3.
  • FIG. 4 is a conventional schematic representation of the Wnt/Beta-Catenin signaling pathway that plays a critical role in the progression of colon cancer cells through the cell cycle.
  • FIG. 5 is a schematic representation of a biological network comprising two genes.
  • FIG. 6 is a phase portrait for the solution of the differential equation model for the two gene network of FIG. 5.
  • FIG. 7 is a graph showing the time evolution of the differential equation model for a single copy of the gene circuit of FIG. 5.
  • FIG. 8 is a graph showing the time evolution of the stochastic equation model for a single copy of the gene circuit of FIG. 5.
  • FIG. 9 is a bifurcation diagram showing several states of a cell.
  • FIG. 10 is a graphical representation of the Wnt beta-catenin pathway in DCL.
  • FIG. 11 is a time-series profile of the concentration of several components of a cell and represents the "normal" state of the cell.
  • FIG. 12 is a time-series profile which represents the cancerous state of the cell.
  • FIG. 13 is a time-series profile which shows the effect of deleting APC from the cell.
  • FIG. 14 is a time-series profile which shows the effect of deleting HDC from the cell.
  • FIG. 15 is a time-series profile which shows the effect of adding Axin to the cancerous cell of FIG. 12.
  • FIG. 16 is a time-series profile which shows the effect of adding HDAC to the cancerous cell of FIG. 12.
  • FIG. 17 is a time-series profile which shows the effect of adding Axin and GSK3 to the cancerous cell of FIG. 12.
  • FIG. 18 is a time-series profile which shows the effect of adding Axin to the cancerous cell of FIG. 12.
  • FIG. 19 is a time-series profile which shows the effect of adding GSK3 to the already perturbed cell of FIG. 18.
  • FIG. 20 is a time-series profile which shows the effect of adding HDAC to the twice perturbed cell of FIG. 19.
  • FIG. 21 is a time-series profile which shows the effect of reducing the concentration of
  • FIG. 22 is a time-series profile which shows the effect of reducing the concentration of GSK3 to zero in the already perturbed cell of FIG. 21.
  • FIG. 23 is a time-series profile which shows the effect of adding an additional Facilitator molecule, which enhances the binding of Axin to β-catenin, to the cancerous cell depicted in FIG. 12, by modifying the mathematical equations of the system.
  • FIG. 24 is a time-series profile which shows the effect of reducing the concentration of HDAC to zero in the twice perturbed cell of FIG. 23.
  • FIG. 25 is a time-series profile which shows the effect of increasing the binding rate of Axin to β-catenin starting from the "cancerous" cell of FIG. 12.
  • FIG. 26 is a time-series profile which shows the effect of slightly increasing the binding rate of Axin to β-catenin from the cancerous cell of FIG. 12.
  • FIG. 27 is a time-series profile which shows the effect of increasing the binding rate of
  • FIG. 28 is a time-series profile which shows the effect of increasing the binding rate of GSK3 to Axin in the twice perturbed cell of FIG. 27.
  • FIG. 29 is a time-series profile which shows the effect of setting the binding rate of Axin to GSK3 to zero in the normal cell of FIG. 11.
  • FIG. 30 is a time-series profile which shows the effect of setting the unbinding rate of β-catenin to the C-Myc gene to zero in the already perturbed cell of FIG. 29.
  • FIG. 31 is a time-series profile which shows the effect of systematically changing parameter values to change the "cancerous" state of FIG. 12 back to a "normal" state similar to FIG. 11.
  • FIG. 31(a) depicts a modular description of an exemplary colon cancer cell simulation.
  • FIG. 32 depicts a modular description of an exemplary colon cancer cell simulation.
  • FIGS. 32(a)-(h) depict each module in detail so as to make the reactions visible.
  • FIGS. 33(a)-(d) contain the data points and simulation for the phosphorylated forms of AKT, MEK and ERK in the exemplary colon cancer cell simulation.
  • FIGS. 34 and 34(a) depict the results of perturbing 50 individual targets in the exemplary colon cancer cell model.
  • FIG. 35 lists combinations of certain targets identified by the exemplary colon cancer cell simulation whose absence caused apoptosis.
  • FIG. 36 shows the mechanism of action of the perturbation in the exemplary colon cancer cell simulation.
  • FIG. 37 depicts the simulation output of an oncogenic Ras without autocrine signaling.
  • FIG. 38 depicts the simulation output of an oncogenic Ras with autocrine signaling.
  • FIG. 39 depicts the simulation output of levels of Bcl-2 without the G3139 antisense therapy.
  • FIG. 40 depicts the simulation output of inhibition of Bcl-2 using the G3139 antisense therapy.
  • FIG. 41 depicts the simulation output of inhibiting Bcl-2 using G3139 antisense therapy in combination with a secondary chemotherapeutic agent.
  • FIG. 42 depicts the cleavage of PARP as a result of inhibiting IκB-α in combination with the addition of TNF at various levels.
  • FIGS. 43 (a)-(b) show the constructs in Diagrammatic Cell Language.
  • FIGS. 44(a)-(b) compare a simple notation with the Diagrammatic Cell Language.
  • FIG. 45 is a flowchart of the execution of an optimization procedure of an exemplary software system according to the present invention.
  • FIG. 46 depicts schematically one process according to the invention for inferring a biological network.
  • FIG. 47 shows the network topology of the synthetic network before and after links are removed.
  • FIG. 48 displays the cost of fitting the data with one link perturbed.
  • FIGS. 49-50 show the results of the exemplary network inference methodology on a 25-node network.
  • Equation refers to a general formula of any type or description and also includes computer code and computer-readable and/or executable instructions; "Formulae" and "Equations" have their normal meanings and are used interchangeably and without limitation.
  • Phenotype of a cell means the detectable traits of a cell, i.e. its physical and chemical characteristics, as influenced by its environment;
  • State of a cell means, in the aggregate, the components of the cell, the concentrations thereof and their interactions and interrelationships;
  • Cellular biochemical network means a subset of the components of the cell and their known or posited interactions and interrelationships;
  • Receptor means a site on a cell (often on a membrane) that can combine with a specific type of molecule to alter the cell's function;
  • EGF Epidermal Growth Factor
  • EGFR Epidermal Growth Factor Receptor
  • Erk refers to a kinase in the Ras MAP kinase cascade
  • "Functional output" refers to an output of a simulation that is a function of time, e.g., a time series for a given biochemical;
  • "Intrinsic to said phenotype" means causing or contributing to the phenotype;
  • Mek refers to a well-known kinase in the Ras MAP kinase cascade
  • NGF Nerve Growth Factor
  • NGFR Nerve Growth Factor Receptor
  • Parameters refer to any biochemical network component (e.g., chemicals, proteins, genes, rate constants, initial concentrations, etc.) that can vary and thereby change the final output of a biochemical network;
  • “Putative target for interaction” means, broadly, any cellular component whose existence or concentration is determined, by practice of the methods of the invention, to have a significant effect on the phenotype of the cell such that when removed from the cell or reduced or increased in concentration, the phenotype may be altered. More specifically, a “putative target for interaction” means a cellular component which appears to be an actual physical or chemical target for a binding agent or reactant which will have the effect of removing the target or changing its concentration;
  • Raf is a kinase in the Ras MAP kinase pathway
  • Ras is a small G-protein implicated in over 40% of all cancers
  • Attractors of a cell are the asymptotic dynamic states of a system. These are the fixed points, limit cycles and other stable states toward which the cell tends as a result of its normal behavior, an experimental perturbation or the onset of a disease.
  • Degradation is the destruction of a molecule into its components, i.e., the breaking down of large molecules into smaller ones.
  • Ubiquitination refers to the process in which ubiquitin, a 76-amino-acid polypeptide, latches onto a cellular protein shortly before that protein is broken down.
  • Endocytosis is the process by which extracellular materials are taken up by a cell.
  • Signal transduction refers to the biochemical events that conduct the signal of a hormone or growth factor from the cell exterior, through the cell membrane, and into the cytoplasm. This involves a number of molecules, including receptors, proteins and messengers.
  • G1 refers to the period during interphase in the cell cycle between mitosis and the S phase (when DNA is replicated). It is also known as the "decision" period of the cell, because the cell "decides" to divide when it enters the S phase. The "G" stands for gap.
  • S phase refers to the period during interphase in the cell cycle when DNA is replicated in the cell nucleus. The "S" stands for synthesis.
  • G2 refers to the period during interphase in the cell cycle between the S phase (when DNA is replicated) and mitosis (when the nucleus, then the cell, divides). At this time, the cell checks the accuracy of DNA replication and prepares for mitosis. The "G" stands for gap.
  • Mitosis, or M phase, refers to the process of nuclear division in eukaryotic cells that produces two daughter cells from one mother cell, all of which are genetically identical to each other.
  • Apoptosis refers to programmed cell death, as signaled by the nuclei in normally functioning human and animal cells when age or the state of cell health and condition dictates. Cancerous cells, however, are unable to undergo the normal cell transduction or apoptosis-driven natural cell death process.
  • Cleavage refers to the breaking of bonds between the component units of a macromolecule, for example amino acids in a polypeptide or nucleotide bases in a strand of DNA or RNA, usually by the action of enzymes.
  • Oligomerization refers to the chemical process of creating oligomers from smaller molecules.
  • cytochrome c is a type of cytochrome, a protein which carries electrons, that is central to the process of respiration in mitochondria (an organelle found in eukaryotes which produces energy).
  • FIG. 1 is a schematic representation of a gene expression network. It is well understood that when particular regulatory proteins and transcription factors are bound to the promoter sequence of a gene, the expression of mRNA molecules of the gene is turned 'on' or 'off'. During gene expression, RNA polymerase "reads" the DNA sequence of the gene to transcribe it, i.e., to produce a specific mRNA molecule. This specific mRNA molecule is in turn decoded by a ribosome that translates the mRNA, i.e., creates a specific protein.
  • FIG. 1 shows gene 1 producing protein 1, which binds to the promoter region of gene 2, activating the expression of gene 2.
  • Gene 2 then produces protein 2, which then binds to the promoter region of gene 3, turning off the expression of gene 3.
  • If gene 3 is active, it will produce a protein that binds to the promoter region of gene 1, which then activates the expression of gene 1.
  • This series of gene-to-protein and protein-to-gene interactions represents a gene expression network. Such networks ultimately control the overall levels of gene expression for the entire genome and consequently determine the phenotypes of the cell.
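The three-gene loop of FIG. 1 can be expressed as a Boolean switching network, one of the mathematical frameworks named later in this description. The sketch below is a minimal illustration, not the patent's implementation: it assumes synchronous updates and simple on/off rules (gene 1 activates gene 2, gene 2 represses gene 3, gene 3 activates gene 1), and the names `step` and `find_attractor` are hypothetical. It iterates the rules until a state repeats and returns the cycle the network settles into, i.e., its attractor.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int, int]  # (gene 1, gene 2, gene 3), each 0 = off or 1 = on

def step(state: State) -> State:
    """One synchronous update under the assumed FIG. 1 rules."""
    g1, g2, g3 = state
    new_g1 = g3       # gene 3 protein activates gene 1
    new_g2 = g1       # gene 1 protein activates gene 2
    new_g3 = 1 - g2   # gene 2 protein represses gene 3
    return (new_g1, new_g2, new_g3)

def find_attractor(start: State) -> List[State]:
    """Iterate until a state repeats and return the repeating cycle."""
    first_seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[first_seen[state]:]

for start in [(1, 0, 0), (0, 0, 0)]:
    cycle = find_attractor(start)
    print(start, "->", len(cycle), "-state cycle:", cycle)
```

Because the loop contains a single repression, no state is a fixed point under these assumed rules: from (1, 0, 0) the network alternates between two states, and from (0, 0, 0) it cycles through the remaining six states, an on/off analogue of the oscillating time series discussed below.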
  • FIG. 2 is a schematic representation of a typical signal transduction/translation pathway.
  • Genes 4, 5, 6 and 7 produce proteins that reside in the cytoplasm of the cell.
  • Gene 4 codes for a receptor, a signal transduction protein that lies embedded in the membrane of the cell, with one part on the outside facing the environment and the other part on the inside facing the cytoplasm.
  • Growth factors, hormones, and other extracellular signals bind to receptors and activate a cascade of biochemical reactions.
  • The proteins involved in signal transduction pathways, including receptors, are allosteric; i.e., they exist in an inactive and an active state. Biochemical reactions such as phosphorylation of a particular part of a protein or the exchange of a bound GDP molecule for a GTP molecule can change the state of an allosteric protein from inactive to active. Once activated, these proteins bind to or react with other proteins to activate them. Signal transduction pathways are thus activated in a domino-like fashion.
  • A signal at the cell surface from a receptor-binding event starts a cascade of biochemical reactions and information flow that leads to the transport of a particular protein into the nucleus, where it then activates transcription factors that in turn activate the expression of one or more genes.
  • Signal transduction pathways, which transmit information from outside the cell to the genes, are thereby coupled to the gene expression networks that control the expression patterns of the genes and the state of a cell, i.e., its phenotype.
  • Disease State Prediction by Modeling Biochemical Networks. Biochemical networks and their interactions with the environment ultimately determine the state of a cell and the development of a disease state. Deciphering the complex, intertwined gene expression and signal transduction pathways involving hundreds or thousands of molecules has proven difficult.
  • The equation for the time rate of change of a particular protein or mRNA comprises terms derived from enzyme kinetics. These terms describe reactions that create, destroy or modify the protein, e.g., phosphorylation, dephosphorylation, translation and degradation reactions.
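As an illustration of how such terms combine, the following rate equations are a hypothetical example rather than equations taken from the patent: a protein $P$ is translated from its mRNA $M$ with rate constant $k_{tl}$, degraded with first-order constant $k_{deg}$, phosphorylated by a kinase $E_K$ into $P_p$ and dephosphorylated by a phosphatase $E_P$, with both enzymatic steps assumed to follow Michaelis-Menten kinetics. All symbols are assumptions introduced for this sketch.

```latex
\begin{aligned}
\frac{d[P]}{dt} &= k_{tl}[M] - k_{deg}[P]
  - \frac{k_{cat,K}\,[E_K]\,[P]}{K_{m,K} + [P]}
  + \frac{k_{cat,P}\,[E_P]\,[P_p]}{K_{m,P} + [P_p]} \\
\frac{d[P_p]}{dt} &= \frac{k_{cat,K}\,[E_K]\,[P]}{K_{m,K} + [P]}
  - \frac{k_{cat,P}\,[E_P]\,[P_p]}{K_{m,P} + [P_p]}
  - k_{deg,p}[P_p]
\end{aligned}
```

The phosphorylation flux leaves the equation for $P$ with a minus sign and enters the equation for $P_p$ with a plus sign (and vice versa for dephosphorylation), so the enzymatic steps only move protein between the two forms; the Michaelis-Menten denominators are what make the system nonlinear.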
  • The differential equations are nonlinear; in simple cases they can be analyzed analytically, and in general they are solved by computer simulation to produce a plot of the concentrations of mRNA and protein as a function of time.
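The following Python sketch shows the computer-simulation route using a classical fourth-order Runge-Kutta integrator, the deterministic method mentioned below. To keep the result checkable, it assumes a deliberately linear two-variable model with hypothetical constants (`K_TX`, `D_M`, `K_TL`, `D_P`): mRNA $M$ is transcribed at a constant rate and degraded, and protein $P$ is translated from $M$ and degraded. The integrator itself makes no linearity assumption and accepts any right-hand side, including nonlinear rate equations of the kind described above.

```python
from typing import Callable, List, Tuple

Vector = List[float]
RHS = Callable[[float, Vector], Vector]

def rk4_step(f: RHS, t: float, y: Vector, h: float) -> Vector:
    """Advance y by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(f: RHS, y0: Vector, t_end: float, h: float) -> List[Tuple[float, Vector]]:
    """Return the time series [(t, y), ...] from t = 0 to t_end."""
    t, y = 0.0, list(y0)
    series = [(t, list(y))]
    while t < t_end - 1e-12:
        step = min(h, t_end - t)  # shorten the last step so we stop at t_end
        y = rk4_step(f, t, y, step)
        t += step
        series.append((t, list(y)))
    return series

# Hypothetical rate constants (not from the patent).
K_TX, D_M = 2.0, 0.5   # transcription rate, mRNA degradation constant
K_TL, D_P = 3.0, 0.1   # translation constant, protein degradation constant

def gene_expression(t: float, y: Vector) -> Vector:
    m, p = y
    return [K_TX - D_M * m, K_TL * m - D_P * p]

series = integrate(gene_expression, [0.0, 0.0], t_end=200.0, h=0.1)
t_final, (m_final, p_final) = series[-1]
print(f"t = {t_final:.1f}: mRNA = {m_final:.4f}, protein = {p_final:.4f}")
```

The exact steady state of this assumed model is $M^* = K_{TX}/D_M = 4$ and $P^* = K_{TL}M^*/D_P = 120$, so the final point of the simulated time series should match those values closely.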
  • The time series of concentrations for any particular mRNA or protein can be high, intermediate or low, can oscillate in time, or can even change chaotically in time (Hill et al., Proc. of Statistical Mechanics of Biocomplexity, Springer-Verlag (1999)).
  • A particular time series profile corresponds to a particular state of gene expression and thus to a particular biological state. The difference between a normal and a cancerous state manifests itself at this level of description.
  • For example, one time series of concentrations corresponds to a healthy cell, whereas another corresponds to a disease state, e.g., cancer.
  • Boolean switching networks, nonlinear and piecewise linear differential equations, stochastic differential equations and stochastic Markov jump processes all provide mathematical frameworks that can represent the time evolution of mRNA and protein concentrations.
  • In simple cases (fewer than three genes), the kinetic equations for the mRNA and proteins involved in biochemical networks are solved with analytical methods from nonlinear dynamics (bifurcation analysis, linear stability analysis, etc.) and from statistical physics and probability theory (master equation, stochastic calculus, methods of stochastic averaging).
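Linear stability analysis, one of the analytical methods listed above, can be illustrated on a hypothetical two-gene network in which each gene represses the other. The model, its Hill-type repression function and the constants `A` and `N` are assumptions for this sketch, not taken from the patent: $du/dt = A/(1+v^N) - u$ and $dv/dt = A/(1+u^N) - v$. The code locates two fixed points by bisection, evaluates the Jacobian at each, and classifies a point as stable when every eigenvalue has a negative real part.

```python
import cmath
from typing import Callable, List, Tuple

A, N = 4.0, 2.0  # hypothetical maximal synthesis rate and Hill coefficient

def repression(x: float) -> float:
    """Synthesis rate of one gene given the level x of its repressor."""
    return A / (1.0 + x ** N)

def d_repression(x: float) -> float:
    """Derivative of repression(x) with respect to x."""
    return -A * N * x ** (N - 1) / (1.0 + x ** N) ** 2

def jacobian(u: float, v: float) -> List[List[float]]:
    # du/dt = repression(v) - u,  dv/dt = repression(u) - v
    return [[-1.0, d_repression(v)], [d_repression(u), -1.0]]

def eigenvalues_2x2(j: List[List[float]]) -> Tuple[complex, complex]:
    """Eigenvalues from the trace and determinant of a 2x2 matrix."""
    tr = j[0][0] + j[1][1]
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    root = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root

def bisect(g: Callable[[float], float], lo: float, hi: float, iters: int = 100) -> float:
    """Root of g in [lo, hi], assuming g(lo) < 0 < g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Symmetric fixed point: u = v = s with s = repression(s).
s = bisect(lambda x: x - repression(x), 0.0, A)
# Asymmetric fixed point: u = repression(repression(u)) with u low, v high.
u_low = bisect(lambda x: x - repression(repression(x)), 0.0, 1.0)
v_high = repression(u_low)

fixed_points = {"symmetric": (s, s), "asymmetric": (u_low, v_high)}
for name, (u, v) in fixed_points.items():
    eigs = eigenvalues_2x2(jacobian(u, v))
    stable = all(e.real < 0 for e in eigs)
    print(f"{name}: u={u:.3f}, v={v:.3f}, stable={stable}")
```

With these assumed constants the symmetric point, where both genes are equally expressed, has one positive eigenvalue and is therefore unstable, while the asymmetric point (u low, v high) is stable. A network of this kind is bistable, with two attractors that can stand in for two distinct cell states.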
  • It is sometimes possible to extend these analytical methods and to use object-oriented computer simulations for deterministic dynamics (Runge-Kutta integration) and stochastic dynamics (Monte Carlo simulation; Gillespie, J. of Comp. Phys.).
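For the stochastic side, the Gillespie stochastic simulation algorithm is a widely used Monte Carlo method for chemical kinetics. The sketch below applies it to the simplest hypothetical case, a protein produced at constant rate `k_prod` and degraded at rate `k_deg` per molecule; the function names and constants are assumptions for illustration. Each iteration draws an exponentially distributed waiting time from the total propensity and then picks which reaction fires in proportion to its propensity.

```python
import random
from typing import List, Tuple

def gillespie_birth_death(k_prod: float, k_deg: float, n0: int,
                          t_end: float, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Exact stochastic trajectory of 0 -> P (rate k_prod) and P -> 0 (rate k_deg * n).
    Assumes k_prod > 0, so the total propensity is never zero."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod        # propensity of production
        a_deg = k_deg * n      # propensity of degradation
        a_total = a_prod + a_deg
        t_next = t + rng.expovariate(a_total)  # waiting time to the next reaction
        if t_next > t_end:
            break
        t = t_next
        # choose which reaction fires, weighted by its propensity
        n += 1 if rng.random() * a_total < a_prod else -1
        times.append(t)
        counts.append(n)
    return times, counts

def time_average(times: List[float], counts: List[int], t_end: float) -> float:
    """Average molecule count, weighting each value by how long it persisted."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end

K_PROD, K_DEG = 10.0, 0.2  # hypothetical rate constants
times, counts = gillespie_birth_death(K_PROD, K_DEG, n0=0, t_end=2000.0, seed=1)
print("events:", len(times) - 1,
      "time-averaged count:", round(time_average(times, counts, 2000.0), 2))
```

For this birth-death process the deterministic model $dn/dt = k_{prod} - k_{deg}\,n$ has the steady state $k_{prod}/k_{deg} = 50$; the time-weighted average of the stochastic trajectory fluctuates around that value, while individual trajectories show the molecule-number noise that the deterministic equation omits.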
  • imaging for dynamically monitoring the expression and activity of biomolecules provide accurate data with which to compare mathematical predictions.
  • Imaging techniques and proteomics also provide in vivo snapshots of protein concentrations and localization.
  • Rate constants are also rarely reported.
  • Knowledge of rate constants and other kinetic parameters lags behind the knowledge of the organization of many protein and gene networks.
  • one after another, often in a particular order.
  • Each of these mutations causes morphological and physiological changes (see Vogelstein 1995).
  • Cancer is often caused by a combination of cooperating oncogenes, none of which is dominant.
  • This complex combinatorial genetic origin makes genotype-to-phenotype mapping a difficult problem, and one that must be understood better before rational approaches to cancer chemotherapy can be achieved. Because these genes interact within the gene network to collectively cause transformation to a cancerous state, a mathematical description is required to identify the combination of interacting genes that can reverse or stop uncontrolled cell growth.
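The combinatorial point can be made concrete with a toy Boolean phenotype. In the hypothetical rule below (gene names and logic are assumptions, not from the patent), proliferation requires at least one active gene from each of two redundant pairs, so no single knockout is sufficient. The search enumerates knockout sets in order of size and keeps only the minimal sets that switch the phenotype off, similar in spirit to the target combinations listed for the colon cancer simulation in FIG. 35.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

GENES = ["onc_A", "onc_B", "onc_C", "onc_D"]  # hypothetical gene names

def proliferates(active: Dict[str, bool]) -> bool:
    """Assumed rule: one active gene is needed from each of two redundant pairs."""
    return (active["onc_A"] or active["onc_B"]) and (active["onc_C"] or active["onc_D"])

def minimal_knockouts(max_size: int) -> List[FrozenSet[str]]:
    """Smallest gene sets whose joint knockout stops proliferation."""
    hits: List[FrozenSet[str]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(GENES, size):
            knocked_out = frozenset(combo)
            if any(hit <= knocked_out for hit in hits):
                continue  # contains a smaller effective set, so it is not minimal
            active = {gene: gene not in knocked_out for gene in GENES}
            if not proliferates(active):
                hits.append(knocked_out)
    return hits

print([sorted(hit) for hit in minimal_knockouts(max_size=2)])
```

Under these assumed rules, `minimal_knockouts(2)` returns the two pairs {onc_A, onc_B} and {onc_C, onc_D}, whereas a screen of single knockouts would report no hits at all.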
  • The methods of the invention can be broadly applied to find targets for therapeutic agents among the many components of the mammalian cell cycle.
  • The invention is described below, by way of example, with respect to one portion of that cycle.
  • FIG. 3 depicts a portion of the signal transduction pathway and gene expression network that initiates the mammalian cell cycle.
  • The several gene and protein components of the network are identified and described below.
  • FIG. 3 was created using Diagrammatic Cell Language ("DCL"), a computer-based graphic language devised to describe all of the interactions in a cell, or within a particular biochemical network, in a single diagram, with only a few representations of each molecule. The notation is explained in detail in the article entitled "Dramatic Notation and
  • DCL is a novel means of facilitating the application of quantitative methods and applied mathematics to biology and biochemistry.
  • When a biologist or other biochemical network modeler constructs a model of some cellular or subcellular network in the DCL environment using the objects available in DCL, she need not know the precise mathematical expression of these objects. Nonetheless, the DCL parser can take the constructed model and generate a precise mathematical description of the modeled biochemical network, such that it can be solved, optimized and perturbed according to the methods of the present invention.
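The DCL parser itself is not described in this excerpt. As a rough analogue of what generating a precise mathematical description involves, the sketch below converts a list of reactions into mass-action rate equations. The `Reaction` structure and the rate constants are hypothetical and do not reproduce DCL's format; reversible binding of A and B, as in Fig. 3(b), is written as one binding reaction and one unbinding reaction.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Reaction:
    reactants: Dict[str, int]  # species -> stoichiometric coefficient
    products: Dict[str, int]
    rate: float                # mass-action rate constant

def mass_action_rhs(reactions: List[Reaction], conc: Dict[str, float]) -> Dict[str, float]:
    """Time derivative of every species; every species must appear in conc."""
    deriv = {species: 0.0 for species in conc}
    for r in reactions:
        flux = r.rate
        for species, stoich in r.reactants.items():
            flux *= conc[species] ** stoich   # mass-action: rate * product of concentrations
        for species, stoich in r.reactants.items():
            deriv[species] -= stoich * flux   # reactants are consumed
        for species, stoich in r.products.items():
            deriv[species] += stoich * flux   # products are formed
    return deriv

# Reversible binding A + B <-> AB, with hypothetical rate constants.
model = [
    Reaction({"A": 1, "B": 1}, {"AB": 1}, rate=2.0),  # binding (k_on)
    Reaction({"AB": 1}, {"A": 1, "B": 1}, rate=0.5),  # unbinding (k_off)
]
conc = {"A": 1.0, "B": 0.5, "AB": 0.2}
print(mass_action_rhs(model, conc))
```

Here the binding flux is 2.0 × 1.0 × 0.5 = 1.0 and the unbinding flux is 0.5 × 0.2 = 0.1, so A and B each change at a rate of -0.9 and the complex AB at +0.9.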
  • The box shown in Fig. 3(a) represents a chemical, i.e., a single indivisible chemical unit. Examples appear at numerous locations in Fig. 3.
  • Fig. 3(b) shows reversible binding between chemicals A and B.
  • In Fig. 3(b), C is a component that stimulates the unbinding of A and B.
  • Fig. 3(c) indicates irreversible binding.
  • Fig. 3(d) shows a link box which is used to indicate components in the cell with complex structures, such as a protein or DNA.
  • A link box can contain other objects, such as binding nodes (the solid black circles in Fig. 3(d)), which represent functional binding sites on a protein.
  • The link box shown in Fig. 3(d) can, for example, bind the two chemical substances A and B.
  • The unbinding symbol is shown in Fig. (1).
  • The numbers in parentheses (RES) on the line leading from the link box indicate the resolution of the states of the substances in the link box. For example, the numbers (0,1) indicate that component A is not bound and component B is bound.
  • Fig. 3(e) shows an internal link box L. This is used to identify a particular state similar to the resolution shown in Fig. 3(d).
  • The state in which A and B are bound, i.e., the dimer formed by A and B
  • The combination of boxes in Fig. 3(f) is called a Like Box. It is used to depict which groups of objects are alike in functionality.
  • Various components within the box also can be resolved to choose a particular state.
  • The resolution indication RES on the line emerging from the large box indicates a state of 1, i.e., component B regulating other cellular or network entities.
  • The wavy line R indicates a reversible reaction.
  • A and B are involved in a reversible reaction R to produce C.
  • E is an enzyme driving the reaction toward the product D.
  • Fig. 3(h) indicates an irreversible reaction, with a characteristic one-way arrow.
  • Figures 3(i) through 3(n) are self-explanatory, illustrating, respectively, an unbinding stimulation, a binding stimulation, an enablement, no reaction, an enhancement and a directional stimulation.
  • The components of the network interact as follows.
  • The binding of epidermal growth factor (EGF) 301 or nerve growth factor (NGF) 302 to its respective receptor (EGFR 303, NGFR 304) results in the activation of SOS 305, 305 A, a guanine
  • be useful, a quantitative method needs to be capable of managing large-scale, multinodal, interconnected systems. Such a quantitative model is provided by the present invention.
  • The methods and implementations of the present invention are used to discover targets for therapeutic agents through predictions from simulation studies and analytical studies.
  • The methods are supported by mathematical techniques that infer relationships among the various components of a biochemical network.
  • The simulation and analytical studies provide accurate predictions about the behavior of the system and the identity of the targets. Optimization techniques can be used to constrain the uncertainty inherent in the use of large genomics data sets. While the simulation and analytical studies are powerful given 'perfect data' as inputs to the models, 'perfect data' does not exist. Therefore, data mining and bioinformatics techniques applied to large data sets of DNA sequences and expression profiles are used to provide meaningful correlations and patterns among the network components.
  • the underlying structures that data mining attempts to locate, such as markers and partitions between normal and cancerous cells, are ultimately a manifestation of the underlying dynamics of the biochemical network. Recent studies on differential gene expression reveal genes that are misregulated in disease states.
  • Such genes are potential targets and are used in computer models according to the present invention.
  • pattern recognition algorithms and artificial intelligence methods are used to elucidate subtle relationships among genes and proteins as well as to uncover the underlying data structure, e.g. partitions between cell types, cancer types and stages of malignancy.
  • the combination of the predictive and the inferential approaches leads to the discovery of multiple targets.
  • the term "cell" is thus intended to include biochemical networks of any type, now known or as yet unknown, some cellular, some subcellular, and some supercellular.
  • One or more components of a cell or other biochemical network may be identified as putative targets for interaction with one or more therapeutic agents by performing a method comprising the steps of: (a) specifying a cellular biochemical network believed to be intrinsic to a phenotype of said cell;
  • (f) comparing the first and second simulated states of the network to identify the effect of the perturbation on the state of the network, and thereby identifying one or more components for interaction with one or more agents.
  • an additional optimization step could be performed, where the solution of the mathematical equations simulating a state of the cell is optimized to have minimum error vis-a-vis the prediction of certain available experimental data.
  • Such optimization would also comprise error analysis and extrapolation methods. Specific methods of optimization and handling of error are described more fully below.
  • the mathematical equations representing the interrelationships between the components of the cellular biochemical network are solved using a variety of methods, including stochastic or differential equations, and/or a hybrid solution using both stochastic methods and differential equations.
  • concentrations of one or more of the several proteins or genes (generally "components") in the biochemical network are selectively perturbed to identify which ones of those proteins, genes or other components cause a change in the time course of the concentration of a protein or gene implicated in a disease state of the cell.
  • a series of perturbations are made, each of which changes the concentration of a protein, gene or other component in the network to a perturbed value.
  • the mathematical equations are then solved with stochastic or differential equations (or some hybrid thereof) to determine whether that protein or gene is implicated in causing a change in the time course and/or spatial localization of the concentration of a protein or gene implicated in a disease state of the cell.
  • the concentration of each of the proteins and genes in the network is reduced to zero in each respective perturbation.
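The knockout screen described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cascade (signal S, kinase K, inhibitor I, readout R), its rate constants and the forward-Euler integrator are hypothetical stand-ins, not the network of FIG. 3. Each candidate component is clamped at zero in a separate run, and the change in the final readout relative to the unperturbed run flags that component as a putative target.

```python
"""Hypothetical knockout screen: zero each component in turn, compare a readout."""


def simulate(knockout=None, t_end=50.0, dt=0.01):
    """Forward-Euler run of a toy cascade; returns the final readout R.

    Hypothetical network (not FIG. 3): signal S activates kinase K,
    K produces readout R, and inhibitor I speeds up removal of R.
    """
    conc = {"S": 1.0, "K": 0.0, "I": 0.5, "R": 0.0}
    k_act, k_deact, k_prod, k_inh, k_deg = 1.0, 0.5, 1.0, 2.0, 0.1
    for _ in range(int(round(t_end / dt))):
        if knockout is not None:
            conc[knockout] = 0.0  # hold the knocked-out component at zero
        s, k, i, r = conc["S"], conc["K"], conc["I"], conc["R"]
        conc["K"] = k + dt * (k_act * s - k_deact * k)
        conc["R"] = r + dt * (k_prod * k - (k_inh * i + k_deg) * r)
    if knockout is not None:
        conc[knockout] = 0.0
    return conc["R"]


def knockout_screen(candidates=("S", "K", "I")):
    """Return the baseline readout and the change caused by each single knockout."""
    baseline = simulate()
    return baseline, {name: simulate(knockout=name) - baseline for name in candidates}


if __name__ == "__main__":
    base, effects = knockout_screen()
    print(f"baseline R = {base:.3f}")
    for name, delta in effects.items():
        print(f"knockout {name}: change in R = {delta:+.3f}")
```

In this toy model, removing S or K abolishes the readout while removing I raises it, so all three would be reported as affecting the readout, in opposite directions.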
  • NGF 302 in the network causes the cell to differentiate whereas the presence of EGF 301 in the network causes the cell to proliferate, a condition which may lead to the development of cancer.
  • concentrations of NGF 302 and EGF 301 affect the time course of the concentration of Erk 309, NGF 302 causing Erk 309 to increase with time and ultimately reach a steady-state value and EGF 301 causing the concentration of Erk 309 to initially increase and then decrease to a lower level than is present with NGF 302 in the network.
  • EGF 301 which causes the cell to proliferate
  • NGF 302 which causes the cell to differentiate
  • the system is manipulated by perturbing it to block out, inhibit, activate or overactivate one or more of the components of the network to see if that component is implicated in causing the time course of NGF 302 to increase and the time course of EGF 301 to decrease.
  • the time courses of these components are believed to be a surrogate for predicting whether the cell will proliferate or differentiate and consequently whether the cell will become cancerous or not.
  • the concentration of the several components of the network set forth in FIG. 3 were each respectively reduced to zero in a separate perturbation and the time course of the concentrations of NGF and EGF were determined from calculations of the concentrations of cellular components according to the simulation. It was found that when component PI3K was "knocked out", i.e., when its concentration was reduced to zero, this caused the time course of the concentration of NGF to increase and the time course of the concentration of EGF to decrease, indicating a beneficial result in that the cell was, according to this surrogate analysis, caused to
  • one or more components of a cell may be identified as putative targets for interaction with one or more therapeutic agents by performing a method comprising the steps of:
  • the object is not to find numerical values of the several components of the biochemical network, as described above in performing the methods of the invention in a simulated network.
  • instabilities and transitions of a cell state are identified, and a bifurcation analysis is conducted in order to analyze the stability of the cell and determine the probability of its transformation into a different state, e.g. a disease state.
  • a bifurcation analysis is depicted in Fig. 9, which is a bifurcation diagram plotting various cell phenotypes as a function of two rate constants (Rate Constant 1 ("RC1") and Rate Constant 2 ("RC2") in Fig. 9).
  • RC1 Rate Constant 1
  • RC2 Rate Constant 2
  • Attractors of the system are identified. These attractors are the equilibrium states of the cell. Attractors of the network may be steady-state equilibria, periodically changing equilibria, or chaotic equilibria with characteristic signatures. The equilibria may represent normal, disease, growth, apoptosis or other states of the cell, as depicted in the example of Fig. 9. Once these attractors have been identified, it is possible to perturb the network to determine conditions under which the cell may be transformed from one state to another.
  • deactivate the cell into a stable state. All of these dynamics lead to the identification of drug targets as well as information about the duration of drug treatment that may be needed.
  • the analytical embodiments of the invention use root-finding and continuation algorithms to find bifurcations rather than conducting repeated simulations as in the simulation embodiments. Understanding which parameters and which proteins and genes are important in causing or reversing a cancerous state may lead to the identification of multiple-site drug targets. The methods of the invention can also be used to evolve the biochemical network. By elucidating the connections between the components of the network and their functional dynamics, the methods of the invention lead to a prediction of the changes which may give rise to disease or which may cause a disease state to transition into a normal state. It is also possible to find and "evolve" states that are more stable than the starting state of the network. These evolved states can be experimentally checked to see if the biology is accurately described by the predicted state of the network.
  • Cancer cells are frequently subject to high mutation rates and a treatment that is predicted by the method of the invention and that is robust in the face of evolved changes in the network will be more desirable than treatments leading to a changed state that is easily sidestepped by minor evolutionary changes in the cell.
  • the concentrations of one or more of the several proteins and genes in the biochemical network are selectively perturbed to identify which ones of those proteins or genes are implicated in causing an attractor of the biochemical network to become unstable.
  • a series of perturbations are made which change the concentration of a protein or gene in the network to a perturbed value.
  • mathematical equations representing the interrelationships between the components of the network then leads to a determination of whether the perturbed protein or gene is implicated in causing a change in the time course of the concentration of a protein or gene implicated in a disease state of the cell.
  • the concentration of each of the proteins and genes is reduced to zero in a respective perturbation, and the mathematical equations are then solved.
  • a bifurcation analysis is performed using eigenvalues of a Jacobian matrix based upon the equations describing the interrelationship of network components to characterize the stability of one or more attractors.
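A minimal sketch of the eigenvalue test, assuming a hypothetical two-gene mutual-repression model dx/dt = alpha/(1 + y^2) - x, dy/dt = alpha/(1 + x^2) - y (not the network of FIG. 5). The Jacobian is estimated by central finite differences at the symmetric fixed point x = y, and that fixed point is classified as stable when every eigenvalue has a negative real part. In this toy model the symmetric state loses stability as alpha increases past 2.

```python
import cmath


def toggle_rhs(state, alpha):
    """Hypothetical mutual-repression model: each protein represses the other's synthesis."""
    x, y = state
    return (alpha / (1.0 + y * y) - x, alpha / (1.0 + x * x) - y)


def jacobian(f, state, alpha, h=1e-6):
    """Central-difference estimate of the Jacobian of f at `state`."""
    n = len(state)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(state), list(state)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(up, alpha), f(down, alpha)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return jac


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return ((tr + root) / 2.0, (tr - root) / 2.0)


def symmetric_fixed_point(alpha):
    """Solve x**3 + x - alpha = 0 (the x = y fixed point) by bisection."""
    lo, hi = 0.0, max(alpha, 1.0)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + mid - alpha > 0.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    return (x, x)


def classify(alpha):
    """Return (is_stable, eigenvalues) for the symmetric fixed point."""
    fp = symmetric_fixed_point(alpha)
    eigs = eigenvalues_2x2(jacobian(toggle_rhs, fp, alpha))
    return all(e.real < 0.0 for e in eigs), eigs


if __name__ == "__main__":
    for alpha in (1.0, 3.0):
        stable, eigs = classify(alpha)
        print(alpha, "stable" if stable else "unstable", [round(e.real, 3) for e in eigs])
```

At alpha = 1 both eigenvalues are negative; at alpha = 3 one eigenvalue is positive (about 0.19), so the symmetric state has become unstable and the system must move to another attractor.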
  • the literature provides sources for identification of biochemical networks intrinsic to disease studies. These networks include signal transduction pathways governing the cell cycle; transcription, translation, and transport processes governing the cell cycle; protein-gene and protein-protein interactions, such as, e.g., Kohn maps; protein-protein interactions found through genome data mining techniques; protein-protein interactions from genome-wide yeast two-hybrid methods; trans-acting regulatory proteins or transcription factors and cis-acting binding motifs found through experimental and computational genome-wide search methods; protein-gene and protein-protein interactions inferred through the use of microarray and proteomics data; protein-protein interactions and protein function found from 3-dimensional protein structure information on a genome-wide scale; binding partners and functions for novel uncharacterized human genes found through sequence homology search methods; and protein-protein interactions found from binding motifs in the gene sequence.
  • FIG. 4 describes the Wnt/Beta-Catenin signaling pathway that plays a critical role in the progression of colon cancer cells through the cell cycle.
  • Fig. 4 uses a conventional depiction, and comes from Science Magazine, Signal Transduction Knowledge Environment web page at http://stke.sciencemag.org/cgi/cm/CMP_5533.
  • the various letters labelling the genes and proteins in FIG. 4 indicate the location of the molecules specified in the legend with reference to the cell, where "E” means extracellular, "P” means Plasma membrane, “C” means Cytosol, "N” means Nucleus, and "O” means Other Organelle.
  • the symbols +, -, and 0 indicate the type of interaction between the molecules. A detailed explanation of this pathway is found at the website listed above.
  • (1) Ras-MAP Kinase pathway, (2) Wnt/β-Catenin, (3) G1-S transition, (4) Rho-family G proteins (cdc42, etc.), (5) JNK pathway, (6) Apoptosis (Caspases, p53), (7) G2-M transition, (8) Integrin pathway, (9) PI3 Kinase pathway, (10) c-Myc pathway, (11) Telomeres, (12) Nuclear Receptors, and (13) Calcium Oscillations.
  • Project private sources and experimental data.
  • data on the localization of mRNA and proteins, and the structure and function of molecules may be used.
  • the equations quantitatively describe the time rate of change of gene products (mRNA, inactive protein, and active protein) that comprise the biochemical network.
  • Each term in such differential equations represents a particular reaction in the biochemical network.
  • a particular reaction in the biochemical network is represented in FIG. 3 by an arrow connecting two or more biomolecules.
  • functions of the concentration of inactive substrate: the Michaelis-Menten forms.
  • the differential equation for RasGTP contains a term describing the SOS catalyzed conversion of RasGDP to RasGTP. This first term indicates that the rate of creation of RasGTP is equal to the product of a rate constant, Kphos, the concentration of active SOS and the nonlinear function of the inactive substrate concentration RasGDP.
  • the nonlinear activation of RasGTP causes the rate of activation to saturate at high levels of inactive substrate.
  • RasGTP is rapidly produced when there is a high concentration of active SOS, is slowly produced when there is little active SOS, and is not produced at all when either active SOS or RasGDP concentrations are zero.
  • a similar term describes the enzyme-catalyzed deactivation of RasGTP by RasGAP.
  • the kinetic form for each reaction often varies with reaction type.
  • the term describing the binding of free promoter to protein is the product of the binding rate, the free promoter concentration, and the protein concentration.
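The two kinetic forms described above can be written as plain rate functions. This sketch assumes the standard Michaelis-Menten saturation term substrate / (K_m + substrate) for the "nonlinear function of the inactive substrate"; the parameter names k_cat and k_m are illustrative. The checks mirror the behaviour described for RasGTP: no production when either the enzyme or the substrate is absent, and saturation at high substrate.

```python
def michaelis_menten_rate(k_cat, enzyme, substrate, k_m):
    """Enzyme-catalysed activation: k_cat * [E] * [S] / (K_m + [S]).

    Saturates at k_cat * [E] when [S] >> K_m and is zero if [E] or [S] is zero.
    """
    return k_cat * enzyme * substrate / (k_m + substrate)


def mass_action_binding_rate(k_bind, free_promoter, protein):
    """Promoter-protein binding: k_bind * [free promoter] * [protein]."""
    return k_bind * free_promoter * protein


if __name__ == "__main__":
    # Illustrative numbers only: rate of SOS-catalysed RasGDP -> RasGTP conversion.
    for ras_gdp in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(ras_gdp, michaelis_menten_rate(k_cat=2.0, enzyme=3.0, substrate=ras_gdp, k_m=1.0))
```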
  • Mathematical frameworks other than nonlinear differential equations can be used to describe the dynamics of biochemical networks. When certain assumptions are made about the enzyme-catalyzed Michaelis-Menten reaction form, the nonlinear term becomes piecewise linear or 'switch-like' and is more amenable to mathematical analysis (Kauffman 1969; Kauffman and Glass 1972; Glass 1975).
  • the nonlinear differential equations are approximations of the more realistic stochastic reaction framework.
  • the stochastic time evolution of this system is a Markov jump process where the occurrence of each chemical reaction changes the concentration of chemicals in discrete jumps as time moves forward. When certain criteria are satisfied, an intermediate
  • the mathematical models can be implemented with existing software packages.
  • Basic information about the network, about the reactions which occur in the network and the mathematical frameworks which describe these reactions are input to the programs.
  • the network information includes a list of chemicals and their initial concentrations, a list of rate constants and their values and a list of the reactions which take place in the network, i.e. the reaction, the components and other chemicals involved in the reaction, and the kinetic parameters involved in the reaction.
  • the list of reactions links up the components and other chemicals present in the network to form the topology of the network.
  • reaction information which is input includes that pertaining to reactions such as phosphorylation, dephosphorylation, guanine nucleotide exchange, transport across the nuclear membrane, transcription, translation and receptor binding.
  • Each specific reaction includes the chemicals that are involved in the reaction, the stoichiometry of the reaction, i.e. the number of molecules created or destroyed in the reaction, and the rate of the reaction. The rate depends on the rate constants and the concentration of the chemicals involved.
  • reaction movers are those that can be used to evolve the state of the system forward in time. They consist of differential equation dynamics and stochastic dynamics movers. In the stochastic dynamics derived class, the occurrence of a particular reaction is calculated in accordance with the reaction rates entered. The concentrations of the components are then changed to reflect the occurrence of the reaction, and the rates of the reactions are then recomputed. Such a probabilistic time evolution of the biochemical network is known as a continuous-time Monte Carlo simulation.
  • This stochastic procedure is known as the Gillespie algorithm (Gillespie 1976).
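A minimal sketch of the Gillespie direct method, the continuous-time Monte Carlo scheme described above. The reaction representation (a propensity function paired with a dictionary of integer jumps) and the birth-death mRNA example, with production rate 10 and degradation rate 1 per molecule, are assumptions chosen for illustration.

```python
import math
import random


def gillespie(initial, reactions, t_end, seed=0):
    """Gillespie direct method.

    `reactions` is a list of (propensity, change) pairs, where propensity(state)
    returns the reaction's rate and `change` maps species to integer jumps.
    Returns the list of (time, state) pairs recorded after each reaction event.
    """
    rng = random.Random(seed)
    state, t = dict(initial), 0.0
    trajectory = [(t, dict(state))]
    while True:
        props = [propensity(state) for propensity, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponentially distributed with rate `total`.
        tau = -math.log(1.0 - rng.random()) / total
        if t + tau > t_end:
            break
        t += tau
        # Pick reaction j with probability props[j] / total.
        threshold = rng.random() * total
        chosen, cumulative = len(props) - 1, 0.0
        for j, p in enumerate(props):
            cumulative += p
            if cumulative > threshold:
                chosen = j
                break
        for species, delta in reactions[chosen][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical birth-death example: mRNA made at rate 10, each copy degraded at rate 1.
BIRTH_DEATH = [
    (lambda s: 10.0, {"m": +1}),
    (lambda s: 1.0 * s["m"], {"m": -1}),
]

if __name__ == "__main__":
    traj = gillespie({"m": 0}, BIRTH_DEATH, t_end=20.0, seed=1)
    print(len(traj), "events; final copy number", traj[-1][1]["m"])
```

For this example the time-averaged copy number should fluctuate around 10, the steady state that the corresponding differential equation would predict.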
  • In the differential equation derived class, the differential equation that describes the time rate of change of each component is constructed from the kinetic forms and stoichiometry entered.
  • Various numerical integration routines e.g. Runge-Kutta, are used to solve for the new chemical concentrations as time moves forward.
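The differential-equation mover can be sketched by building each d[X]/dt from the entered reactions (stoichiometric coefficient times rate) and advancing it with the classical fourth-order Runge-Kutta scheme. The `Reaction` container and the single hypothetical A to B conversion used as a check are assumptions for illustration, not the input format of any particular package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reaction:
    stoichiometry: Dict[str, int]               # molecules created (+) or destroyed (-)
    rate: Callable[[Dict[str, float]], float]   # kinetic form evaluated on concentrations


def ode_rhs(reactions: List[Reaction], conc: Dict[str, float]) -> Dict[str, float]:
    """d[X]/dt = sum over reactions of (stoichiometric coefficient of X) * (reaction rate)."""
    dcdt = {name: 0.0 for name in conc}
    for rxn in reactions:
        v = rxn.rate(conc)
        for name, nu in rxn.stoichiometry.items():
            dcdt[name] += nu * v
    return dcdt


def rk4_integrate(reactions, conc0, t_end, dt):
    """Classical fourth-order Runge-Kutta; returns a list of (time, concentrations)."""

    def shifted(c, k, h):
        return {n: c[n] + h * k[n] for n in c}

    conc, t = dict(conc0), 0.0
    out = [(t, dict(conc))]
    for _ in range(int(round(t_end / dt))):
        k1 = ode_rhs(reactions, conc)
        k2 = ode_rhs(reactions, shifted(conc, k1, dt / 2.0))
        k3 = ode_rhs(reactions, shifted(conc, k2, dt / 2.0))
        k4 = ode_rhs(reactions, shifted(conc, k3, dt))
        conc = {n: conc[n] + dt / 6.0 * (k1[n] + 2.0 * k2[n] + 2.0 * k3[n] + k4[n]) for n in conc}
        t += dt
        out.append((t, dict(conc)))
    return out


# Hypothetical check: first-order conversion A -> B with rate constant 1, so A(t) = exp(-t).
CONVERSION = [Reaction({"A": -1, "B": +1}, lambda c: 1.0 * c["A"])]

if __name__ == "__main__":
    t_final, c_final = rk4_integrate(CONVERSION, {"A": 1.0, "B": 0.0}, t_end=1.0, dt=0.01)[-1]
    print(t_final, c_final)
```

Because every rate enters each species' equation weighted by its stoichiometric coefficient, conserved totals (here A + B) are preserved by the integrator up to rounding.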
  • Both the stochastic and nonlinear differential equation frameworks output the concentration of all of the components in the network as a function of time.
  • optimization routines can be used to fit the rate constants.
  • the values of the rate constants are often not known and these optimization methods can thus be used to make the model and the simulation of the model more accurate.
  • the system is first simulated at a particular set of values for the rate constants.
  • the resulting simulated time series for a particular component is compared to an experimentally measured concentration time series and a 'penalty' or 'cost' is calculated as the sum of the squares of the differences between the data and the simulated time series.
  • the rate constants are then perturbed away from the starting values and the simulation is repeated and the cost recalculated.
  • the optimizer adopts the new set of rate constants that resulted in the lower 'cost' and a better fit to the data.
  • the perturbing or changing of the rate constants is sometimes performed randomly and sometimes performed rationally, depending on the optimization routine.
  • the optimizer iterates the changing of the rate constants, simulating the network, and evaluating the change in the 'cost' until the simulation nearly matches the data.
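The fitting loop above can be sketched as a simple compass (pattern) search, one of the "rational" ways of perturbing rate constants; a random-perturbation variant would replace the two trial points with random draws. The model is a hypothetical first-order decay standing in for a full network simulation, and the "measured" series is synthetic data generated at k = 0.7, so the fit should recover that value. The cost is the sum of squared differences, as in the text.

```python
import math


def simulate(k, times, a0=1.0):
    """Stand-in for a full network simulation: first-order decay A(t) = a0 * exp(-k t)."""
    return [a0 * math.exp(-k * t) for t in times]


def cost(k, times, data):
    """Sum of squared differences between simulated and measured time series."""
    return sum((s - d) ** 2 for s, d in zip(simulate(k, times), data))


def fit_rate_constant(times, data, k0=0.1, step=0.5, tol=1e-6, max_iter=10_000):
    """Compass search: try k +/- step, keep any improvement, otherwise halve the step."""
    k, best = k0, cost(k0, times, data)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for trial in (k + step, k - step):
            if trial < 0.0:
                continue  # rate constants must stay non-negative
            c = cost(trial, times, data)
            if c < best:
                k, best, improved = trial, c, True
                break
        if not improved:
            step /= 2.0
    return k, best


if __name__ == "__main__":
    times = [0.5 * i for i in range(11)]
    data = simulate(0.7, times)  # synthetic "measurements", not experimental data
    print(fit_rate_constant(times, data))
```

For a one-parameter cost with a single minimum, halving only when both neighbours fail guarantees that the minimum stays bracketed within the current step, which is why this deterministic variant is convenient for a sketch.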
  • a measure of the predictive power of the mathematical model is the robustness of the predictions obtained from the simulation with optimized parameter values.
  • accomplish this, known as stochastic sensitivity analyses, are used.
  • the parameter values are stochastically perturbed in the vicinity of the minimum cost, and the system is simulated with this ensemble of rate constant sets. If the output from the simulation does not vary significantly within the ensemble of rate constant sets, the prediction is robust and the predictive power of the model is high. On the other hand, if the simulation varies significantly within the ensemble of rate constant sets, the prediction is not robust and the predictive power of the model is low.
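A sketch of the stochastic sensitivity test. It assumes multiplicative log-normal perturbations of each fitted rate constant (which keeps them positive) and reports the coefficient of variation of a prediction across the ensemble; a small value suggests a robust prediction. The two toy predictions and sigma = 0.1 are illustrative choices, and where to draw the line between "robust" and "not robust" is left to the modeller.

```python
import random
import statistics


def ensemble_spread(predict, best_params, sigma=0.1, n_samples=500, seed=0):
    """Coefficient of variation of `predict` over log-normally perturbed parameter sets.

    Each rate constant is multiplied by exp(N(0, sigma)), which keeps it positive.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        sample = {name: value * rng.lognormvariate(0.0, sigma) for name, value in best_params.items()}
        outputs.append(predict(sample))
    return statistics.pstdev(outputs) / abs(statistics.fmean(outputs))


# Two hypothetical predictions that depend on the same fitted constants a and b.
def fraction_active(p):
    return p["a"] / (p["a"] + p["b"])


def steep_response(p):
    return (p["a"] / p["b"]) ** 4


if __name__ == "__main__":
    best = {"a": 1.0, "b": 1.0}
    print("fraction_active CV:", round(ensemble_spread(fraction_active, best), 3))
    print("steep_response CV:", round(ensemble_spread(steep_response, best), 3))
```

With these settings the steep prediction varies far more across the ensemble than the saturating one, so by this criterion it would be judged the less robust of the two.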
  • FIG. 5 is a DCL schematic representation of a network comprising two genes, G_A and G_B.
  • G_A and G_B are transcribed independently from two separate promoters, P_A and P_B, to produce mRNA A and mRNA B, respectively, which are then translated to produce proteins A and B, respectively. Transcription and translation are approximated as a single process. Protein A inhibits the production of B. Proteins A and B together activate the production of A. This is only physically plausible if the operator DNA sequences in promoters P_A and P_B are similar. $P_A^{tot} = 1$, $P_B^{tot} = 1$.
  • $P_A^{tot}$ and $P_B^{tot}$ represent the total number of promoter copies for genes A and B, and each is equal to one for a single copy of the gene circuit.
  • Promoter A, P_A, controls the production of protein A from gene A, G_A.
  • Promoter B, P_B, controls production of protein B from gene B, G_B.
  • Protein A represses production of protein B (indicated by -) while protein A and protein B together activate the production of protein A (indicated by +).
  • a deterministic model is established by deriving a set of coupled nonlinear differential equations, $dx_i/dt = f_i(x_1, \ldots, x_N)$, where index i labels a chemical species in the network, a chemical species being a particular gene product (mRNA A, mRNA B, protein A or protein B), part of the gene itself (promoter A, P_A), or one of the complexes that can be formed as a result of allowed biochemical reactions, such as [P_A:A].
  • the following differential equations for this system were integrated with a fourth-order Runge-Kutta routine obtained from Numerical Recipes (Press et al., Numerical Recipes in C, 1992).
  • the invention may be used to predict the behavior of the network as represented by these ten differential equations.
  • the equations are solved analytically, i.e. mathematically.
  • the equations are solved on a computer by numerically integrating them and thereby simulating the network's behavior as a function of time.
  • k u ⁇ (k u ⁇ ) and k b ⁇ (k b ⁇ ) are the unbinding and binding constants, respectively.
  • the system is non-dimensionalized by first dividing by k_d, so that time is now rescaled by k_d, i.e. $t \to k_d t$.
  • This condition changes when the eigenvalues of this Jacobian matrix become positive, indicating that fixed point 1 is now unstable and fixed point 2 is now stable. If fixed point 1 is the normal, healthy state of a cell and fixed point 2 is a cancerous state of a cell, this stability condition identifies which combination of parameters, and thus which combination of genes, are most important in causing the transition or bifurcation from the normal state to the cancer state. This in turn identifies the genes which are putative targets for therapeutic agents for treatment of the disease controlled by this particular network.
  • the ratio of transcription rates to degradation rates is normally greater than unity, and the ratio of unbinding constants to binding constants is typically much less than unity, so that fixed point 1 is a stable node for typical parameter values.
  • the analyses that follow take place in this parameter range.
  • the phase portrait in FIG. 6 indicates that the system will flow to fixed point 1 given any non-zero initial values of protein A.
  • the system flows to fixed point 2 only in the absence of protein A.
  • the kinetic parameters used are set forth in Table 1.
  • Table 1 sets forth the parameters used in the differential equation model of Figs. 6 and 7. In Figs. 7 and 8, both the stochastic and differential equation systems are initialized with 50 protein A molecules and zero protein B molecules.
  • this numerical integration simulates the behavior of the two-gene network and results in the plots shown in FIGS. 6 through 8.
  • the time course of these plots corresponds to a particular biological state.
  • one particular time course can correspond to the normal progression of a cell through the cell cycle while another time series can correspond to the unregulated cell growth characteristic of cancer.
  • the time series of a cancerous cell can be changed into the time series of a healthy cell, thereby identifying sets of genes and proteins as putative targets for therapeutic agents.
  • FIGS. 7 and 8 compare the output of the stochastic model and the differential equation model for the two gene network.
  • FIG. 7 displays the time evolution of the differential equation model for a single copy of the gene circuit. The system quickly flows to the stable fixed point 1 as expected. In the differential equation model, the system would never reach the "extinction" fixed point unless the system started with zero A molecules.
  • FIG. 10 contains a graphical representation of the Wnt/β-catenin pathway indicating the role of Axin, APC, and GSK3 in phosphorylating β-catenin and leading to its degradation.
  • FIG. 10 was created as well using Diagrammatic Cell Language, which was discussed above in connection with FIG. 3.
  • CM broad horizontal lines
  • NM broad horizontal lines
  • Wnt signaling is induced by secreted Wnt proteins that bind to a class of seven-pass transmembrane receptors encoded by the frizzled genes. Activation of the frizzled receptor leads to the phosphorylation of disheveled (Dsh) through an unknown mechanism.
  • Activated disheveled inhibits the phosphorylation of β-catenin by glycogen synthase kinase 3β (GSK3β).
  • Unphosphorylated β-catenin escapes detection by β-TrCP, which triggers the ubiquitination of β-catenin and its degradation in the proteasomes.
  • Stabilized β-catenin enters the nucleus, where it interacts with TCF/LEF1 transcription factors, leading to the transcription of Wnt target genes such as CyclinD1 and c-Myc.
  • β-catenin phosphorylation by GSK3β occurs in a multiprotein complex containing the scaffolding protein Axin, as well as GSK3β and the APC tumor suppressor.
  • β-catenin is efficiently phosphorylated and then is earmarked for degradation by β-TrCP.
  • Stabilized β-catenin is common to most colon cancers, where mutations in APC, Axin, and β-catenin itself are known to interfere with its effective ubiquitination and consequently its degradation.
  • Accumulation of β-catenin leads to the activation of the Wnt target genes such as CyclinD1 and c-Myc, both of which are intimately involved in cell cycle control and the progression of cancer.
  • Nuclear β-catenin also targets β-TrCP, increasing its levels and creating a negative feedback loop in the system.
  • FIG. 10 depicts a subnetwork that represents the components involved in Wnt signaling in addition to side pathways responsible for β-catenin degradation. These include the Axin degradation machinery and β-catenin transcription of target genes c-Myc and β-TrCP.
  • the representation includes notations depicting all of the components, chemical forms of the
  • each chemical species is or may be involved in a reaction.
  • the time course of its quantity or concentration is simulated.
  • the components include: Axin, β-catenin, APC, GSK3, β-TrCP, HDAC, Groucho, c-Myc gene, c-Myc mRNA, β-TrCP gene, β-TrCP mRNA, and an unknown intermediary protein that facilitates the enhancement of β-TrCP by nuclear β-catenin.
  • Each of these components can exist in an alternate form depending on the species with which it interacts.
  • β-catenin can be phosphorylated directly by GSK3, forming phosphorylated β-catenin. It can also bind to Axin to form a β-catenin:Axin complex.
  • There are a total of 70 components and chemical species in this exemplary simulation. They are listed in Table 2 below.
  • each reaction represents a probability of a reaction occurring.
  • each reaction represents a term in the differential equation representing the time rate of change of the chemical species. The list of differential equations is set forth in Table 4 below.
  • the initial values of the kinetic parameters are chosen from the literature by incorporating time scale and expression information. For example, it is known that GSK3 phosphorylates Axin on a time scale of about 30 minutes, and hence a rate constant is chosen to reflect that time scale.
  • Perturbations can be introduced to determine the relevant targets in the network and the effect they have on perturbing the network.
  • the binding rate of Axin to APC can be set to zero, thereby simulating the effects of a mutation in APC that prevents its binding to Axin.
  • FIG. 12 depicts a time series profile which characterizes the "cancerous" state, for example, where a mutation in APC prevents Axin from effectively degrading β-catenin. This results in the up-regulation of c-Myc as well as higher levels of β-TrCP. In this case, the level of β-catenin rises in the cytoplasm and the nucleus, and consequently c-Myc transcription rises as well.
  • One or more components of a cell can be identified as putative targets for interaction with one or more agents within the simulation. This is achieved by perturbing the simulated network by deleting one or more components thereof, changing the concentration of one or more components thereof, or modifying one or more of the mathematical equations representing interrelationships between two or more of said components. Alternatively, the concentrations of one or more of the several proteins and genes in the biochemical network are selectively perturbed to identify which ones of said proteins or genes cause a change in the time course of the concentration of a gene or protein implicated in a disease state of the cell.
Deleting One or More Components in the Network
  • the APC protein is deleted by removing the protein from the set of equations in Example II and removing all of the chemical species formed and reactions that take place as a result of interactions with APC.
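Deleting a component, as done for APC above, amounts to removing the species, every complex that contains it, and every reaction that consumes or produces any of them. The sketch below assumes complexes are named by joining their members with ":" (e.g. "Axin:APC") and uses a small hypothetical reaction list rather than the 70-species model of Table 2.

```python
def delete_component(species, reactions, component):
    """Remove `component`, every complex containing it, and every reaction touching them.

    Assumes complexes are named by joining their members with ':' (e.g. 'Axin:APC').
    """
    removed = {s for s in species if component in s.split(":")}
    kept_species = [s for s in species if s not in removed]
    kept_reactions = [
        r for r in reactions
        if not (set(r["reactants"]) | set(r["products"])) & removed
    ]
    return kept_species, kept_reactions


# Small hypothetical network, not the full model of Table 2.
SPECIES = ["Axin", "APC", "Axin:APC", "beta_catenin", "Axin:beta_catenin"]
REACTIONS = [
    {"name": "axin_apc_binding", "reactants": ["Axin", "APC"], "products": ["Axin:APC"]},
    {"name": "axin_bcat_binding", "reactants": ["Axin", "beta_catenin"], "products": ["Axin:beta_catenin"]},
    {"name": "bcat_degradation", "reactants": ["beta_catenin"], "products": []},
]

if __name__ == "__main__":
    kept_species, kept_reactions = delete_component(SPECIES, REACTIONS, "APC")
    print(kept_species)
    print([r["name"] for r in kept_reactions])
```

The reduced species and reaction lists can then be fed to the same simulation routine as the unperturbed network, so the knockout and reference runs differ only in the deleted component.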
  • the effect on the state of the cell is to raise the levels of β-catenin so that it continually activates downstream targets such as β-TrCP and c-Myc. This can be seen by comparing FIG. 13 with FIG. 11.
  • APC is thus identified as an important component of the cell which is implicated in a disease state of the cell. When APC is "knocked out", it leads to a high level of β-catenin, which can cause the development and progression of colon cancer. In the event that a mutation in APC prevents its interaction with the components in the cell, therapeutics can be sought to rectify this condition.
  • HDAC, a protein that sequesters nuclear β-catenin and represses the c-Myc gene, is deleted by removing HDAC from the set of equations set forth in Example II and thereby removing all of the chemical species that are formed and reactions that take place as a result of
  • HDAC is thus identified as an important target because when it is "knocked out", high levels of c-Myc develop, and this can lead to the development and progression of colon cancer. In the event that a mutation in HDAC prevents its interaction with the components in the cell, therapeutics can be sought to rectify this condition.
Changing the Concentrations of One or More of the Components in the Network
  • the concentration of Axin is increased significantly above its normal levels. This results in the reduction of β-catenin levels. This is shown in FIG. 15. This identifies a cellular component, Axin, that can cause a reduction in the time course of the concentration of β-catenin.
  • HDAC histone deacetylase
  • the levels of HDAC are then increased in the simulated cell to which Axin has been introduced. This lowers the concentrations of nuclear β-catenin and c-Myc. Both levels are lowered, recreating a profile that corresponds to a "normal" cellular state. This is shown in FIG. 16.
  • This identifies HDAC as an important component for control of a disease state. Starting from the disease state of FIG. 12, the concentrations of Axin and GSK3 are perturbed by increasing their levels significantly above normal. The concentration of Axin is perturbed less than above, so that β-catenin levels fall, but not as much. GSK3 concentration is then increased to further reduce β-catenin levels. The levels of β-catenin approach those of the normal state. This is shown in FIG. 17. This identifies Axin and GSK3 as two components which affect the time course of β-catenin.
  • a further series of perturbations are made, each of the perturbations changing the concentration of a protein or gene in the network to a perturbed value to determine whether that protein or gene is implicated in causing a change in the time course of the concentration of a gene or protein implicated in a disease state of the cell. Starting from the disease state of FIG. 12, the concentrations of Axin and GSK3 are perturbed by increasing their levels significantly above normal. This is shown in FIGS. 18 and 19, respectively. The system is perturbed again by raising the levels of HDAC. This is shown in FIG. 20. Upon each perturbation, the levels of β-catenin are reduced in the cytoplasm and then in the nucleus, resulting in reduced levels of c-Myc mRNA. This identifies Axin, GSK3, and HDAC, in varying degrees, as causing a change in the time course of the concentration of the gene or protein implicated in the disease state.
  • Example II: The mathematical equations in Example II are modified by adding a new component to facilitate the binding of Axin to β-catenin.
  • the reaction term that represents the binding of Axin to β-catenin is changed from
  • This identifies the Facilitator as an important putative therapeutic for changing the binding of Axin to β-catenin, and thereby changing the condition of the cell from the disease state to the healthy state.
  • the binding rate of Axin to β-catenin is increased. This perturbation allows Axin to bind β-catenin more quickly and thus enables its phosphorylation and degradation without APC, as shown in FIG. 24. This identifies Axin as a component of the cell that changes the time course of the disease state.
  • the parameter for the binding of Axin to GSK3 is set to zero, and then the parameter for the unbinding of β-catenin from the c-Myc gene is set to zero.
  • the first perturbation causes a significant increase in cytoplasmic and nuclear β-catenin.
  • the second perturbation immediately increases c-Myc mRNA levels, as shown in FIGS. 28 and 29. This identifies GSK3, and the binding of nuclear β-catenin to the c-Myc gene, as important targets for affecting the time-series profile of the disease state.
  • the parameters in the system are systematically perturbed until the time-series expression profile looks "normal."
  • the simulation is run at the starting kinetic and concentration values of the disease state, and computer code is then executed to systematically vary one or more kinetic parameters and concentration values from 0 to some maximum value until the desired time-series profile is reached.
  • a criterion is introduced to cease the perturbation when the time series matches the desired output, in this case, that of a "normal" profile similar to FIG. 11.
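The sweep-until-match procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original code: the one-variable model of β-catenin turnover, the names `simulate`, `rms` and `sweep`, the "normal" Axin level, the step size, and the RMS stopping tolerance are all hypothetical, and only a single parameter is varied.

```python
import math

def simulate(axin, s=1.0, k_deg=0.5, b0=10.0, dt=0.01, t_end=20.0):
    """Hypothetical toy model of beta-catenin b: db/dt = s - k_deg * axin * b.
    Integrated with forward Euler; returns the whole time series."""
    b = b0
    series = [b]
    for _ in range(int(round(t_end / dt))):
        b += dt * (s - k_deg * axin * b)
        series.append(b)
    return series

def rms(a, b):
    """Root-mean-square difference between two equally long time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def sweep(target, max_value=5.0, step=0.05, tol=0.05):
    """Raise the parameter from 0 toward max_value and stop at the first value
    whose simulated time series is within tol (RMS) of the target profile."""
    for i in range(int(round(max_value / step)) + 1):
        value = i * step
        if rms(simulate(value), target) < tol:
            return value
    return None  # no value in the range reproduced the target profile

normal_profile = simulate(axin=2.0)    # hypothetical "normal" Axin level
disease_profile = simulate(axin=0.2)   # hypothetical reduced Axin (disease state)
found = sweep(normal_profile)
print(f"Axin level restoring a normal profile: {found}")
```

Sweeping several parameters at once would replace the single loop with nested loops or a grid, at a cost that grows quickly with the number of parameters.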
  • the systematic changes are shown in FIG. 31. The final change consists of increasing the binding rate of β-catenin to Axin, decreasing the binding rate of β-catenin to the TCF-bound c-Myc gene, and increasing the binding rate of GSK3 to Axin.
  • FIG. 31 shows the modular description, where "modular" means a simplification of the model into basic elements that reveals the gross connections between components, the major feedback loops, and the cross talk between the modules. Each module contains many reaction steps. The lines extending from the modules indicate interactions between the modules.
  • Exhibit A contains the differential equations for each module, the chemical species or states in each module, as well as a list of the initial concentrations and kinetic parameters used in the simulation.
  • for convenience, the eight modules comprising Figure 32 are each designated with a quadrant and a right/left designation, so that a given module can easily be located within the overall modular description of Fig. 32. The interconnections between the various modules are also noted on each module and are summarized in the following table, where for convenience they are designated as "lines", although they actually refer to biological interconnections.
  • Each interconnecting line is numbered in a clockwise manner depending on the point where the line exits a particular octant in the large diagram.
  • the Ras MAPK module of Figure 32(a) contains important growth factors, such as EGF, TGF-alpha, and amphiregulin, that activate the Erb family of receptors. Once activated, these receptors activate a cascade of proteins called Ras, Mek, and Erk. Active Erk plays an important role in turning on genes responsible for the mammalian cell cycle, such as cyclinD, which initiate the transition from G1 to S phase. Erk even initiates further cellular division by turning on, in an autocrine manner, the same growth factors that lead to its activation. Cancers often carry mutations in Ras that inhibit its hydrolysis, i.e., its conversion from the active GTP-bound form to the inactive GDP-bound form, thereby promoting activation of Erk and proliferation.
  • the Ras MAPK module also interacts with the PI3K/AKT module of Figure 32(b).
  • the same growth factors that lead to Erk activation can also activate the survival factor AKT (sometimes referred to as PKB).
  • AKT can also be activated by another set of growth factors, namely insulin, IGF-1, and IGF-2, as shown in the module description. Once activated by growth factor signals, AKT induces the nuclear translocation of NFkappaB.
  • NFkappaB turns on a set of genes that up-regulate proteins labeled as survival factors, such as Bcl2, Bcl-xL, FLIPs, and IAPs, shown in the apoptosis module of Figure 32(g). These proteins inhibit apoptosis (programmed cell death) at many levels. Thus a healthy dividing cell signals both to promote cellular division and to inhibit apoptosis via up-regulation of these survival factors.
  • the Wnt beta-catenin module of Figure 32(c) is another module that signals to promote cellular division.
  • In the cancer cell, this pathway usually contains a mutation in beta-catenin that leads to high levels of the protein in the cell. Normally, the pathway is inactive and beta-catenin levels are low unless the cell is stimulated with the Wnt ligand.
  • Beta-catenin also acts as a transcription factor, turning on cyclinD and c-Myc, both of which lead to the G1-S transition. Excess levels can thus lead to proliferation and uncontrolled cellular division.
  • cancer cells will inhibit the process of differentiation.
  • a colon cancer cell achieves this by mutating the SMADs in the TGF-beta pathway of Figure 32(d). These are again transcription factors, activated by the TGF-beta ligand, that promote the transcription of genes like p21 and p16, which halt the cell cycle, signaling that it is time for the cell to differentiate.
  • activated by growth factors to in turn activate CyclinE-CDK2 complexes, which initiate DNA synthesis (S phase).
  • CyclinA-CDK2 complexes are then activated to induce chromosomal replication.
  • the completion of this process is marked by activation of CyclinB-CDK1 complexes, which induce the onset of M phase (mitosis), ending with the cell having replicated its DNA and divided into two daughter cells.
  • the p53 transcription factor of Figure 32(h) is activated to up-regulate genes that halt the cell cycle and/or genes that promote apoptosis (e.g., Bax and Bad) should the cell not correct its defects in time.
  • p53 is very commonly mutated in colon cancer; thus, rather than repairing its DNA or undergoing programmed cell death when the DNA is damaged or incorrectly replicated, the cell can continue to divide.
  • Other signals that affect the state of the cell are received via the JAK/STAT pathway, where cytokines activate components like p38 and JNK, shown as JAK/STAT in Figure 32(e). These, again, can up-regulate transcription factors that promote cell death (among other signals that even compete with apoptosis) via up-regulation of death-inducing cytokines such as FASL when the cell is stressed.
  • apoptosis signals converge onto Figure 32(g).
  • apoptosis can be triggered via activation of death receptors such as FASR and TNFR through the ligands TNF-alpha and FASL.
  • These receptors activate caspase 8 leading to cleavage of Bid which induces the oligomerization of Bad.
  • Bad can disrupt mitochondrial integrity, releasing cytochrome c and activating caspase 9, which in turn activates the executioner caspase 3.
  • the executioner caspases cleave various proteins in the cell and induce programmed cell death.
  • Caspase 8 can also directly activate caspase 3.
  • Many colon cells require both direct caspase 3 activation and
  • By scaling the model to include the networks responsible for the physiological processes of the G1-S transition, S phase, the G2-M transition, and apoptosis, we can use the model to predict the various physiological states of the cell.
  • Therapeutic strategies can be suggested for each stage of the disease.
  • mutations specific to an individual can be input to devise individually targeted therapies. Data from that individual at the DNA, RNA, and protein levels can be incorporated into this core skeletal model and used to optimize it, generating an optimal therapeutic strategy.
  • the network is simulated as a whole. All of the differential equations are solved simultaneously. One can perturb any of the cellular components in the simulation to predict cellular outcome and understand the cross talk and feedback loops between the various modules.
  • cellular mechanisms are represented on multiple levels, including receptor activation, degradation, endocytosis, signal transduction cascades, transport within compartments, transcriptional control of gene expression networks, and protein translation and degradation mechanisms.
Optimization of the Model: Determining and Constraining Parameter Values
  • are incorporated in software, e.g., the Implementation software, which simulates the HCT116, SW480, and Caco-2 colorectal cell lines.
  • Figs. 33(a)-(d) show time course profiles of HCT116 cells under stimulation with 20 ng EGF.
  • the figures plot phosphorylated MEK, ERK, AKT, and RAF, comparing measured data with the output of a simulation that has been optimized to fit those data.
  • the solid lines show the simulation output and the dots are the measured data from Caco-2 cells stimulated with 20 ng EGF.
  • Rate constant values can be gathered from literature sources that have measured them, or estimated from what is known, in the literature or otherwise, about the activation or deactivation of the particular component or, if that information is not available, of an analogous biological component. These starting values may produce simulation outputs that do not match experimental time course measurements of the expression levels of the actual components.
  • the expression levels are the total protein levels, the levels of modified forms of the protein (e.g., phosphorylated or cleaved), and/or RNA levels, measured using a multitude of experimental methods.
  • Figures 33(a)-(d) contain the data points for the phosphorylated forms of AKT, MEK, and ERK. These data points are fed into the simulation, and the resulting simulated time series for a particular chemical is compared to the experimentally measured concentration time series. A 'penalty' or 'cost' is calculated as the square of the difference between the data and the
  • Each simulation data point is computed with the conditions specific to that experiment. For example, an EGF experiment with five different levels of EGF would be simulated under each of those conditions, and similarly for other treatments and conditions. The goal is to find the parameter values that minimize the overall global cost function.
  • the rate constants are perturbed away from the starting values, the simulation is repeated, and the cost is recalculated. If the cost is lower, the new set of rate constants, which gave the lower 'cost' and a better fit to the data, is kept. Perturbing or changing of the rate constants may be carried out almost randomly or more
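A minimal sketch of the cost calculation and the perturb-and-keep-if-better loop described above. It assumes a hypothetical analytic two-rate activation model in place of the full network, synthetic data generated from known constants at two doses, and log-normal random perturbations; the names `model`, `cost`, and `fit` are illustrative only. Because only improvements are kept, this is a greedy random search, which can stall in a local minimum.

```python
import math
import random

def model(k, dose, t):
    """Hypothetical activation model with rate constants k = (k_on, k_off):
    a dose-dependent level approached with rate k_on + k_off."""
    k_on, k_off = k
    total = k_on + k_off
    return dose * k_on / total * (1.0 - math.exp(-total * t))

def cost(k, experiments):
    """Global cost: squared data-minus-simulation differences summed over all
    time points of all experiments, each simulated under its own condition."""
    return sum((y - model(k, dose, t)) ** 2
               for dose, times, values in experiments
               for t, y in zip(times, values))

def fit(k_start, experiments, n_iter=2000, scale=0.1, seed=0):
    """Perturb the rate constants at random; keep a new set only if the cost drops."""
    rng = random.Random(seed)
    best_k, best_cost = list(k_start), cost(k_start, experiments)
    for _ in range(n_iter):
        # Multiplicative (log-normal) perturbation keeps the rate constants positive.
        trial = [ki * math.exp(rng.gauss(0.0, scale)) for ki in best_k]
        trial_cost = cost(trial, experiments)
        if trial_cost < best_cost:
            best_k, best_cost = trial, trial_cost
    return best_k, best_cost

# Synthetic "measurements" from known constants at two doses (illustration only).
true_k = (0.8, 0.2)
times = [0.5, 1.0, 2.0, 4.0, 8.0]
experiments = [(dose, times, [model(true_k, dose, t) for t in times]) for dose in (1.0, 5.0)]

start = [0.3, 0.6]
initial_cost = cost(start, experiments)
fitted_k, final_cost = fit(start, experiments)
print(f"cost {initial_cost:.3f} -> {final_cost:.2e}, fitted k = {fitted_k}")
```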
  • the model predicts physiological outcomes such as proliferation, G1-S, G2-M, S phase arrest, and apoptosis, as indicated by molecular markers within the simulation.
  • CyclinE-CDK2 levels are an indicator of the G1-S state
  • CyclinA-CDK2 levels are an indicator for the S phase state
  • CyclinB-CDK1 levels are an indicator for the G2-M state
  • caspase 3 and cleaved PARP (a protein that gets cleaved by executioner caspases such as caspase 3) are indicators for apoptosis.
  • the model is used to simulate the "cancer" state, indicated by high levels of proliferative signals such as Erk and high levels of anti-apoptotic proteins such as Bcl2.
  • Targets can be perturbed to see whether a particular physiological state can be induced. After perturbing a target, one can predict whether the cells will go through G1-S arrest, G2-M arrest, S phase arrest, or apoptosis.
  • normal cells express a certain number of anti-apoptotic proteins, which are analogous to actively applying the brakes of a car at the top of a hill to prevent it from rolling down or, in the case of the cell, from going into apoptosis. If the brakes are released, the car will not move forward unless another force is applied, but without the brakes it is much easier to send the car down the hill. Similarly, in a sensitized state, the cell is more likely to go into apoptosis when another, pro-apoptotic perturbation is applied than when it is not in a sensitized apoptotic state.
  • Fig. 34 shows the results of perturbing 41 individual targets in the model where the final outcome is the cellular physiological state of the cell.
  • 41 targets were perturbed in the simulation of the cancer cell.
  • a perturbation was applied singly either on the protein or RNA level such that the final outcome was up or down regulation of the target on the protein level. This perturbation can be accomplished systematically or automatically via a computer algorithm that systematically perturbs each component and then checks to see what state the cell is in.
  • Most of the perturbations shown in Fig. 34 lead to a more sensitized apoptotic state. This is characteristic of many cancer therapeutics that have a single target as a component of the diseased cell. Other putative therapeutics can be found by determining the effect of their action on another node of intervention. Within the simulation, one can knock out a combination of targets and thereby identify which combinations, when knocked out, are more likely to promote apoptosis.
  • Fig. 35 lists combinations of targets identified by the simulation which, when knocked out, caused apoptosis in a colon cancer cell. Surprisingly, it was found that many targets, when inhibited singly, lead to sensitization towards apoptosis or weak induction of apoptosis, but not to apoptosis or a strong induction of apoptosis. The combinations of targets synergistically give rise to apoptosis in a cancer cell. For example, when one inhibits Bcl2 or Bcl-xL in combination with CDK1, one can predict apoptosis in the cancer cell. In contrast, knocking out these targets singly results in little or no caspase 3 activation.
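The single-target and pairwise knock-out screens can be organized as below. This is a hedged sketch: a full simulation run is replaced by a hypothetical toy read-out, `caspase3`, in which CDK1 loss frees Bax/Bad and Bcl2/Bcl-xL buffer them. The weights and the thresholds in `classify` are invented for illustration and are not taken from the model in Exhibit A.

```python
from itertools import combinations

# Hypothetical stand-in for a full simulation run (weights are illustrative only).
BASE_PRO = 1.0           # free pro-apoptotic Bax/Bad in the untreated cancer cell
CDK1_KO_RELEASE = 2.0    # extra Bax/Bad freed when CDK1 loss raises free CyclinB
ANTI = {"Bcl2": 1.0, "Bcl-xL": 1.0}
ANTI_OTHER = 0.5         # anti-apoptotic activity not targeted in this screen

def caspase3(knockouts):
    """Toy caspase 3 activity: free pro-apoptotic minus remaining anti-apoptotic activity."""
    pro = BASE_PRO + (CDK1_KO_RELEASE if "CDK1" in knockouts else 0.0)
    anti = ANTI_OTHER + sum(v for name, v in ANTI.items() if name not in knockouts)
    return max(0.0, pro - anti)

def classify(level):
    """Map caspase 3 activity to a physiological state (hypothetical thresholds)."""
    if level >= 1.0:
        return "apoptosis"
    if level > 0.0:
        return "sensitized"
    return "no effect"

def screen(targets, size):
    """Knock out every combination of `size` targets and record the predicted state."""
    return {combo: classify(caspase3(set(combo))) for combo in combinations(targets, size)}

targets = ["CDK1", "Bcl2", "Bcl-xL", "Erk"]
singles = screen(targets, 1)
pairs = screen(targets, 2)
hits = [combo for combo, state in pairs.items() if state == "apoptosis"]
print("single knock-outs:", singles)
print("synergistic pairs:", hits)
```

In a real screen, `caspase3` would be replaced by a call that runs the network simulation with the chosen components knocked out and reads the apoptosis markers.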
  • the mechanism for this may be that inhibiting CDK1 in rapidly dividing cells leads to high levels of free CyclinB, which can sequester 14-3-3 sigma away from the pro-apoptotic proteins Bax and Bad, freeing them to target the mitochondria. This effect can be further enhanced by inhibiting Bcl2 or Bcl-xL in combination. High levels of Bax and Bad can then promote apoptosis by breaking down mitochondrial integrity.
  • Fig. 36 shows the mechanism of action of this perturbation as described in the colon cell simulation. It is noted that Fig. 36 depicts a portion of Figs. 33(f) and 33(g).
  • CDK1 can be inhibited, leading to induction of free CyclinB. CyclinB can then bind 14-3-3 sigma and sequester it away from Bad, Bax, and other pro-apoptotic Bcl family members.
Identifying Key Nodes of Intervention from the Simulation
  • the simulation is used to understand the conditions under which oncogenic Ras leads to sustained levels of phosphorylated Erk. Based on this analysis, one can locate key places in the network that drive sustained Erk levels, and this knowledge can then be used to identify new targets for therapeutic intervention.
  • Ras leads to a reduced hydrolysis rate, such that GAP is unable to efficiently convert GTP-bound Ras to GDP-bound Ras.
  • This mutation was explicitly put in the simulation by reducing the parameter value controlling Ras hydrolysis rates.
  • a cancerous state is characterized by the cell's ability to create sustained Erk levels with small levels of growth factors.
  • the system was simulated under conditions where low levels of growth factors were added. It was found that sustained levels of phosphorylated Erk could be attained only when there was autocrine signaling in the system, i.e., a feedback loop in which Erk leads to the transcription of growth factors that bind to the EGF receptor and further stimulate the network in an autocrine manner. Thus another node, or region, where therapeutic intervention can be helpful was discovered.
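The role of autocrine feedback can be illustrated with a two-variable toy model. This sketch is an assumption, not the document's equations: `r` is the fraction of Ras in the GTP-bound form, `e` is the fraction of phosphorylated Erk, the oncogenic mutation is represented by a smaller hydrolysis rate `k_hyd`, and autocrine signaling adds a growth-factor input proportional to `e`. All rate values and the size of the growth-factor pulse are hypothetical, chosen only so that the qualitative behavior is visible.

```python
def simulate_erk(k_hyd, autocrine, g0=0.1, pulse_end=5.0, t_end=100.0, dt=0.01,
                 k_act=1.0, k_e=1.0, k_d=0.1):
    """Hypothetical toy model, integrated with forward Euler.
    r: fraction of Ras in the GTP-bound (active) form
    e: fraction of Erk that is phosphorylated
    Input: a short, low-level growth-factor pulse plus an autocrine term autocrine * e.
    Returns phospho-Erk at t_end."""
    r, e = 0.0, 0.0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        signal = (g0 if t < pulse_end else 0.0) + autocrine * e
        dr = k_act * signal * (1.0 - r) - k_hyd * r   # activation minus GTP hydrolysis
        de = k_e * r * (1.0 - e) - k_d * e            # phosphorylation minus dephosphorylation
        r += dt * dr
        e += dt * de
    return e

normal = simulate_erk(k_hyd=1.0, autocrine=0.0)
normal_autocrine = simulate_erk(k_hyd=1.0, autocrine=0.05)
oncogenic_ras = simulate_erk(k_hyd=0.05, autocrine=0.0)
oncogenic_ras_autocrine = simulate_erk(k_hyd=0.05, autocrine=0.05)
for name, value in [("normal", normal), ("normal + autocrine", normal_autocrine),
                    ("oncogenic Ras", oncogenic_ras),
                    ("oncogenic Ras + autocrine", oncogenic_ras_autocrine)]:
    print(f"{name:28s} phospho-Erk at t_end = {value:.3f}")
```

With these particular values, Erk decays after the brief low-level pulse unless both the reduced hydrolysis rate and the autocrine loop are present, which mirrors the qualitative finding above.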
  • Figs. 37 and 38 show the simulation output of this study.
  • the upper graph depicts Erk stimulation in the normal cell, where Erk is activated only transiently.
  • the lower graph shows sustained Erk levels arising from the combination of oncogenic Ras and autocrine signaling.
Testing Drugs/Compounds within the Model
  • A particular cancer state in which several nodes are mutated has been simulated.
  • the mutations in the model were input by: (1) deleting the beta-catenin-Axin interaction, so that high levels of beta-catenin accumulate in the nucleus and transcription of cell cycle target genes ensues; (2) mutating the Ras-MAP kinase pathway by reducing the kinetic parameter representing the GTP hydrolysis rate of Ras, thus promoting high levels of Erk and survival signals; and (3) increasing the expression of Bcl2, leading to higher levels of this anti-apoptotic protein in the cell, as is found in many colorectal cancers.
  • An antisense Bcl2 inhibitor, G3139 (Benimetskaya et al. 2001), currently in clinical trials, was tested within the model to determine whether it had any efficacy against the cancer state. G3139 has been shown to reduce the total levels of Bcl2 over 24 hours. Using the simulation, it was found that the cells become sensitized to apoptosis, but most do not undergo apoptosis. The sensitivity to apoptosis depends on the level and activity of pro-apoptotic proteins such as Bax, Bik, and Bok. Cancers that have mutations in these proteins are less sensitive to G3139 therapy.
  • cytokines such as TNF-alpha and FASL. These cytokines will further induce autocrine and paracrine signaling that will promote apoptosis in the cancer cell and surrounding cells.
  • the cells became even more highly sensitized when non-specific effects of G3139 were simulated, e.g., its ability to inhibit translation of XIAP and Bcl-xL mRNA.
  • a secondary agent that leads to up-regulation of pro-apoptotic proteins or cytokines.
  • G3139 in and of itself has low toxicity. Without being bound by theory, one can predict that the toxicity induced with the secondary agent depends on the specificity of the secondary agent to cancer cells. The more specific, the less toxic the combination therapy will be.
  • Figs. 39-41 show the inhibition of Bcl2 using G3139 antisense therapy.
  • This simulation illustrates how one skilled in the art can use the model to simulate the treatment of a patient with a specific stage of cancer by using one or more specific agents that target key proteins controlling cell cycle progression and apoptosis.
  • the methods of the invention demonstrate how one can use the model to determine the efficacy of an agent against a cancer with a certain mutational profile.
  • the methods of the invention include optimization of the model with patient data and the use of the model to analyze the best treatment strategy.
  • the methods of the invention show how one can assess the toxicity effects of a single-agent or combination therapy on the normal cell.
Iterative Refinement of the Model through Experiments
  • the colon cancer model as optimized above predicted that inhibiting NFkappaB and stimulating with TNF, in combination, would synergistically promote increased levels of apoptosis. This prediction was based on the fact that TNF concurrently activates the apoptosis
  • the latter pathway leads to upregulation of survival signals which thwart or inhibit the apoptotic signaling.
  • Fig. 42 shows the cleavage of PARP as a result of inhibiting IkappaB-alpha in combination with adding TNF at various levels.
  • Fig. 42 shows the relative levels of cleaved PARP after 24 hours of stimulation with various doses of TNF.
  • the cells were stimulated with TNF alone, with 20 and 50 µM of the IkappaB-alpha inhibitor, and with DMSO, the agent that solubilizes the inhibitor, as a control.
  • the inhibition of IkappaB-alpha effectively blocks NFkappaB nuclear translocation, preventing NFkappaB from activating the transcription of anti-apoptotic genes.
  • Only those cells in which NFkappaB signaling is strong as a result of TNF activation will respond strongly to the combination. In this way, the model was refined, and in its refined state it could be used to determine which cell types or cancers could be treated by inhibiting NFkappaB transcription in combination with adding TNF, so as to synergistically promote apoptosis.
  • K is an inferred parameter
  • E_i and Delta E_i are the experimental measurements and their uncertainties, respectively
  • Delta K is the uncertainty in K.
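The equation relating these quantities is not reproduced here. One common choice, assumed in this sketch and not necessarily the one used in the original, is first-order error propagation with independent measurement errors, DeltaK = sqrt(sum_i (dK/dE_i * DeltaE_i)^2), with the derivatives estimated by central differences. The ratio used as `K` and the numeric values are hypothetical.

```python
import math

def propagate_uncertainty(k_of_e, e, delta_e, h=1e-6):
    """First-order error propagation for K = k_of_e(E), assuming independent errors:
    DeltaK = sqrt(sum_i (dK/dE_i * DeltaE_i)**2), derivatives by central differences."""
    total = 0.0
    for i in range(len(e)):
        step = h * max(1.0, abs(e[i]))
        up, down = list(e), list(e)
        up[i] += step
        down[i] -= step
        dk_de = (k_of_e(up) - k_of_e(down)) / (2.0 * step)
        total += (dk_de * delta_e[i]) ** 2
    return math.sqrt(total)

def ratio(e):
    """Hypothetical inferred parameter: the ratio of two measurements."""
    return e[0] / e[1]

measurements = [2.0, 4.0]     # E_i (illustrative values)
uncertainties = [0.1, 0.2]    # Delta E_i
delta_k = propagate_uncertainty(ratio, measurements, uncertainties)
print(f"K = {ratio(measurements):.3f} +/- {delta_k:.4f}")
```

For a parameter obtained by fitting a dynamical system, `k_of_e` would wrap the whole fitting procedure, which makes each derivative evaluation a full re-fit.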
  • Case (2) arises when different sets of dynamical system parameters fit the data equally well, given the uncertainties in the experimental data.
  • the dynamical system can be used to predict which experimental measurement would be most effective in determining which set of dynamical system parameters is a more accurate description of biology. This proceeds as follows:
  • This iterative procedure systematically eliminates the different local minima that arise in fitting experimental data to large dynamical systems.
  • the onset of caspase activation shown in Figure 32(g) can be due to a heavy weighting of parameter values responsible for the positive feedback loop between Caspase 8, Caspase 9, and Caspase 3 shown in Figure 32(g) or can be due to a heavy weighting of parameter values responsible for autocrine
  • the above methodology would output two fits to the data: one with chemical concentrations and parameter sensitivities that favor the feedback mechanism between the caspases, and another that favors high levels of TNF and FASL resulting from autocrine signaling.
  • the only way to distinguish between the two hypotheses predicted by the model is to carry out an experiment that perturbs the chemicals and/or parameters deemed important by the above analysis. In this way, the methodology has been used to generate more than one possible hypothesis.
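The choice of a discriminating experiment can be sketched as ranking candidate perturbations by how far apart the two fitted parameter sets predict the outcome, in units of the measurement uncertainty. Everything here is hypothetical: the linear read-out, the two weightings standing in for the feedback-dominated and autocrine-dominated fits, and the candidate list. Both fits reproduce the same unperturbed value, mirroring the situation in which they fit the data equally well.

```python
FITS = {
    # Hypothetical parameter sets that reproduce the same unperturbed read-out (1.0).
    "feedback-dominated": {"w_feedback": 0.8, "w_autocrine": 0.2},
    "autocrine-dominated": {"w_feedback": 0.3, "w_autocrine": 0.7},
}

CANDIDATES = {
    # Fraction of each mechanism blocked by a candidate experiment (hypothetical).
    "no perturbation": {},
    "half-block caspase feedback": {"feedback": 0.5},
    "fully block caspase feedback": {"feedback": 1.0},
    "half-neutralize TNF/FASL": {"autocrine": 0.5},
}

def caspase_readout(params, block):
    """Toy read-out: each mechanism contributes its weight times the unblocked fraction."""
    return (params["w_feedback"] * (1.0 - block.get("feedback", 0.0))
            + params["w_autocrine"] * (1.0 - block.get("autocrine", 0.0)))

def separation(fit_a, fit_b, block, sigma):
    """Difference between the two fits' predictions, in units of measurement error."""
    return abs(caspase_readout(fit_a, block) - caspase_readout(fit_b, block)) / sigma

def most_informative(candidates, fit_a, fit_b, sigma):
    """Candidate experiment whose outcome best discriminates between the two fits."""
    return max(candidates, key=lambda name: separation(fit_a, fit_b, candidates[name], sigma))

a, b = FITS["feedback-dominated"], FITS["autocrine-dominated"]
best = most_informative(CANDIDATES, a, b, sigma=0.1)
for name, block in CANDIDATES.items():
    print(f"{name:30s} separation = {separation(a, b, block, 0.1):.1f} sigma")
print("most informative:", best)
```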
  • the Diagrammatic Cell Language is a language for modeling and simulating biological systems. At its core is the notion that the behavior of biological systems may be best understood as an abstract computation. In this viewpoint, the units of biological heredity are packets of information, and the cell's biochemical circuitry is a layer of computation that evolved with the goal of replicating the data stored in the hereditary material.
  • the Diagrammatic Cell Language is a precise representation designed to be translated into a computer model of the reactions it represents. This translation is referred to as parsing.
  • the parsing can be done by human modelers or by a computer algorithm. This is the beauty of the language: biologists can use it to map out and describe biological interactions, and modelers can then parse the diagrammatic model built by the biologist and create a dynamical simulation in a simulation environment.
  • the translation of the Diagrammatic Cell Language into computer code using computer algorithms has been
  • This language is the best method for concisely representing the massive number of interactions that can occur in the cell. It has been used to map the biological interactions of a 200-component network of a colon cell, represented modularly in Figures 31 and 32.
  • the mathematical equations representing that diagram have been parsed by a human modeler and are represented in Exhibit A. What makes the language so efficient are higher-order structures that can compactly represent all of the interactions a cellular component can undergo (e.g., a protein binding to multiple components and being modified by other entities).
  • the constructs unique to the language, linkboxes and likeboxes, shown in Figure 43(a), allow the biologist to compactly create diagrams of the cell without simplifying the functional representation of a compound or reaction.
  • the language is modular: compounds and reactions may be represented at levels of complexity at the discretion of the biologist. It is object-oriented: a bound form of compounds, for example, inherits the states of its constituents.
  • Atom - An atom is a noun. It is one particular state of a molecule (or a molecule that is modeled as having one state).
  • Fig. 43(a) depicts an example atom, A. (See Figs. 43(a)-43(b) for examples of the graphic symbols in DCL.)
  • Reaction - A reaction is a verb. It is a symbol that represents a transformation of a set of nouns to a set of nouns. Its symbol is a squiggle.
  • Dimerization - A dimerization is a shorthand symbol that represents two compounds reacting with each other to form a new bound form. Its symbol is a black dot:
  • Compartments - A compartment is a location within the cell. For instance, as shown in Fig. 43(a), the nucleus (A is in the nucleus):
  • Linkbox - A linkbox is a noun that represents a collection of states. It could be used to automatically represent, for example, the many states of a protein that has several
  • the example linkbox is shown with two states:
  • Likebox - A likebox is a noun that represents a group of nouns or reactions that all have a similar function.
  • Resolution Notation is used to identify particular states or subsets of states of a linkbox or likebox.
  • the basic form is a text string like this, which indicates that state (1) is bound, and states (2-4) are unbound:
  • Equivalence line - This is a line that connects two nouns that are equivalent. This indicates protein A is equivalent to protein B.
  • Complex - A complex represents a collection of nouns that bind together in a prescribed way. This is a complex of proteins A, B and C (A binds to B, which binds to C, which in turn binds back to the original A). See top of Fig. 43(b).
  • Modifier - Modifiers are used to express molecules that are both reactants and products of a reaction, or inhibitions. They could be used to represent catalysts, for example (C enables the reaction of A to B).
  • Process - A process is a verb. It is a module that describes either unknown or coarsely modeled behavior. Its symbol is a double squiggle. Here is a process that could represent, coarsely, the production of B from A.
  • Unique - The word unique describes a noun. It represents a molecule that appears only once. An example is a promoter. A unique noun has a double-lined border.
  • Ubique - The word ubique describes a noun. It represents a molecule that is so common it can be considered to be of constant concentration and continuous availability. An example is calcium. A ubique noun has either a dashed border or no border.
  • the first approach (a) is a more traditional approach to drawing a network diagram. It consists solely of low-level language constructs. Notice that diagram (a) alone is imprecise: it requires additional text to explain the function of the molecules, and thus cannot efficiently be converted into a set of mathematical equations for simulation.
  • Fig. 44(a) depicts a portion of the Ras Activation pathway expressed in a traditional way. The traditional depiction, however, is unclear. The following text is required for clarification:
  • Farnesyl protein transferase catalyzes the addition of a farnesyl group onto C-N-Ras, C-K(A)-Ras, and C-K(B)-Ras.
  • Farnesyl transferase inhibitor: FPTI
  • Some classes of FTIs work by irreversibly binding to FPTase.
  • Geranylgeranyl transferase I (GGTase I) catalyzes the addition of a geranylgeranyl moiety onto the same group of Ras molecules. After lipid modification, each Ras molecule translocates from the cytosol to the membrane.
  • Fig. 44(b) depicts the same pathway expressed in the Diagrammatic Cell Language. Provided that one learns how to read the language, all of the information that the traditional depiction needed in text is contained solely within the DCL visual diagram. Note that no additional text is necessary to explain the function of the molecules, and that the diagram is much more concise, precise, and easier to read.
  • the above diagram can then be parsed into a distinct set of chemical states and a distinct set of reactions. For each reaction, the modeler or computer algorithm must attach a kinetic form and kinetic values for the parameters.
  • the above picture parses into the following set of states or chemical entities (the underscore here indicates the modified or bound form of the chemical) and reaction steps, to which the modeler can then attach a specific kinetic form:
  • Such code can take many forms, and be implemented in a variety of languages utilized for such purposes. The following describes various exemplary embodiments of such code which were written to implement the various functionalities of the present invention.
  • the code was written in C++, with network classes representing the reaction networks, a director class to handle multiple copies of the network, a minimizer class to determine optimizable parameters and integrator classes to simulate the behavior of the network over time. Chemical, reaction and rate constant classes were defined to be used in the network class.
  • the user had to define a network class specifying three lists of objects. The first was a list of chemical objects corresponding to the chemicals participating in the network. The second was a list of rate constant objects that were used to define the rate of the reactions. The third list consisted of reaction objects, each of which used subsets of the chemical and rate constant list defined above. The initial values of the chemicals and rate constants were hard coded into this class.
  • To simulate multiple experiments on the same network, multiple copies of the network would be defined, with different initial conditions, and a director class would handle the list of these networks. In the case where only one network was optimized, the list in the director contained that single network. The integrator to use was also specified directly in the director class, as were the chemical time series that were to be plotted.
  • the user first had to create the network classes with the appropriate chemicals, rate constants and reactions. Then he/she would create the director class using these network classes.
  • the integrator to use for each network was specified here, and the various variables controlling the behavior of the integrator were also set in this class.
  • the files containing the experimental time series data against which optimization was done were set in this class. Then, in the main body of the code, the director class and a minimizer class were instantiated, and the director was passed to the minimizer for optimization of the rate constants.
  • Each differential equation corresponds to the rate of change of the concentration of a chemical in the network.
  • the time series are generated by integrating this set of differential equations, starting from the initial concentrations and the rate constants.
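The network class with its three lists (chemicals, rate constants, reactions), one differential equation per chemical, and a Runge-Kutta integrator can be sketched as follows. The original code was C++; this Python sketch assumes mass-action kinetics, and the class and field names are hypothetical. The example network, reversible Axin/β-catenin binding with made-up constants, is illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Reaction:
    reactants: dict      # chemical name -> stoichiometric coefficient
    products: dict
    rate_constant: str   # name of an entry in the rate-constant list

class Network:
    """Hypothetical analogue of the network class: chemicals, rate constants, reactions."""

    def __init__(self, chemicals, rate_constants, reactions):
        self.chemicals = dict(chemicals)            # initial concentrations
        self.rate_constants = dict(rate_constants)
        self.reactions = list(reactions)

    def derivatives(self, state):
        """One differential equation per chemical, assuming mass-action kinetics."""
        d = {name: 0.0 for name in state}
        for rxn in self.reactions:
            rate = self.rate_constants[rxn.rate_constant]
            for name, n in rxn.reactants.items():
                rate *= state[name] ** n
            for name, n in rxn.reactants.items():
                d[name] -= n * rate
            for name, n in rxn.products.items():
                d[name] += n * rate
        return d

    def integrate(self, t_end, dt):
        """Classic fourth-order Runge-Kutta from the initial concentrations."""
        state = dict(self.chemicals)
        series = [dict(state)]
        for _ in range(int(round(t_end / dt))):
            k1 = self.derivatives(state)
            k2 = self.derivatives({c: state[c] + 0.5 * dt * k1[c] for c in state})
            k3 = self.derivatives({c: state[c] + 0.5 * dt * k2[c] for c in state})
            k4 = self.derivatives({c: state[c] + dt * k3[c] for c in state})
            state = {c: state[c] + dt / 6.0 * (k1[c] + 2.0 * k2[c] + 2.0 * k3[c] + k4[c])
                     for c in state}
            series.append(dict(state))
        return series

# Illustrative network: reversible Axin / beta-catenin binding (made-up constants).
net = Network(
    chemicals={"Axin": 1.0, "bCat": 2.0, "Axin_bCat": 0.0},
    rate_constants={"k_on": 1.0, "k_off": 0.5},
    reactions=[
        Reaction({"Axin": 1, "bCat": 1}, {"Axin_bCat": 1}, "k_on"),
        Reaction({"Axin_bCat": 1}, {"Axin": 1, "bCat": 1}, "k_off"),
    ],
)
series = net.integrate(t_end=20.0, dt=0.01)
print(series[-1])
```

At equilibrium, k_on·[Axin]·[bCat] = k_off·[Axin_bCat], which for these values gives [Axin_bCat] = (3.5 - sqrt(4.25))/2, about 0.719.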
  • the "global objective function" value is computed by the director by summing over the objective function value from each network.
  • the optimizable parameters, i.e., the rate constant values, and the calculated global objective function value are then passed to the minimizer.
  • the minimizer then returns a new set of values for the rate constants.
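The director/minimizer interaction can be sketched as below. It assumes a hypothetical single-chemical decay model with an analytic solution in place of an integrated network, and synthetic data from a known rate constant under two initial conditions. The director sums the per-network objectives into a global objective. A simple derivative-free compass search stands in for the minimizer here, rather than the Levenberg-Marquardt or Simulated Annealing methods named in the text.

```python
import math

class ExperimentNetwork:
    """One copy of a (hypothetical) network under one experimental condition:
    a single chemical decaying with rate constant k, solved analytically."""

    def __init__(self, initial, times, data):
        self.initial, self.times, self.data = initial, times, data

    def objective(self, k):
        """Sum of squared differences between data and simulation."""
        return sum((y - self.initial * math.exp(-k * t)) ** 2
                   for t, y in zip(self.times, self.data))

class Director:
    """Holds the network copies and sums their objectives into a global objective."""

    def __init__(self, networks):
        self.networks = list(networks)

    def global_objective(self, k):
        return sum(net.objective(k) for net in self.networks)

def compass_search(f, x, step=0.5, tol=1e-8):
    """Derivative-free 1-D compass search: move while a step improves f, else halve it."""
    fx = f(x)
    while step > tol:
        for trial in (x + step, x - step):
            if trial <= 0.0:          # rate constants must stay positive
                continue
            f_trial = f(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break
        else:
            step *= 0.5               # no improvement in either direction
    return x, fx

# Synthetic data from k = 0.7 for two initial conditions (illustration only).
times = [0.0, 0.5, 1.0, 2.0, 4.0]
director = Director(
    ExperimentNetwork(a0, times, [a0 * math.exp(-0.7 * t) for t in times])
    for a0 in (1.0, 3.0)
)
k_fit, global_obj = compass_search(director.global_objective, 0.2)
print(f"fitted k = {k_fit:.6f}, global objective = {global_obj:.2e}")
```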
  • this version included the well-known Runge-Kutta integrator and variations thereof, which allowed for different
  • the other type of integration algorithm was an implementation of the stochastic integrator by Gillespie that determined the cumulative probability of occurrence of each reaction in the system and chose one based on a uniform random number that is compared to the cumulative probability.
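The stochastic integrator described above corresponds to Gillespie's direct method: draw an exponential waiting time from the total propensity, then choose a reaction by comparing a uniform random number with the cumulative propensities. This Python sketch is not the original implementation; the propensity-function interface and the two example systems (pure decay and a birth-death process with hypothetical constants) are assumptions.

```python
import math
import random

def gillespie(x0, reactions, t_end, seed=0):
    """Gillespie direct method.
    reactions: list of (propensity(state) -> float, {chemical: change}) pairs."""
    rng = random.Random(seed)
    state, t = dict(x0), 0.0
    times, states = [t], [dict(state)]
    while True:
        props = [propensity(state) for propensity, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break                                        # nothing can fire any more
        t += -math.log(1.0 - rng.random()) / total       # exponential waiting time
        if t > t_end:
            break
        # Choose reaction j by comparing a uniform number with the cumulative propensities.
        threshold = rng.random() * total
        cumulative, chosen = 0.0, None
        for j, a in enumerate(props):
            cumulative += a
            if a > 0.0 and threshold < cumulative:
                chosen = j
                break
        if chosen is None:   # guard against round-off in the cumulative sum
            chosen = max(j for j, a in enumerate(props) if a > 0.0)
        for name, delta in reactions[chosen][1].items():
            state[name] += delta
        times.append(t)
        states.append(dict(state))
    return times, states

def time_average(times, states, name, t_end):
    """Time-weighted mean of one species over [0, t_end]."""
    total = 0.0
    for i, s in enumerate(states):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += s[name] * (t_next - times[i])
    return total / t_end

# Pure decay X -> 0 (hypothetical rate 0.5 per molecule).
decay_times, decay_states = gillespie({"X": 100}, [(lambda s: 0.5 * s["X"], {"X": -1})],
                                      t_end=1000.0)

# Birth-death: 0 -> X at rate 10, X -> 0 at rate 1 per molecule; stationary mean 10.
birth_death = [(lambda s: 10.0, {"X": +1}), (lambda s: 1.0 * s["X"], {"X": -1})]
bd_times, bd_states = gillespie({"X": 0}, birth_death, t_end=1000.0, seed=1)
print("time-averaged X:", time_average(bd_times, bd_states, "X", 1000.0))
```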
  • the optimization algorithms used were the deterministic Levenberg-Marquardt method and the stochastic simulated annealing method.
  • the user defines a biochemical network by creating a text file describing the network.
  • the network class is defined generally, and is filled in during runtime, based on the description file.
  • the chemicals and rate constants can be made optimizable selectively through this file.
  • the director for multiple networks is specified in a separate file.
  • the networks, over which parameter identification is to be done, are included in the director by including their file names in the director file.
  • the integrators and identifiers are also defined through this file. Now a sequence of identifiers can be used by specifying multiple identifiers in the input file.
  • chemical concentrations and rate constants are treated equally, as variables defining the state of the network. If the user chooses, they can now be defined as mathematical functions of other variables (both chemical concentrations and rate constants) and numeric constants. They can also be supplied to the code as a predefined time series. To accommodate all these changes, the mechanism for handling time series was completely redesigned: a hierarchy of time-series classes was created to handle the various types of definitions for chemicals and rate constants. The form of each time series is specified in the input file.
  • Network variables can be expressed as mathematical functions of other Network variables.
  • Rate constants can be unknowns, i.e., they can be treated as state variables for which there is a differential equation to integrate, just like the chemical state variables.
  • Network variables in the network can include time delays, so they can depend on states of network variables at different times.
  • Network variables can be expressed as switches, allowing reactions to be turned on/off at specified times, or depending on the state of some other network variable.
  • Time series expressing these network variables can be given as a cubic spline, a polynomial interpolation, or a mathematical formula (sum, product, difference, quotient, power, elementary function such as sin, cos, tan, arcsin, arccos, arctan, log or exp, switch, or Gaussian).
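The variable types listed above (functions of other variables, constants, switches, elementary functions and Gaussians) can be modeled as callables that are resolved on demand. This is a hypothetical illustration only: `switch`, `gaussian`, `evaluate` and the variable names are assumptions, and time delays and spline or polynomial time series are not shown. A cyclic definition would raise `RecursionError` in this sketch.

```python
import math
from typing import Callable, Dict

# A network variable maps (time, lookup of other variables) -> value
Var = Callable[[float, Dict[str, float]], float]


def switch(t_on: float, before: float, after: float) -> Var:
    """Switch: value `before` until t_on, then `after` (turns a reaction on or off)."""
    return lambda t, v: before if t < t_on else after


def gaussian(center: float, width: float, height: float) -> Var:
    """Gaussian pulse in time."""
    return lambda t, v: height * math.exp(-0.5 * ((t - center) / width) ** 2)


def evaluate(variables: Dict[str, Var], t: float) -> Dict[str, float]:
    """Evaluate every variable at time t, resolving dependencies on other variables on demand."""

    class Resolver(dict):
        def __missing__(self, name: str) -> float:
            value = variables[name](t, self)   # may recursively look up other variables
            self[name] = value
            return value

    resolver = Resolver()
    return {name: resolver[name] for name in variables}


variables: Dict[str, Var] = {
    "k_on": switch(5.0, 0.0, 2.0),                   # reaction switched on at t = 5
    "scale": lambda t, v: 1.5,                       # numeric constant
    "k_eff": lambda t, v: v["k_on"] * v["scale"],    # product of two other variables
    "pulse": gaussian(center=2.0, width=0.5, height=4.0),
    "drive": lambda t, v: math.sin(t) + v["pulse"],  # elementary function plus another variable
}
print(evaluate(variables, 1.0))
print(evaluate(variables, 6.0))
```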
  • the code can be parallelized using the MPI (Message Passing Interface) library.
  • the simulated annealing parameter identifier was completely rewritten, allowing for the use of a parallel architecture.
  • the preferred code now supports the integration of "stiff" systems of differential equations, which result from reactions occurring on very different time scales.
  • a stochastic integrator is included based on the "Next Reaction" method by Gibson, which is more efficient than the previously used Gillespie algorithm.
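Below is a simplified sketch of Gibson's Next Reaction method, using the same representation (propensity function plus copy-number changes). Each reaction keeps a putative absolute firing time. After an event, unexpired times are rescaled by the ratio of old to new propensity instead of being redrawn. To keep the sketch short, this version (all names hypothetical) updates every reaction and finds the minimum with a linear scan. The published method instead uses a dependency graph and an indexed priority queue, and those structures provide much of its efficiency gain over Gillespie's algorithm.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Reaction = Tuple[Callable[[Dict[str, int]], float], Dict[str, int]]


def next_reaction(state: Dict[str, int], reactions: List[Reaction], t_max: float,
                  rng: random.Random) -> List[Tuple[float, Dict[str, int]]]:
    """Simplified Next Reaction Method (linear scan instead of an indexed priority queue)."""
    state, t = dict(state), 0.0
    props = [propensity(state) for propensity, _ in reactions]
    # Putative absolute firing time for each reaction; infinity if it cannot fire
    times = [rng.expovariate(a) if a > 0 else math.inf for a in props]
    trajectory = [(t, dict(state))]
    while reactions:
        mu = min(range(len(times)), key=times.__getitem__)  # reaction with the earliest time
        if times[mu] > t_max:                                 # also true when all times are inf
            break
        t = times[mu]
        for species, delta in reactions[mu][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
        for j, (propensity, _) in enumerate(reactions):
            a_new = propensity(state)
            if a_new <= 0:
                times[j] = math.inf
            elif j == mu or times[j] == math.inf:
                # Fired (or previously disabled) reaction: draw a fresh exponential waiting time
                times[j] = t + rng.expovariate(a_new)
            else:
                # Reuse the unexpired waiting time, rescaled by old/new propensity
                times[j] = t + (props[j] / a_new) * (times[j] - t)
            props[j] = a_new
    return trajectory


reactions = [
    (lambda s: 1.0 * s["A"], {"A": -1, "B": +1}),   # A -> B (hypothetical k1 = 1.0)
    (lambda s: 0.5 * s["B"], {"A": +1, "B": -1}),   # B -> A (hypothetical k2 = 0.5)
]
trajectory = next_reaction({"A": 100, "B": 0}, reactions, t_max=10.0, rng=random.Random(42))
decay = next_reaction({"A": 20}, [(lambda s: 2.0 * s["A"], {"A": -1})], t_max=1e6,
                      rng=random.Random(1))
print(trajectory[-1], decay[-1])
```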
  • the preferred code can compute the sensitivity of the cost function to changes in the parameters.
  • the cost function sensitivities are computed by solving the sensitivity equations, which give the sensitivity of each chemical concentration with respect to the parameters.
  • the chemical sensitivities are used to compute a raw value for the cost function sensitivity. Since these raw values depend on the scale of the associated parameters, they are normalized so that the sum of the squares of the normalized values is equal to one. With the normalized values, one can determine immediately which parameters have the greatest effect on the cost function; i.e., on the goodness of fit of the model.
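The normalization step is shown in the small sketch below. It assumes a weighted least-squares cost C = sum_i (r_i / sigma_i)^2, so that dC/dp = sum_i 2 r_i / sigma_i^2 * dx_i/dp, where dx_i/dp comes from the sensitivity equations. To remove the dependence on parameter scale, it also assumes that each raw value is multiplied by its parameter value before the vector is scaled to unit length. The source itself only states that the squares of the normalized values sum to one. Function names and numbers are hypothetical.

```python
import math
from typing import Dict, List


def raw_cost_sensitivities(residuals: List[float], sigmas: List[float],
                           chem_sens: Dict[str, List[float]]) -> Dict[str, float]:
    """dC/dp for C = sum_i (r_i / sigma_i)^2, using dx_i/dp from the sensitivity equations."""
    return {
        p: sum(2.0 * r / (s * s) * dx for r, s, dx in zip(residuals, sigmas, dxdp))
        for p, dxdp in chem_sens.items()
    }


def normalize(raw: Dict[str, float], params: Dict[str, float]) -> Dict[str, float]:
    """Scale by parameter value (assumption), then normalize so the squares sum to one."""
    scaled = {p: params[p] * g for p, g in raw.items()}
    norm = math.sqrt(sum(v * v for v in scaled.values()))
    return {p: (v / norm if norm > 0 else 0.0) for p, v in scaled.items()}


residuals = [0.5, -0.2]                               # model minus data at two observations
sigmas = [1.0, 1.0]
chem_sens = {"k1": [1.0, 2.0], "k2": [10.0, 0.0]}     # dx_i/dk at the two observations
params = {"k1": 2.0, "k2": 0.1}
normalized = normalize(raw_cost_sensitivities(residuals, sigmas, chem_sens), params)
print(sorted(normalized.items(), key=lambda kv: -abs(kv[1])))   # largest effect first
```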
  • the parameter identification has also been improved.
  • the new simulated annealing algorithm has been parallelized.
  • the preferred code introduces a separate temperature schedule to gradually limit the range in parameter space, over which the next step can move.
  • the deterministic Levenberg-Marquardt can use the parameter sensitivities to compute the gradient. This is both more efficient and more accurate than a finite difference approximation.
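Here is a compact, hypothetical sketch of Levenberg-Marquardt with an analytic Jacobian. The derivatives of a two-parameter exponential model are written out by hand, standing in for the sensitivities that the code obtains from the sensitivity equations. The damping update (divide or multiply lambda by 10) and all names are assumptions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def solve(M: List[Vector], b: Vector) -> Vector:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def levenberg_marquardt(residuals: Callable[[Vector], Vector],
                        jacobian: Callable[[Vector], List[Vector]],
                        p: Vector, lam: float = 1e-3, iters: int = 100) -> Tuple[Vector, float]:
    """Minimize sum(r^2) using an analytic Jacobian J[i][j] = d r_i / d p_j."""
    cost = sum(r * r for r in residuals(p))
    for _ in range(iters):
        r, J = residuals(p), jacobian(p)
        n, m = len(p), len(r)
        JTJ = [[sum(J[i][a] * J[i][b] for i in range(m)) for b in range(n)] for a in range(n)]
        JTr = [sum(J[i][a] * r[i] for i in range(m)) for a in range(n)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = -J^T r
        M = [[JTJ[a][b] * (1.0 + lam if a == b else 1.0) for b in range(n)] for a in range(n)]
        delta = solve(M, [-g for g in JTr])
        trial = [pi + di for pi, di in zip(p, delta)]
        trial_cost = sum(x * x for x in residuals(trial))
        if trial_cost < cost:
            p, cost, lam = trial, trial_cost, lam / 10.0   # accept: behave more like Gauss-Newton
        else:
            lam *= 10.0                                     # reject: behave more like gradient descent
    return p, cost


ts = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
ys = [3.0 * math.exp(-0.8 * t) for t in ts]        # synthetic data: A = 3.0, k = 0.8


def res(q: Vector) -> Vector:
    return [q[0] * math.exp(-q[1] * t) - y for t, y in zip(ts, ys)]


def jac(q: Vector) -> List[Vector]:
    # d r_i / dA = exp(-k t_i),  d r_i / dk = -A t_i exp(-k t_i)
    return [[math.exp(-q[1] * t), -q[0] * t * math.exp(-q[1] * t)] for t in ts]


params, final_cost = levenberg_marquardt(res, jac, [2.0, 0.5])
print(params, final_cost)
```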
  • Both parameter identifiers now allow for imposing a lower and upper bound on each optimizable parameter, thus narrowing the parameter search space.
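A minimal sketch of simulated annealing with per-parameter lower and upper bounds follows. Besides the acceptance temperature, it uses a second, separate schedule that shrinks the range of each move. The linear schedules, default constants and names are assumptions, since the source does not give them. This serial version also does not show the parallelization.

```python
import math
import random
from typing import Callable, List, Tuple


def anneal(cost: Callable[[List[float]], float], x0: List[float], lower: List[float],
           upper: List[float], rng: random.Random, n_steps: int = 5000,
           t0: float = 1.0, range0: float = 0.5) -> Tuple[List[float], float]:
    """Simulated annealing with bounds and a separate schedule that shrinks the move range."""
    x = [min(hi, max(lo, xi)) for xi, lo, hi in zip(x0, lower, upper)]
    fx = cost(x)
    best, fbest = list(x), fx
    for step in range(n_steps):
        frac = step / n_steps
        temperature = t0 * (1.0 - frac) + 1e-12     # acceptance temperature (linear schedule)
        move = range0 * (1.0 - frac) + 1e-6         # separate schedule for the step range
        # Propose a move within a shrinking window, clipped to [lower, upper]
        trial = [min(hi, max(lo, xi + rng.uniform(-move, move) * (hi - lo)))
                 for xi, lo, hi in zip(x, lower, upper)]
        ft = cost(trial)
        # Metropolis rule: always accept downhill, accept uphill with probability exp(-dE/T)
        if ft <= fx or rng.random() < math.exp(-(ft - fx) / temperature):
            x, fx = trial, ft
            if fx < fbest:
                best, fbest = list(x), fx
    return best, fbest


def bowl(p: List[float]) -> float:
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2


best, fbest = anneal(bowl, [4.0, 4.0], lower=[-5.0, -5.0], upper=[5.0, 5.0], rng=random.Random(7))
print(best, fbest)
```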
  • An analysis tool to aid the modeler in determining which parameters have the greatest effect on the concentrations of specified chemicals.
  • Object-oriented database: an object-oriented database (OODB) houses data for various biochemical reaction networks. Using a separate code, the user will be able to use the database to set up a simulation or parameter identification run. The input portion of the code reads its inputs directly from the database and then saves the outputs back to it. The object-oriented nature of the database simplifies communication with it, since the same classes used in the code can also be used to define the database.
  • Improved parallel algorithms for optimization: in this embodiment, the parallel simulated annealing code is improved by implementing various parallel architectures. The Levenberg-Marquardt identifier is also improved by a straightforward parallelization of the algorithm.
  • Multi-threaded integration: the integration algorithms are made multi-threaded, which leads to better performance than using parallel code alone.
  • Hybrid method: standard integration methods are either purely continuous or purely stochastic. For many of the large systems contemplated to be analyzed by the methods of the present invention, a purely stochastic algorithm is too slow. On the other hand, a purely continuous algorithm inaccurately describes reactions involving chemicals with few molecules in the system. Very often a biochemical network contains some molecules that play a central role yet are present in very few copies. Hence, to simulate such a system both accurately and efficiently, a hybrid stochastic-continuous integrator is optimal.
  • Among other parameter identification algorithms, genetic algorithms and the direct fit method are implemented.
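A hybrid integrator first has to decide which reactions to treat stochastically, and that decision can be sketched as below. This illustration covers the partitioning step only and rests on several assumptions: the copy-number threshold rule, the threshold of 100, and the reaction and species data are all hypothetical. Coupling the stochastic and continuous parts, and repartitioning as counts change, is not shown.

```python
from typing import Dict, List, Tuple


def partition(reactions: Dict[str, List[str]], counts: Dict[str, int],
              threshold: int = 100) -> Tuple[List[str], List[str]]:
    """Split reactions into (stochastic, continuous) by the copy numbers of the species involved."""
    stochastic, continuous = [], []
    for name, species in reactions.items():
        # A reaction touching any low-copy species is simulated stochastically
        if all(counts[s] >= threshold for s in species):
            continuous.append(name)
        else:
            stochastic.append(name)
    return stochastic, continuous


reactions = {
    "promoter_binding": ["p53", "Mdm2Promoter"],   # only one promoter copy (hypothetical)
    "complex_formation": ["p53", "Mdm2"],
}
counts = {"p53": 5000, "Mdm2": 2000, "Mdm2Promoter": 1}
print(partition(reactions, counts))
```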
  • the following example shows how users can input models into the software and specify the integration and optimization algorithms to use.
  • the first file the code reads is the director file. This file specifies the network(s), the integrator and the method(s) to be used for parameter identification.
  • the director file contains a director block and a parameter_identification block.
  • the director block contains one or more network blocks and an integrator block.
  • the director has one network, which is read in from the file "p53_network.txt".
  • the integrator block specifies the integrator to use (in this case CVode, which is a third-party package for integrating stiff systems of ODEs) and parameters which control the behavior of the integrator.
  • the parameter_identification block is optional; if it is omitted, the code will run a simulation and exit. If included, it specifies one or more identifiers to run along with options controlling their behavior. If multiple identifiers are specified, the first will run with the initial values of the parameters specified by the user, the second will start with the optimized values found by the first identifier, etc.
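Chaining identifiers, so that each one starts from the values found by the previous one, can be sketched as follows. The two identifiers here are hypothetical stand-ins: a coarse coordinate scan replaces simulated annealing, and a compass search replaces Levenberg-Marquardt. All names are assumptions.

```python
from typing import Callable, List, Sequence

Cost = Callable[[List[float]], float]
Identifier = Callable[[Cost, List[float]], List[float]]


def coarse_grid(cost: Cost, params: List[float], step: float = 1.0, span: int = 5) -> List[float]:
    """Hypothetical coarse identifier: scan along each coordinate in turn, keep the best point."""
    best = list(params)
    for j in range(len(best)):
        candidates = [best[:j] + [best[j] + i * step] + best[j + 1:] for i in range(-span, span + 1)]
        best = min(candidates, key=cost)
    return best


def compass(cost: Cost, params: List[float], step: float = 0.5, tol: float = 1e-9) -> List[float]:
    """Hypothetical local identifier: compass search with step halving."""
    best, fbest = list(params), cost(params)
    while step > tol:
        moved = False
        for j in range(len(best)):
            for d in (step, -step):
                trial = list(best)
                trial[j] += d
                ft = cost(trial)
                if ft < fbest:
                    best, fbest, moved = trial, ft, True
        if not moved:
            step /= 2
    return best


def run_identifiers(cost: Cost, initial: List[float], identifiers: Sequence[Identifier]) -> List[float]:
    """The first identifier starts from the user's values; each later one starts from the previous result."""
    params = list(initial)
    for identify in identifiers:
        params = identify(cost, params)
    return params


def target(p: List[float]) -> float:
    return (p[0] - 3.3) ** 2 + (p[1] + 1.7) ** 2


result = run_identifiers(target, [0.0, 0.0], [coarse_grid, compass])
print(result)
```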
  • the network is usually read in from a separate file as in this example.
  • different parts of the network are typically contained in separate files as well.
  • the file "p53_network.txt" has the following lines:
  • DNAPK (1000) ;
  • indicate that these four parameters are optimizable; when the optimizer runs, it will change the values of these four parameters in an attempt to fit the experimental data.
  • MMR(damagedDNA, DNAPK, DNAPKa
  • TR(DNAPKa, DNAPK
  • TR(p5315PhosPhos20Phos37Phos, p5315PhosPhos20Phos37PhosCyto
  • MMR(DNAPKa, p5320Phos, p5315Phos20Phos | kp_p5315Phos, km_p5315Phos);
  • MR(DNAPKa, p5320Phos37Phos, p5315Phos20Phos37Phos
  • HDDR(p5320Phos37Phos_CBP, p5320Phos37Phos, CBP
  • HDDR(p5315Phos20Phos37Phos_CBP, p5315Phos20Phos37Phos, CBP
  • HDR(p5315PhosPhos20Phos37Phos, CBP, p5315PhosPhos20Phos37Phos_CBP
  • HDDR(p5315PhosPhos20Phos37Phos_CBP, p5315PhosPhos20Phos37Phos, CBP
  • HDR(ARF, E2F, ARF_E2F
  • HDDR(ARF_Mdm2, ARF, Mdm2 | ku_ARF_Mdm2);
  • HDDR(Mdm2_p53Cyto, Mdm2Cyto, p53Cyto
  • HDDR(Mdm2_p5337PhosCyto, Mdm2Cyto, p5337PhosCyto
  • GR(Mdm2mRNACyto, Mdm2Cyto | ks_Mdm2
  • UDR(Mdm2mRNA
  • GR(YYYPromoter, YYYmRNA
  • Each reaction is specified by an abbreviation (e.g., HDR means HeteroDimerization Reaction, MMR means Michaelis-Menten Reaction, etc.) followed by a list of the chemicals and rate constants (parameters) that participate in the reaction.
  • the chemicals are listed first, followed by a vertical bar ("|") and the list of rate constants.
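The sketch below is a hypothetical parser for reaction lines in this format. It assumes a complete line has the form ABBR(chemicals | rate constants); and the extracted listing above is truncated, so the closing parenthesis and semicolon are an assumption. Chemicals and rate constants are returned in the order given, because the order determines the role each one plays in the reaction.

```python
import re
from typing import List, NamedTuple


class Reaction(NamedTuple):
    kind: str                  # abbreviation, e.g. HDR, HDDR, MMR, TR, GR
    chemicals: List[str]       # listed first
    rate_constants: List[str]  # listed after the vertical bar


_LINE = re.compile(r"^\s*([A-Za-z]+)\s*\((.*)\)\s*;\s*$")


def _names(text: str) -> List[str]:
    """Split a comma-separated list of names, dropping surrounding whitespace and empties."""
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_reaction(line: str) -> Reaction:
    """Parse 'ABBR(chem1, chem2, ... | k1, k2, ...);' into its parts (order is preserved)."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a reaction line: {line!r}")
    kind, body = match.groups()
    chemicals, _, rates = body.partition("|")
    return Reaction(kind, _names(chemicals), _names(rates))


print(parse_reaction("HDDR(ARF_Mdm2, ARF, Mdm2 | ku_ARF_Mdm2);"))
```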
  • the chemicals and rate constants must be listed in the correct order. For example the line
  • there is experimental data for only one chemical, p53.
  • the values are specified in triples in which the first number is the time of the observation, the second is the concentration, and the third is the error.
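With data in (time, concentration, error) triples, a natural cost is the error-weighted sum of squared residuals. The source does not state the exact cost formula, so this sketch is an assumption, as are its names and numbers.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, float, float]   # (time, concentration, error)


def read_triples(values: Sequence[float]) -> List[Observation]:
    """Group a flat list of numbers into (time, concentration, error) triples."""
    if len(values) % 3 != 0:
        raise ValueError("data values must come in triples")
    return [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]


def weighted_cost(observations: List[Observation], model: Callable[[float], float]) -> float:
    """Sum of squared residuals, each divided by the reported error of that observation."""
    return sum(((model(t) - c) / err) ** 2 for t, c, err in observations)


# Hypothetical p53 observations and a hypothetical simulated time course
obs = read_triples([0.0, 1.0, 0.1, 1.0, 0.6, 0.05])
print(weighted_cost(obs, lambda t: 1.0 - 0.5 * t))
```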
  • t_p53_Mdm2 SUM (Mdm2_p5337Phos,Mdm2_p5337Phos) ;
  • simulation_TS(t_p5315Phos, t_p5315PhosPhos, t_p5320Phos, t_p5337Phos, t_p53, t_p53_NoMdm2)
  • simulation_TS(Mdm2, ARF_Mdm2, Mdm2_p53, Mdm2_p5337Phos);
  • This file defines some new chemicals which are used only for output. They are total levels of certain chemicals, so they are defined as a mathematical function (SUM) of these other chemicals. Any chemical can be saved to a file or plotted.
  • Network 1 has 78 states and 76 parameters
  • New lowest cost 217203 ks_p53 = (1473.71, , ) kd_p53 = (3.01649, , ) kt_p53Nucl = (1.10296, , ) kt_p53Cyto = (0.148332, , )
  • Each time the code finds a new lowest cost, it writes the chemicals and parameters to disk files in a format that can be read back in to restart the run. Depending on the verbosity setting, it also writes the values of the optimizable parameters to the screen. Every optimizable parameter has three values: the current value, the lower bound and the upper bound. In this example the bounds are not specified, so they take their default values: the original value divided by 10 for the lower bound and multiplied by 10 for the upper bound. When the code has finished running the optimizer, it writes out a summary:
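The default-bounds rule just described can be written as a short helper. The names `bounds_for` and `clip` are hypothetical, positive parameter values are assumed, and clipping is only one possible way to enforce the bounds.

```python
from typing import Optional, Tuple


def bounds_for(value: float, lower: Optional[float] = None,
               upper: Optional[float] = None) -> Tuple[float, float, float]:
    """Return (current, lower, upper), defaulting to value / 10 and value * 10."""
    lo = value / 10.0 if lower is None else lower
    hi = value * 10.0 if upper is None else upper
    return value, lo, hi


def clip(value: float, lo: float, hi: float) -> float:
    """Keep a proposed parameter value inside its bounds."""
    return min(hi, max(lo, value))


current, lo, hi = bounds_for(1473.71)   # the starting ks_p53 value from the first log line
print(current, lo, hi, clip(20000.0, lo, hi))
```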
  • New lowest cost 134.676 ks_p53 = (148.928, , ) kd_p53 = (1.07345, , ) kt_p53Nucl = (1.72851, , ) kt_p53Cyto = (1.6086, , )
  • the code then takes the values found by the simulated annealing minimizer as starting values for the Levenberg-Marquardt minimizer. After this minimizer has finished, the code again writes a summary:
  • New lowest cost 0.255947 ks_p53 = (148.92, , ) kd_p53 = (0.285072, , ) kt_p53Nucl = (0.537043, , ) kt_p53Cyto = (2.45473, , ) Reached maximum number of iterations, solution may not be local minimum
  • results of the optimization are written back to a set of output files that mimic the input files, except for the fact that the original values of the optimizable parameters are replaced with the new optimized values.
  • the files can then be used as input to a new optimization or simulation run.
  • Described herein is a system for inferring one biochemical interaction network, or a population of them, including topology and chemical reaction rates and parameters, from dynamical or static experimental data, with or without spatial localization information, and a database of possible interactions. Accordingly, the invention, as described herein, provides systems and methods that infer the biochemical interaction networks that exist in a cell. To this end, the systems and methods described herein generate a plurality of possible candidate networks and then apply a forward simulation process to these networks to infer a network.
  • Inferred networks may be analyzed via data fitting and other fitting criteria, to determine the likelihood that the network is correct. In this way, new and more complete models of cellular dynamics may be created.
  • Fig. 46 depicts a model generator 12 that creates new model networks, drawing from a combination of sources including a population of existing networks 14 and a probable links database 16.
  • a parameter-fitting module 18 evaluates the model network, determining parameter values for the model network based on experimental data 20.
  • a simulation process 26 may aid the optimization of the parameters in the parameter-fitting module 18.
  • An experimental noise module 22 may also be used in conjunction with the parameter-fitting module 18 to evaluate the model's sensitivity to fluctuations in the experimental data 20.
  • a cost evaluation module 24 may test the reliability of the model and parameters by examining global and local fitness criteria.
  • a population of existing networks 14 stores previously inferred network models in a computer database and may provide network models to a model generator 12 for the generation of new network models. Completed network models are transferred from the cost-evaluation module 24 and added to the population of existing models 14 for storage.
  • a probable links database 16 stores data representative of biochemical interactions obtained from bioinformatics predictions, and may also include hypothetical interactions for which there is some support in the published literature.
  • the probable links database 16 couples with the model generator 12 to provide links for the formation of new network models where necessary.
  • the model generator 12 uses any of a number of model-fitting techniques that are known to those of skill in the art to generate new biochemical network models.
  • the model generator 12 employs genetic algorithms to generate new networks from two networks present in the population. Such genetic algorithms may use other information to guide the recombination of networks used in constructing new networks, such as sensitivity analysis of the parameters of one or both of the parent networks. They may also use the results of clustering analyses to group together networks in the population that behave similarly dynamically, and selectively recombine networks belonging to the same dynamical cluster or, for heterotic vigor, recombine networks belonging to different dynamical clusters that fit the data approximately equally well.
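The sketch below shows one minimal way to recombine two parent networks, with each network represented as a set of directed interactions. It uses plain uniform crossover: interactions shared by both parents are kept, and each of the others is inherited with probability 1/2. An optional mutation then adds a link from a probable-links pool. Guiding the recombination by sensitivity analysis or dynamical clustering is not shown, and the interactions in the example are illustrative only.

```python
import random
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]               # (regulator, target) interaction
Network = FrozenSet[Edge]


def recombine(parent_a: Network, parent_b: Network, rng: random.Random,
              probable_links: Iterable[Edge] = (), p_mutation: float = 0.1) -> Network:
    """Uniform crossover of two parent networks, plus an optional link from the probable-links pool."""
    shared = parent_a & parent_b                         # interactions both parents agree on are kept
    differing = sorted((parent_a | parent_b) - shared)   # sorted for reproducible random choices
    child = set(shared) | {edge for edge in differing if rng.random() < 0.5}
    pool = sorted(set(probable_links) - child)
    if pool and rng.random() < p_mutation:
        child.add(rng.choice(pool))                      # mutation: add a candidate interaction
    return frozenset(child)


a = frozenset({("p53", "Mdm2"), ("Mdm2", "p53"), ("ARF", "Mdm2")})
b = frozenset({("p53", "Mdm2"), ("E2F", "ARF")})
child = recombine(a, b, random.Random(3), probable_links=[("DNAPK", "p53")], p_mutation=0.5)
print(sorted(child))
```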
  • the model generator 12 may draw one or more networks from the population of existing networks 14 and incorporate any number of possible interactions from the probable links database 16.
  • the model generator 12 may rely solely on the probable links database 16, generating a new network model without relying on the population of existing networks 14.
  • the model generator 12 uses multiple evaluation criteria, e.g. finite state machines, to test generated networks for compatibility with experimental data, as in Conradi et al. (C. Conradi, J. Stelling, J. Raisch, "Structure discrimination of continuous models for (bio)chemical reaction networks via finite state machines", IEEE International Symposium on Intelligent Control (2001), p. 138).
  • the model generator may also use Markov Chain Monte Carlo methods (W. Gilks, S. Richardson and D. Spiegelhalter, "Markov Chain Monte Carlo in Practice", Chapman and Hall, 1996), or variational methods (M. Jordan, Z. Ghahramani, T. Jaakkola, L.
  • the model generator may also use the results of clustering large-scale or high-throughput experimental measurements, such as mRNA expression level measurements, perhaps combined with bioinformatics predictions such as for genes with common binding sites for transcription factors, or secondary structure predictions for proteins that may be possible transcription factors, to generate models consistent with these clustering and bioinformatics results, in combination or singly.
  • the model generator may also include reactions suggested by a control theory based module, which can evaluate portions of a given network in the population and modify them according to calculations based on robust control theory (F.L. Lewis, Applied Optimal Control and Estimation, (Prentice-Hall, 1992)).
  • this population in a manner that is similar or equivalent to a Monte Carlo evaluation, of the likelihood that the model is correct, in the Bayesian sense over the ensemble of all networks, weighted by the a priori measure of the space of networks.
  • the newly generated network model passes from the model generator 12 to a parameter fitting module 18 for optimization of the network parameters.
  • a parameter fitting module 18 optimizes the model parameters received from the model generator 12 using experimental data 20 as a calibration point, either in a single step or by coupling with a simulation module 26 for iterative parameter fitting. Optimization may use any global or local routine, or combination of routines, known to one of skill in the art. Examples include, but are not limited to, local optimization routines such as Levenberg-Marquardt, modified Levenberg-Marquardt, BFGS-updated secant methods, sequential quadratic programming and the Nelder-Mead method, and global optimization routines such as simulated annealing or adaptive simulated annealing, basic Metropolis, genetic algorithms and direct fitting. Following parameter optimization, the parameter fitting module 18 passes the network model to a cost evaluation module 24.
  • the experimental data 20 consists of qualitative or quantitative experimental data, such as mRNA or protein levels, stored in a computer database.
  • the experimental data 20 may be obtained through any of a variety of high-throughput data collection techniques known to one of skill in the art, including but not limited to immunoprecipitation assays, Western blots or other assays known to those of skill in the art, and gene expression from RT-PCR or oligonucleotide and cDNA microarrays.
  • the experimental data 20 couples directly with the parameter fitting module 18 for parameter optimization, and possibly with an experimental noise module 22. In other practices the systems and methods described herein employ other types of data, including, for example, spatial localization data.
  • the model has (x,y,z,t) spatial and temporal coordinates for components as well.
  • Confocal microscopy is one of the technologies for getting both dynamical and spatial localization.
  • One example of why this is important: the total level of protein A may not change at all as a result of the perturbation, while its levels in the cytosol versus the nucleus do change because A is being translocated from the cytosol to the nucleus to participate in other processes.
  • Our inference may use both dynamical and static data, as well as information on spatial localization.
  • An experimental noise module 22 may be used to provide an indication of the model's sensitivity to small
  • the noise module 22 acts as an interim step between the experimental data 20 and the parameter fitting module 18, introducing variations into the experimental data 20 for evaluation following parameter optimization in a cost-evaluation module 24.
  • the noise generation could be implemented by modeling the uncertainty in any given experimental observation by an appropriate distribution (e.g. log-normal for expression data) and picking noise values as dictated by the distribution for that experimental observation.
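Here is a sketch of that noise model. It assumes each observation carries a reported error that is interpreted as a standard deviation. The value is multiplied by a log-normal factor whose mean is 1 and whose coefficient of variation equals error/value. The parameterization and names are assumptions, and values must be positive.

```python
import math
import random
from typing import List, Tuple

Observation = Tuple[float, float, float]   # (time, concentration, error)


def perturb(observations: List[Observation], rng: random.Random) -> List[Observation]:
    """Return one noisy replicate, multiplying each value by a log-normal factor with mean 1."""
    replicate = []
    for t, value, err in observations:
        cv = err / value                                # relative error of this observation
        s = math.sqrt(math.log1p(cv * cv))              # log-space sigma giving that coefficient of variation
        factor = rng.lognormvariate(-0.5 * s * s, s)    # E[factor] = 1
        replicate.append((t, value * factor, err))
    return replicate


data = [(0.0, 1.0, 0.1), (1.0, 0.6, 0.05), (2.0, 0.4, 0.08)]
replicates = [perturb(data, random.Random(seed)) for seed in range(3)]
print(replicates[0])
```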
  • an optional cost evaluation module 24 may evaluate the network model received from the parameter fitting module 18 according to cost or fitness criteria.
  • the cost evaluation module 24 ranks a model's reliability according to the chosen fitness or cost criteria.
  • the criteria employed by the cost evaluation module 24 may include, but are not limited to: (1) insensitivity of the model to changes in the initial conditions or chemical reaction parameters, (2) robustness of the model to the random removal or addition of biochemical interactions in the network, (3) insensitivity to variations in the experimental data (with variations introduced into the experimental data by the experimental noise module 22), and (4) overall bioinformatics costs associated with the model.
  • examples of bioinformatics costs are the number of gene prediction algorithms that simultaneously agree on a particular gene, the number of secondary structure prediction algorithms that agree on the structure of a protein, and so on. In addition, some bioinformatics algorithms allow comparison with synthetically generated sequence (or other) data, making it possible to calculate likelihoods or confidence measures for the validity of a given prediction.
  • the cost evaluation module 24 then adds the new network model and the results of its cost criteria to the population of existing networks 14.
  • Models in the population of existing networks continue to be evaluated and tested by adding and removing links in iterative operations of the system described herein. There is no specific starting point in the system. Users of the system may generate networks entirely from the probable links database 16, or from a combination of the probable links database 16 with the population of existing networks 14. Iterative refinement may continue until a single network attains a goodness of fit to the experimental data, perhaps combined with low costs for dynamical robustness or other criteria, below a user-defined threshold, or until a stable, dynamically similar cluster of networks emerges from the population of networks. This stable cluster may then be used to compute robust predictions by averaging over the predictions of the networks in the cluster in a cost-weighted average, where the costs include, but are not limited to, goodness of fit to the experimental data, dynamical robustness, and probabilistic or exact evaluation of insensitivity to experimental noise and/or parameter values.
  • networks with lower costs contribute more to predictions than networks with higher costs.
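The cost-weighted average can be sketched as follows. The source only requires that lower-cost networks contribute more. Weighting by 1/cost is an assumption, as are the names and numbers.

```python
from typing import List, Sequence


def cost_weighted_prediction(predictions: Sequence[Sequence[float]], costs: Sequence[float],
                             eps: float = 1e-12) -> List[float]:
    """Average the predicted time courses of a cluster of networks, weighting each by 1 / cost."""
    weights = [1.0 / (c + eps) for c in costs]   # lower cost -> larger weight
    total = sum(weights)
    return [sum(w * pred[i] for w, pred in zip(weights, predictions)) / total
            for i in range(len(predictions[0]))]


print(cost_weighted_prediction([[1.0, 2.0], [3.0, 4.0]], costs=[1.0, 3.0]))
```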
  • the refinement of the pool of networks may be continued until the (average or best) goodness of fit of the networks in the stable cluster is below some user defined threshold, or until the number of networks in the cluster is above some user defined threshold.
  • the single network may be solely used for generating predictions.
  • the depicted process shown in Figure 1 can be executed on a conventional data processing platform such as an IBM PC-compatible computer running the Windows operating system, or a SUN workstation running a Unix operating system.
  • the data processing system can comprise a dedicated processing system that includes an embedded programmable data processing system.
  • the data processing system can comprise a single board computer system that has been integrated into a system for performing micro-array analysis.
  • the process depicted in Fig. 46 can be realized as a software component operating on a conventional data processing system such as a Unix workstation.
  • the process can be implemented as a C language computer program, or as a computer program written in any high-level language, including C++, Fortran, Java or BASIC.
  • the process may also be executed on commonly available clusters of processors, such as Western Scientific Linux clusters, which are able to allow parallel execution of all or some of the steps in the depicted process.
  • the systems and methods described herein include systems that create a pool of candidate or possible networks that have been generated to match data, including data that is biologically realistic as it arises from relevant literature or experiments.
  • the systems described herein may, in certain embodiments, apply a discriminator process to the generated pool of possible networks.
  • the system may employ pools identified by the discriminator process as data that may be applied to a network generation module.
  • the network generation module can process these possible networks with data from the probable links database to generate output data that can be processed by the fitting module as described
  • Figs. 49-50 contain the predicted time courses for chemicals with observed data and for chemicals without observed data (the curves labeled (1) are the experimental time courses, and those labeled (2) are the time courses reconstructed by the network inference methodology described above). We are able not only to reconstruct the dynamical behavior of the observed chemicals but also to predict the trends in the unobserved chemicals.
  • the methods of the invention can be used to perform drug discovery.
  • the methods of the invention can be used to determine which populations of patients have specific targets and are therefore amenable to treatment with specific therapeutics, based upon the biological data representing those populations.
  • the biological data of specific persons can be used in the simulations of the invention to find the best therapeutic strategy for treating that person, i.e. the dose, time, order, etc. of different therapeutics affecting specific targets in that person.
  • the methods of the invention can contribute to finding useful therapies for "failed compounds", i.e. compounds that have not performed well in clinical trials, by altering the combinations of targets for such compounds, combining other therapeutic compounds with the failed compounds, determining specific populations to treat, etc.
  • simulations can be used to identify a molecular marker as a predictor of a disease condition such that diagnosing physicians may perform analytical tests for such markers as a precondition to diagnosing that condition.
  • compounds which are found to have therapeutic value can be altered in structure or function to make them more effective, e.g. by reducing the number of targets which are addressed, creating higher binding affinities in the therapeutic, etc.
  • the biological data can be used to infer what target the drug is impacting based on network inferences.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • Microbiology (AREA)
  • Cell Biology (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to systems and methods for modeling the interactions between the various genes, proteins and other components of a cell. These systems and methods use mathematical techniques to represent the interrelationships among cellular components, and they manipulate the dynamics of a cell to determine which components of the cell may constitute targets for interaction with therapeutic agents. A first method according to the invention is based on a cell-simulation approach, in which a cellular biochemical network intrinsic to a phenotype of the cell is simulated by specifying its components and their interrelationships. The various interrelationships are represented by one or more mathematical equations, which are solved to simulate a first state of the cell. The simulated network is then perturbed in one of three ways: by deleting one or more components, by modifying the concentration of one or more components, or by modifying one or more of the mathematical equations representing the interrelationships among one or more of the components. The equations representing the perturbed network are solved to simulate a second state of the cell. This second state is compared with the first state of the cell to identify the effect of the perturbation on the state of the network, thereby identifying one or more components as targets. A second method for identifying certain components of a cell as targets for interaction with therapeutic agents is based on an analytical approach. In this approach, a stable phenotype of a cell is specified and correlated with the state of the cell, and the role of this cellular state is correlated with its function.
A cellular biochemical network considered to be intrinsic to this phenotype is then specified by identifying its components and their interrelationships and by representing these interrelationships in one or more mathematical equations. The network is then perturbed, and the equations representing the perturbed network are solved to determine whether the perturbation is likely to cause the cell to transition from one phenotype to another, thereby identifying one or more components as targets.
EP02773968A 2001-11-02 2002-11-04 Procedes et systemes permettant d'identifier des composants de reseaux biochimiques mammaliens comme etant des cibles d'agents therapeutiques Withdrawn EP1454282A4 (fr)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US33599901P 2001-11-02 2001-11-02
US335999P 2001-11-02
US40676402P 2002-08-29 2002-08-29
US406764P 2002-08-29
US10/286,372 US20030144823A1 (en) 2001-11-01 2002-11-01 Scale-free network inference methods
US286372 2002-11-01
PCT/US2002/035301 WO2003040992A1 (fr) 2001-11-02 2002-11-04 Procedes et systemes permettant d'identifier des composants de reseaux biochimiques mammaliens comme etant des cibles d'agents therapeutiques

Publications (2)

Publication Number Publication Date
EP1454282A1 true EP1454282A1 (fr) 2004-09-08
EP1454282A4 EP1454282A4 (fr) 2005-04-06

Family

ID=27403609

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02773968A Withdrawn EP1454282A4 (fr) 2001-11-02 2002-11-04 Procedes et systemes permettant d'identifier des composants de reseaux biochimiques mammaliens comme etant des cibles d'agents therapeutiques

Country Status (3)

Country Link
EP (1) EP1454282A4 (fr)
CA (1) CA2501111A1 (fr)
WO (1) WO2003040992A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7069534B2 (en) 2003-12-17 2006-06-27 Sahouria Emile Y Mask creation with hierarchy management using cover cells
US7844431B2 (en) 2004-02-20 2010-11-30 The Mathworks, Inc. Method and apparatus for integrated modeling, simulation and analysis of chemical and biochemical reactions
US8554486B2 (en) 2004-02-20 2013-10-08 The Mathworks, Inc. Method, computer program product, and apparatus for selective memory restoration of a simulation
WO2007002895A1 (fr) * 2005-06-29 2007-01-04 Board Of Trustees Of Michigan State University Structure d'integration utilisee dans la recherche d'une voie d'integration en trois etapes

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2002065119A1 (fr) * 2001-02-09 2002-08-22 The Trustees Of Columbia University In The City Of New York Method for predicting molecular interactions in networks

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US5930154A (en) * 1995-01-17 1999-07-27 Intertech Ventures, Ltd. Computer-based system and methods for information storage, modeling and simulation of complex systems organized in discrete compartments in time and space
US5914891A (en) * 1995-01-20 1999-06-22 Board Of Trustees, The Leland Stanford Junior University System and method for simulating operation of biochemical systems
US6165709A (en) * 1997-02-28 2000-12-26 Fred Hutchinson Cancer Research Center Methods for drug target screening
US5965352A (en) * 1998-05-08 1999-10-12 Rosetta Inpharmatics, Inc. Methods for identifying pathways of drug action
US6132969A (en) * 1998-06-19 2000-10-17 Rosetta Inpharmatics, Inc. Methods for testing biological network models
WO2000023933A1 (fr) * 1998-10-21 2000-04-27 Bios Group Lp Systemes et procedes d'analyse de reseaux genetiques
US6203987B1 (en) * 1998-10-27 2001-03-20 Rosetta Inpharmatics, Inc. Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
US20020068269A1 (en) * 2000-03-10 2002-06-06 Allen Eric B. System and method for simulating cellular biochemical pathways
AU2001256585A1 (en) * 2000-04-14 2001-10-30 Hybrigenics S.A. Method for constructing, representing or displaying protein interaction maps

Non-Patent Citations (9)

Title
HASTY JEFF ET AL: "Computational studies of gene regulatory networks: In numero molecular biology" NATURE REVIEWS GENETICS, vol. 2, no. 4, April 2001 (2001-04), pages 268-279, XP002316687 ISSN: 1471-0056 *
HUANG SUI: "Gene expression profiling, genetic networks, and cellular states: An integrating concept for tumorigenesis and drug discovery" JOURNAL OF MOLECULAR MEDICINE, SPRINGER VERLAG, DE, vol. 77, no. 6, June 1999 (1999-06), pages 469-480, XP002205218 ISSN: 0946-2716 *
IDEKER T ET AL: "Integrated genomic and proteomic analyses of a systematically perturbed metabolic network" SCIENCE, AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE, US, vol. 292, 4 May 2001 (2001-05-04), pages 929-934, XP002963505 ISSN: 0036-8075 *
SCHERF U ET AL: "A gene expression database for the molecular pharmacology of cancer" NATURE GENETICS, NATURE AMERICA, NEW YORK, US, vol. 24, no. 3, March 2000 (2000-03), pages 236-244, XP002224798 ISSN: 1061-4036 *
SCHUSTER STEFAN ET AL: "A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks" NATURE BIOTECHNOLOGY, vol. 18, no. 3, March 2000 (2000-03), pages 326-332, XP002316686 ISSN: 1087-0156 *
See also references of WO03040992A1 *
SHMULEVICH ILYA ET AL: "Gene perturbation and intervention in Probabilistic Boolean Networks." BIOINFORMATICS (OXFORD), vol. 18, no. 10, October 2002 (2002-10), pages 1319-1331, XP002316689 ISSN: 1367-4803 *
SOMOGYI ROLAND ET AL: "The dynamics of molecular networks: Applications to therapeutic discovery." DRUG DISCOVERY TODAY, vol. 6, no. 24, 2001, pages 1267-1277, XP002316688 ISSN: 1359-6446 *
WAGNER A: "How to reconstruct a large genetic network from n gene perturbations in fewer than n² easy steps" BIOINFORMATICS, OXFORD UNIVERSITY PRESS, OXFORD, GB, vol. 17, no. 12, 2001, pages 1183-1197, XP002973480 ISSN: 1367-4803 *

Also Published As

Publication number Publication date
CA2501111A1 (fr) 2003-05-15
WO2003040992A1 (fr) 2003-05-15
EP1454282A4 (fr) 2005-04-06

Similar Documents

Publication Publication Date Title
US7415359B2 (en) Methods and systems for the identification of components of mammalian biochemical networks as targets for therapeutic agents
Ruiz et al. Identification of disease treatment mechanisms through the multiscale interactome
Tang et al. Network pharmacology strategies toward multi-target anticancer therapies: from computational models to experimental design principles
Devi et al. Evolutionary algorithms for de novo drug design–A survey
Barbuti et al. A survey of gene regulatory networks modelling methods: from differential equations, to Boolean and qualitative bioinspired models
Zhu et al. Getting connected: analysis and principles of biological networks
Das et al. Binding affinity prediction with property-encoded shape distribution signatures
Tolios et al. Computational approaches in cancer multidrug resistance research: Identification of potential biomarkers, drug targets and drug-target interactions
Singh et al. Contrastive learning in protein language space predicts interactions between drugs and protein targets
Rukhlenko et al. Control of cell state transitions
EP1629279A1 (fr) Methods and systems for creating and using global, data-driven simulations of biological systems for pharmacological and industrial applications
Fernández-Torras et al. Connecting chemistry and biology through molecular descriptors
Mannhold et al. Advanced computer-assisted techniques in drug discovery
Ellingson et al. Machine learning and ligand binding predictions: a review of data, methods, and obstacles
Marin-Sanguino et al. Biochemical pathway modeling tools for drug target detection in cancer and other complex diseases
Pan et al. Elements of computational systems biology
Ebhohimen et al. Advances in computer-aided drug discovery
Iossifov et al. Probabilistic inference of molecular networks from noisy data sources
Kesherwani et al. Conformational dynamics of thiM riboswitch to understand the gene regulation mechanism using Markov state modeling and the residual fluctuation network approach
WO2003040992A1 (fr) Methods and systems for the identification of components of mammalian biochemical networks as targets for therapeutic agents
Aksenov et al. An integrated approach for inference and mechanistic modeling for advancing drug development
Nordsletten et al. Multiscale mathematical modeling to support drug development
NZ545911A (en) Apparatus and method for identifying therapeutic targets using a computer model
Girisha et al. A comprehensive review of global alignment of multiple biological networks: background, applications and open issues
BS et al. Vitamin D analog calcitriol for breast cancer therapy; an integrated drug discovery approach

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 2004-06-02

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SK TR

AX Request for extension of the European patent

Extension state: AL LT LV MK RO SI

A4 Supplementary search report drawn up and despatched

Effective date: 2005-02-21

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 2009-01-13