WO2012168483A1 - Systems and methods for network-based biological activity assessment - Google Patents

Info

Publication number
WO2012168483A1
Authority
WO
WIPO (PCT)
Prior art keywords
biological
network
score
computerized method
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2012/061035
Other languages
English (en)
French (fr)
Inventor
Julia HOENG
Florian Martin
Manuel Peitsch
Alain SEWER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Philip Morris Products SA
Original Assignee
Philip Morris Products SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Philip Morris Products SA
Priority to EP12729448.6A (EP2718880A1)
Priority to JP2014514108A (JP6138768B2)
Priority to US14/124,826 (US20140172398A1)
Priority to CN201280028435.6A (CN103827896B)
Publication of WO2012168483A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06N7/02 Computing arrangements based on specific mathematical models using fuzzy logic
    • G06N7/06 Simulation on general purpose computers
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G16B40/30 Unsupervised data analysis
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/30 Dynamic-time models

Definitions

  • the human body is constantly perturbed by exposure to potentially harmful agents that can pose severe health risks in the long term. Exposure to these agents can compromise the normal functioning of biological mechanisms internal to the human body. To understand and quantify the effect that these perturbations have on the human body, researchers study the mechanism by which biological systems respond to exposure to agents. Some groups have extensively utilized in vivo animal testing methods. However, animal testing methods are not always sufficient because there is doubt as to their reliability and relevance. Numerous differences exist in the physiology of different animals. Therefore, different species may respond differently to exposure to an agent. Accordingly, there is doubt as to whether responses obtained from animal testing may be extrapolated to human biology. Other methods include assessing risk through clinical studies of human volunteers.
  • systems and methods described herein are directed to computerized methods and one or more computer processors for quantifying the perturbation of a biological system in response to an agent.
  • the computerized method comprises, in one aspect, receiving, at a first processor, a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes or comprises a plurality of biological entities, each biological entity interacting with at least one other of the biological entities; receiving, at a second processor, a set of control data corresponding to the biological system not exposed to the agent; providing, at a third processor, a computational causal network model that represents the biological system and includes or comprises: nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data; calculating, with a fourth processor, activity measures, for the nodes, representing a difference between the treatment data and the control data; calculating, with a fifth processor, weight values for the nodes, wherein at least one weight value is different from at least one other weight value; and generating, with a sixth processor, a score for the computational model representative of the perturbation of the biological system to the agent, wherein the score is based on the direction values, the weight values and the activity measures.
  • the biological system may be represented by at least one mechanism hypothesis.
  • the biological system may be represented by a plurality of computational causal network models or at least one computational causal network model comprising a plurality of mechanism hypotheses.
  • the method may further comprise normalizing the score based on the number of measurable nodes in the respective computational model.
  • the weight values may represent a confidence in at least one of the set of treatment data and control data.
  • the weight values may include or comprise local false non-discovery rates.
  • the method may further comprise calculating, with a seventh processor, an approximate distribution of the activity measures of nodes over a model or a mechanism hypothesis in a model; calculating, with an eighth processor, an expected value of activity measures with respect to the approximate distribution; and generating, with a ninth processor, a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the expected value.
  • the approximate distribution may be based on the activity measures.
  • calculating an expected value may comprise performing a rectangular approximation.
  • the method may further comprise calculating, with a tenth processor, a positive activation metric and a negative activation metric based on the activity measures, the positive and negative activation metrics representative of consistency and inconsistency, respectively, between the activity measures and the direction values with respect to the model; and generating, with an eleventh processor, a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the positive and negative activation metrics.
  • the positive activation metric, negative activation metric or both may be based on local false non-discovery rates.
  • the activity measure may be a fold-change value, and the fold-change value for each node includes or comprises a logarithm of the difference between the treatment data and the control data for the biological entity represented by the respective node.
  • the subset of the biological system may include or comprise at least one of cell proliferation mechanism, cellular stress mechanism, cell inflammation mechanism, and DNA repair mechanism.
  • the agent may include or comprise at least one of aerosol generated by heating tobacco, aerosol generated by combusting tobacco, tobacco smoke or cigarette smoke.
  • the agent may include or comprise a heterogeneous substance, including a molecule or an entity that is not present in or derived from the biological system.
  • the agent may include or comprise toxins, therapeutic compounds, stimulants, relaxants, natural products, manufactured products, and food substances.
  • the set of treatment data may include or comprise a plurality of sets of treatment data such that each measurable node includes or comprises a plurality of fold-change values defined by a first probability distribution and a plurality of weight values defined by a second probability distribution.
  • the set of treatment data may include or comprise a plurality of sets of treatment data such that each measurable node includes or comprises a plurality of fold-change values and the corresponding weight values.
  • the step of generating the score may comprise a linear or a non-linear combination of the activity measures, the weight values, and the direction values; and a normalization of the combination by a scale factor.
  • the combination may be an arithmetic combination, and the scale factor is the square root of the number of biological entities for which measured data are received.
  • the score may be generated by a geometric perturbation index scoring technique, a probabilistic perturbation index scoring technique, or an expected perturbation index scoring technique.
  • the method may further comprise determining a confidence interval for the score based on a parametric or non-parametric computational bootstrapping technique.
  • a computer system for quantifying the perturbation of a biological system in response to an agent comprises at least one processor configured or adapted to: receive a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes or comprises a plurality of biological entities, each biological entity interacting with at least one other of the biological entities; receive a set of control data corresponding to the biological system not exposed to the agent; provide a computational causal network model that represents the biological system and includes or comprises: nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data; calculate activity measures, for the nodes, representing a difference between the treatment data and the control data; calculate weight values for the nodes, wherein at least one weight value is different from at least one other weight value; and generate a score for the computational model representative of the perturbation of the biological system to the agent, wherein the score is based on the direction values, the weight values and the activity measures.
  • the biological system may be represented by at least one mechanism hypothesis.
  • the biological system may be represented by a plurality of computational causal network models or at least one computational causal network model comprising a plurality of mechanism hypotheses.
  • the computer system may further be configured to normalize the score based on the number of scorable nodes in the respective computational model.
  • the weight values may represent a confidence in at least one of the set of treatment data and control data.
  • the weight values may include or comprise local false non-discovery rates.
  • the computer system may further be configured to calculate an approximate distribution of the activity measures of nodes over a model or a mechanism hypothesis in a model; calculate an expected value of activity measures with respect to the approximate distribution; and generate a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the expected value.
  • the approximate distribution may be based on the activity measures.
  • calculating an expected value may comprise performing a rectangular approximation.
  • the system may further be configured to calculate a positive activation metric and a negative activation metric based on the activity measures, the positive and negative activation metrics representative of consistency and inconsistency, respectively, between the activity measures and the direction values with respect to the model; and to generate a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the positive and negative activation metrics.
  • the positive activation metric, negative activation metric or both may be based on local false non-discovery rates.
  • the activity measure may be a fold-change value, and the fold-change value for each node may include or comprise a logarithm of the difference between the treatment data and the control data for the biological entity represented by the respective node.
  • the subset of the biological system may include or comprise at least one of cell proliferation mechanism, cellular stress mechanism, cell inflammation mechanism, and DNA repair mechanism.
  • the agent may include or comprise at least one of aerosol generated by heating tobacco, aerosol generated by combusting tobacco, tobacco smoke or cigarette smoke.
  • the agent may include or comprise a heterogeneous substance, including a molecule or an entity that is not present in or derived from the biological system.
  • the agent may include or comprise toxins, therapeutic compounds, stimulants, relaxants, natural products, manufactured products, and food substances.
  • the set of treatment data may include or comprise a plurality of sets of treatment data such that each measurable node includes or comprises a plurality of fold-change values defined by a first probability distribution and a plurality of weight values defined by a second probability distribution.
  • the set of treatment data may include or comprise a plurality of sets of treatment data such that each measurable node includes or comprises a plurality of fold-change values and the corresponding weight values.
  • the step of generating the score may comprise a linear or a non-linear combination of the activity measures, the weight values, and the direction values; and a normalization of the combination by a scale factor.
  • the combination may be an arithmetic combination, and the scale factor is the square root of the number of biological entities for which measured data are received.
  • the score may be generated by a geometric perturbation index scoring technique, a probabilistic perturbation index scoring technique, or an expected perturbation index scoring technique.
  • the system may further be configured to determine a confidence interval for the score based on a parametric or non-parametric computational bootstrapping technique.
  • the computerized method may comprise receiving, at a first processor, a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes a plurality of biological entities, each biological entity interacting with at least one other of the biological entities, and receiving, at a second processor, a set of control data corresponding to the biological system not exposed to the agent.
  • the computerized method may comprise providing, at a third processor, a computational causal network model that represents the biological system.
  • the computational model may include or comprise nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data.
  • the computerized method may further comprise calculating, with a fourth processor, activity measures, for the nodes, representing a difference between the treatment data and the control data, and calculating, with a fifth processor, weight values for the nodes, wherein at least one weight value is different from at least one other weight value.
  • the computerized method may also comprise generating, with a sixth processor, a score for the computational model representative of the perturbation of the biological system to the agent, wherein the score is based on the direction values, the weight values and the activity measures.
  • the computerized method further comprises normalizing the score based on the number of nodes in the respective computational model.
  • each of the first through sixth processors is included or comprised within a single processor or single computing device. In other implementations, one or more of the first through sixth processors are distributed across a plurality of processors or computing devices.
  • the computational causal network model includes or comprises a set of causal relationships that exist between a node representing a potential cause and nodes representing the measured quantities.
  • the activity measures may include a fold-change.
  • the fold-change may be a number describing how much a node measurement changes going from an initial value to a final value between control data and treatment data.
  • the fold-change number may represent the logarithm of the fold-change of the activity of the biological entity between control condition and treatment condition.
  • the activity measure for each node may include or comprise a logarithm of the difference between the treatment data and the control data for the biological entity represented by the respective node.
  • the weight value may represent a weight to be given to the fold-change value of the nodes.
  • the weight value may represent the known biological significance of the measured node with regard to a feature or outcome of interest (e.g., a known carcinogen in cancer studies).
  • the weight value may represent a confidence in at least one of the set of perturbation data and control data. More particularly, the weight values may include or comprise local false non-discovery rates.
  • the computerized method may generate the score for the computational model by multiplying the activity measure with the weight value and the direction value and summing over the nodes.
  • the computerized method includes or comprises generating, with a processor, a confidence interval for each of the generated scores. Generating the confidence interval may comprise approximating a distribution of a generated score.
  • the systems and methods described herein are directed to computerized methods for quantifying the perturbation of a biological system in response to an agent.
  • the computerized method may comprise receiving, at a first processor, a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes or comprises a plurality of biological entities, each biological entity interacting with at least one other of the biological entities, and receiving, at a second processor, a set of control data corresponding to the biological system not exposed to the agent.
  • the computerized method may comprise providing, at a third processor, a computational causal network model that represents the biological system.
  • the computational model may include or comprise nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data.
  • the computerized method may further comprise calculating, with a fourth processor, activity measures, for the nodes, representing a difference between the treatment data and the control data, and calculating, with a fifth processor, an approximate distribution of the activity measures over the nodes.
  • the computerized method may also include or comprise calculating, with a sixth processor, an expected value of the approximate distribution.
  • the computerized method may also comprise generating, with a seventh processor, a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the expected value.
  • each of the first through seventh processors is included or comprised within a single processor or single computing device. In other implementations, one or more of the first through seventh processors are distributed across a plurality of processors or computing devices.
  • the computational causal network model includes or comprises a set of causal relationships that exist between a node representing a potential cause and nodes representing the measured quantities.
  • the activity measures may include or comprise a fold-change.
  • the fold-change may be a number describing how much a node measurement changes going from an initial value to a final value between control data and treatment data.
  • the fold-change number may represent the logarithm of the fold-change of the activity of the biological entity between control condition and treatment condition.
  • the computerized method may include or comprise generating, with a processor, a range for the fold-change density, which may represent an approximation of the set of values that the fold-change values can take in the biological system under the treatment conditions.
  • the processor may generate an approximate fold-change density, which may include or comprise an approximate probability distribution of fold-change values.
  • the computerized method further includes or comprises calculating the approximate expected value of the approximate fold-change density.
  • the computerized method may generate the score for the computational model based on the calculated expected value.
  • the approximate distributions may be based, generally, on the activity measures. Additionally and optionally, the expected value may comprise a rectangular approximation.
  • the computerized method includes or comprises generating, with a processor, a confidence interval for each of the generated scores. Generating the confidence interval may comprise performing a parametric bootstrapping technique.
  • the systems and methods described herein are directed to computerized methods for quantifying the perturbation of a biological system in response to an agent.
  • the computerized method may comprise receiving, at a first processor, a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes or comprises a plurality of biological entities, each biological entity interacting with at least one other of the biological entities, and receiving, at a second processor, a set of control data corresponding to the biological system not exposed to the agent.
  • the computerized method may comprise providing, at a third processor, a computational causal network model that represents the biological system.
  • the computational model may include or comprise nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data.
  • the computerized method may further comprise calculating, with a fourth processor, activity measures, for the nodes, representing a difference between the treatment data and the control data, and calculating, with a fifth processor, a positive activation score and a negative activation score based on the activity measures, the positive and negative activation scores representative of consistency and inconsistency, respectively, between the activity measures and the direction values.
  • the computerized method may also comprise generating, with a sixth processor, a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the positive and negative activation scores.
  • each of the first through sixth processors is included or comprised within a single processor or single computing device. In other implementations, one or more of the first through sixth processors are distributed across a plurality of processors or computing devices.
  • the computational causal network model includes or comprises a set of causal relationships that exist between a node representing a potential cause and nodes representing the measured quantities.
  • the activity measures may include or comprise a fold-change.
  • the fold-change may be a number describing how much a node measurement changes going from an initial value to a final value between control data and treatment data.
  • the fold-change number may represent the logarithm of the fold-change of the activity of the biological entity between control condition and treatment condition.
  • the computerized method may include or comprise generating, with a processor, a range for the fold-change density, which may represent an approximation of the set of values that the fold-change values can take in the biological system under the treatment conditions.
  • the computerized method may comprise calculating, with a processor, a positive activation score based on the fold-change values and the direction values.
  • the positive and negative activation scores may indicate whether the observed activation/inhibition of biological entities is consistent or inconsistent with the expected directions of change.
  • the positive activation score is a probability that the direction values are consistent with the activity measures.
  • the negative activation score may be a probability that the direction values are inconsistent with the activity measures.
  • the computerized method may further include or comprise generating a score for the computational model by combining the positive and negative activation scores. In certain implementations, the score is based on local false non-discovery rates.
  • the subset of the biological system includes or comprises at least one of cell proliferation mechanism, cellular stress mechanism, cell inflammation mechanism, and DNA repair mechanism.
  • the agent may include or comprise at least one of aerosol generated by heating tobacco, aerosol generated by combusting tobacco, tobacco smoke or cigarette smoke.
  • the agent may include cadmium, mercury, chromium, nicotine, tobacco-specific nitrosamines and their metabolites (4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), N'-nitrosonornicotine (NNN), N-nitrosoanatabine (NAT), N-nitrosoanabasine (NAB), and 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL)).
  • the agent includes or comprises a product used for nicotine replacement therapy.
  • the agent may include or comprise a heterogeneous substance, including a molecule or an entity that is not present in or derived from the biological system.
  • the agent may also include or comprise toxins, therapeutic compounds, stimulants, relaxants, natural products, manufactured products, and food substances.
  • the set of treatment data includes or comprises a plurality of sets of treatment data corresponding to certain nodes of a biological network model, wherein each such node corresponds to a plurality of fold-change values defined by a first probability distribution and a plurality of weight values defined by a second probability distribution.
  • the systems and methods described herein are directed to computerized methods and one or more computer processors for quantifying the perturbation of a biological system in response to an agent.
  • the computerized method may comprise receiving, at a first processor, a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes or comprises a plurality of biological entities, each biological entity interacting with at least one other of the biological entities, and receiving, at a second processor, a set of control data corresponding to the biological system not exposed to the agent.
  • the computerized method may comprise providing, at a third processor, a computational causal network model that represents the biological system.
  • the computational model may include or comprise nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data.
  • the computerized method may further comprise calculating, with a fourth processor, activity measures, for the nodes, representing a difference between the treatment data and the control data.
  • the computerized method may also comprise generating, with a fifth processor, a score for the computational model representative of the perturbation of the biological system to the agent, wherein the score is based on the direction values and the activity measures.
  • the computerized method further comprises normalizing the score based on the number of nodes in the respective computational model.
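The scoring and normalization steps above can be illustrated with a direction-signed sum of activity measures divided by the number of scored nodes. This is a simplified sketch, not the GPI, PPI or EPI formulas referred to elsewhere in the document. The function name is hypothetical. The sketch also assumes that the activity measure is a treatment-minus-control difference, such as a log2 fold-change, and that only nodes carrying both a direction value and an activity measure are scored.

```python
def network_score(direction, activity, normalize=True):
    """Direction-weighted sum of activity measures over shared nodes.

    direction: node -> +1 / -1 expected direction of change
    activity:  node -> activity measure (treatment vs. control difference)
    """
    shared = direction.keys() & activity.keys()
    if not shared:
        raise ValueError("no nodes with both a direction value and an activity measure")
    total = sum(direction[n] * activity[n] for n in shared)
    # Normalizing by node count makes models of different sizes comparable.
    return total / len(shared) if normalize else total
```

A node that moves in its expected direction adds to the score, and one that moves against it subtracts. Dividing by the node count keeps a large model from scoring higher merely because it has more nodes.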
  • the computerized method may also comprise generating, with a sixth processor, a confidence interval for each of the generated scores.
  • generating the confidence interval may comprise approximating a distribution of the generated scores, and a t-statistic may be derived from the variance of the approximated distribution of generated scores.
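The following is a minimal sketch of the confidence-interval step. It assumes a node-normalized score whose activity measures are independent, with known standard deviations. Under that assumption the variance of the approximated score distribution is the sum of the per-node variances divided by N². A t-statistic and an interval then follow from that variance. Both function names are hypothetical. For simplicity the sketch uses the standard normal quantile in place of a Student-t quantile, which is a large-sample assumption.

```python
import math
from statistics import NormalDist


def normalized_score_variance(activity_sds):
    """Variance of (1/N) * sum(d_i * x_i) for independent x_i with the given
    standard deviations. The +/-1 direction values d_i do not change it."""
    n = len(activity_sds)
    return sum(sd ** 2 for sd in activity_sds) / n ** 2


def score_confidence_interval(score, score_variance, level=0.95):
    """Return (t_statistic, (low, high)) for a score with the given variance."""
    if score_variance <= 0:
        raise ValueError("score_variance must be positive")
    if not 0 < level < 1:
        raise ValueError("level must be in (0, 1)")
    se = math.sqrt(score_variance)
    t_stat = score / se
    q = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return t_stat, (score - q * se, score + q * se)
```

For example, a score of 2.0 with variance 0.25 gives a t-statistic of 4.0 and an interval of roughly (1.02, 2.98).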
  • each of the first through sixth processors is included or comprised within a single processor or single computing device. In other implementations, one or more of the first through sixth processors are distributed across a plurality of processors or computing devices.
  • the computerized methods described herein may be implemented in a computerized system having one or more computing devices, each including one or more processors.
  • the computerized systems described herein may comprise one or more engines, which include or comprise a processing device or devices, such as a computer, microprocessor, logic device or other device or processor that is configured with hardware, firmware, and software to carry out one or more of the computerized methods described herein.
  • the computerized system includes or comprises a systems response profile engine, a network modeling engine, and a network scoring engine.
  • the engines may be interconnected from time to time, and further connected from time to time to one or more databases, including a perturbations database, a measurables database, an experimental data database and a literature database.
  • the computerized system described herein may include or comprise a distributed computerized system having one or more processors and engines that communicate through a network interface. Such an implementation may be appropriate for distributed computing over multiple communication systems.
  • a computer program product comprising a program code adapted to perform the method described herein.
  • a computer or computer recordable medium or device comprising the computer program product.
  • FIG. 1 is a block diagram of an exemplary computerized system for quantifying the response of a biological network to a perturbation.
  • FIG. 2 is a flow diagram of an exemplary process for quantifying the response of a biological network to a perturbation by calculating a network perturbation amplitude (NPA) score.
  • FIG. 3 is a graphical representation of data underlying a systems response profile comprising data for two agents, two parameters, and N biological entities.
  • FIG. 4 is an illustration of a computational model of a biological network having several biological entities and their relationships.
  • FIG. 5 is a flow diagram of an exemplary process for generating a geometric perturbation index (GPI) score.
  • FIG. 6 is a flow diagram of an exemplary process for generating a probabilistic perturbation index (PPI) score.
  • FIG. 7 is a flow diagram of an exemplary process for generating an expected perturbation index (EPI) score.
  • FIG. 8 is a flow diagram of an exemplary process for generating a confidence interval for a geometric perturbation index (GPI) score.
  • FIG. 9 illustrates a biological network model analyzed with the systems and methods disclosed herein.
  • FIGS. 10-14 illustrate network perturbation amplitude (NPA) scoring results for the network-based biological mechanisms.
  • FIG. 15 is a block diagram of an exemplary distributed computerized system for quantifying the impact of biological perturbations.
  • FIG. 16 is a block diagram of an exemplary computing device which may be used to implement any of the components in any of the computerized systems described herein.
  • the network model is used as a substrate for simulation and analysis, and is representative of the biological mechanisms and pathways that enable a feature of interest in the biological system.
  • the feature or some of its mechanisms and pathways may contribute to the pathology of diseases and adverse health effects of the biological system.
  • Prior knowledge of the biological system represented in a database is used to construct the network model which is populated by data on the status of numerous biological entities under various conditions including under normal conditions and under perturbation by an agent.
  • the network model used is dynamic in that it represents changes in status of various biological entities in response to a perturbation and can yield quantitative and objective assessments of the impact of an agent on the biological system. Computer systems for operating these computational methods are also provided.
  • the numerical values generated by computerized methods of the invention can be used to determine the magnitude of desirable or adverse biological effects caused by manufactured products (for safety assessment or comparisons), therapeutic compounds including nutrition supplements (for determination of efficacy or health benefits), and environmentally active substances (for prediction of risks of long term exposure and the relationship to adverse effect and onset of disease), among others.
  • the systems and methods described herein provide a computed numerical value representative of the magnitude of change in a perturbed biological system based on a network model of a perturbed biological mechanism.
  • the numerical value referred to herein as a network perturbation amplitude (NPA) score can be used to summarily represent the status changes of various entities in a defined biological mechanism.
  • the numerical values obtained for different agents or different types of perturbations can be used to compare the relative impact of the different agents or perturbations on a biological mechanism that enables, or manifests itself as, a feature of a biological system.
  • NPA scores may be used to measure the responses of a biological mechanism to different perturbations.
  • the term “score” is used herein generally to refer to a value or set of values that provides a quantitative measure of the magnitude of changes in a biological system. Such a score is computed using any of various mathematical and computational algorithms known in the art and according to the methods disclosed herein, employing one or more datasets obtained from a sample or a subject.
  • the NPA scores may assist researchers and clinicians in improving diagnosis, experimental design, therapeutic decision, and risk assessment.
  • the NPA scores may be used to screen a set of candidate biological mechanisms in a toxicology analysis to identify those most likely to be affected by exposure to a potentially harmful agent.
  • these NPA scores may allow correlation of molecular events (as measured by experimental data) with phenotypes or biological outcomes that occur at the cell, tissue, organ or organism level.
  • a clinician may use NPA values to compare the biological mechanisms affected by an agent to a patient's physiological condition to determine what health risks or benefits the patient is most likely to experience when exposed to the agent (e.g., a patient who is immunocompromised may be especially vulnerable to agents that cause a strong immunosuppressive response).
  • FIG. 1 is a block diagram of a computerized system 100 for quantifying the response of a network model to a perturbation.
  • system 100 includes or comprises a systems response profile engine 110, a network modeling engine 112, and a network scoring engine 114.
  • the engines 110, 112, and 114 are interconnected from time to time, and further connected from time to time to one or more databases, including a perturbations database 102, a measurables database 104, an experimental data database 106 and a literature database 108.
  • an engine includes or comprises a processing device or devices, such as a computer, microprocessor, logic device or other device or devices as described with reference to FIG. 16, that is configured with hardware, firmware, and software to carry out one or more computational operations.
  • FIG. 2 is a flow diagram of a process 200 for quantifying the response of a biological network to a perturbation by calculating a network perturbation amplitude (NPA) score, according to one implementation.
  • the steps of the process 200 will be described as being carried out by various components of the system 100 of FIG. 1, but any of these steps may be performed by any suitable hardware or software components, local or remote, and may be arranged in any appropriate order or performed in parallel.
  • the systems response profile (SRP) engine 110 receives biological data from a variety of different sources, and the data itself may be of a variety of different types.
  • the data comprises data from experiments in which a biological system is perturbed, as well as control data.
  • the SRP engine 110 generates systems response profiles (SRPs) which are representations of the degree to which one or more entities within a biological system change in response to the presentation of an agent to the biological system.
  • the network modeling engine 112 provides one or more databases that contain(s) a plurality of network models, one of which is selected as being relevant to the agent or a feature of interest. The selection can be made on the basis of prior knowledge of the mechanisms underlying the biological functions of the system.
  • the network modeling engine 112 may extract causal relationships between entities within the system using the systems response profiles, networks in the database, and networks previously described in the literature, thereby generating, refining or extending a network model.
  • the network scoring engine 114 generates NPA scores for each perturbation using the network identified at step 214 by the network modeling engine 112 and the SRPs generated at step 212 by the SRP engine 110.
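The division of labor among the three engines can be sketched as three chained functions. This is an illustration under heavy simplification, not the system's actual interfaces. All function names are hypothetical. The "SRP" in this sketch is just a log2 ratio of mean treatment to mean control levels, the network database is a plain dictionary keyed by feature of interest, and the score is the direction-signed mean over scored nodes.

```python
import math


def srp_engine(treatment, control):
    """Hypothetical SRP step (step 212): log2 fold-change per measured entity."""
    return {e: math.log2(treatment[e] / control[e]) for e in treatment if e in control}


def network_modeling_engine(network_db, feature):
    """Hypothetical selection step (step 214): pick the network model
    (node -> expected direction) relevant to the feature of interest."""
    return network_db[feature]


def network_scoring_engine(srp, direction):
    """Hypothetical scoring step: direction-signed mean of SRP values."""
    shared = [n for n in direction if n in srp]
    if not shared:
        raise ValueError("network and SRP share no nodes")
    return sum(direction[n] * srp[n] for n in shared) / len(shared)


def run_pipeline(treatment, control, network_db, feature):
    srp = srp_engine(treatment, control)
    direction = network_modeling_engine(network_db, feature)
    return network_scoring_engine(srp, direction)
```

For instance, with a network `{"A": 1, "B": -1}`, treatment levels `{"A": 4.0, "B": 1.0}` and control levels `{"A": 1.0, "B": 2.0}`, both entities move in their expected directions and the score is 1.5.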
  • An NPA score quantifies a biological response to a perturbation or treatment (represented by the SRPs) in the context of the underlying relationships between the biological entities (represented by the network). The following description is divided into subsections for clarity of disclosure, and not by way of limitation.
  • a biological system in the context of the present invention is an organism or a part of an organism, including functional parts, the organism being referred to herein as a subject.
  • the subject is generally a mammal, including a human.
  • the subject can be an individual human being in a human population.
  • the term “mammal” as used herein includes or comprises, but is not limited to, a human, non-human primate, mouse, rat, dog, cat, cow, sheep, horse, and pig. Mammals other than humans can advantageously be used as subjects to provide a model of a human disease.
  • the non-human subject can be unmodified, a transgenic animal, a genetically modified animal, or an animal carrying one or more genetic mutation(s), or silenced gene(s).
  • a subject can be male or female. Depending on the objective of the operation, a subject can be one that has been exposed to an agent of interest. A subject can be one that has been exposed to an agent over an extended period of time, optionally including time prior to the study. A subject can be one that had been exposed to an agent for a period of time but is no longer in contact with the agent. A subject can be one that has been diagnosed or identified as having a disease. A subject can be one who has already undergone, or is undergoing, treatment of a disease or adverse health condition. A subject can also be one who exhibits one or more symptoms or risk factors for a specific health condition or disease. A subject can be one that is predisposed to a disease but is asymptomatic.
  • the disease or health condition in question is associated with exposure to an agent or use of an agent over an extended period of time.
  • the system 100 contains or generates computerized models of one or more biological systems and mechanisms of its functions (collectively, “biological networks” or “network models”) that are relevant to a type of perturbation or an outcome of interest.
  • the biological system can be defined at different levels as it relates to the function of an individual organism in a population, an organism generally, an organ, a tissue, a cell type, an organelle, a cellular component, or a specific individual's cell(s).
  • Each biological system comprises one or more biological mechanisms or pathways, the operation of which manifests as functional features of the system.
  • Animal systems that reproduce defined features of a human health condition and that are suitable for exposure to an agent of interest are preferred biological systems.
  • Cellular and organotypical systems that reflect the cell types and tissue involved in a disease etiology or pathology are also preferred biological systems. Priority could be given to primary cells or organ cultures that recapitulate as much as possible the human biology in vivo.
  • the biological system contemplated for use with the systems and methods described herein can be defined by, without limitation, functional features (biological functions, physiological functions, or cellular functions), organelle, cell type, tissue type, organ, development stage, or a combination of the foregoing.
  • biological systems include or comprise, but are not limited to, the pulmonary, integument, skeletal, muscular, nervous (central and peripheral), endocrine, cardiovascular, immune, circulatory, respiratory, urinary, renal, gastrointestinal, colorectal, hepatic and reproductive systems.
  • biological systems include or comprise, but are not limited to, the various cellular functions in epithelial cells, nerve cells, blood cells, connective tissue cells, smooth muscle cells, skeletal muscle cells, fat cells, ovum cells, sperm cells, stem cells, lung cells, brain cells, cardiac cells, laryngeal cells, pharyngeal cells, esophageal cells, stomach cells, kidney cells, liver cells, breast cells, prostate cells, pancreatic cells, islet cells, testes cells, bladder cells, cervical cells, uterus cells, colon cells, and rectum cells.
  • Some of the cells may be cells of cell lines, cultured in vitro or maintained in vitro indefinitely under appropriate culture conditions.
  • Examples of cellular functions include or comprise, but are not limited to, cell proliferation (e.g., cell division), degeneration, regeneration, senescence, control of cellular activity by the nucleus, cell-to-cell signaling, cell differentiation, cell de-differentiation, secretion, migration, phagocytosis, repair, apoptosis, and developmental programming.
  • Examples of cellular components that can be considered as biological systems include or comprise, but are not limited to, the cytoplasm, cytoskeleton, membrane, ribosomes, mitochondria, nucleus, endoplasmic reticulum (ER), Golgi apparatus, lysosomes, DNA, RNA, proteins, peptides, and antibodies.
  • a perturbation in a biological system can be caused by one or more agents over a period of time through exposure or contact with one or more parts of the biological system.
  • An agent can be a single substance or a mixture of substances, including a mixture in which not all constituents are identified or characterized. The chemical and physical properties of an agent or its constituents may not be fully characterized.
  • An agent can be defined by its structure, its constituents, or a source that under certain conditions produces the agent.
  • An example of an agent is a heterogeneous substance, that is, a molecule or an entity that is not present in or derived from the biological system, and any intermediates or metabolites produced therefrom after contacting the biological system.
  • An agent can be a carbohydrate, protein, lipid, nucleic acid, alkaloid, vitamin, metal, heavy metal, mineral, oxygen, ion, enzyme, hormone, neurotransmitter, inorganic chemical compound, organic chemical compound, environmental agent, microorganism, particle, environmental condition, environmental force, or physical force.
  • agents include or comprise but are not limited to nutrients, metabolic wastes, poisons, narcotics, toxins, therapeutic compounds, stimulants, relaxants, natural products, manufactured products, food substances, pathogens (prion, virus, bacteria, fungi, protozoa), particles or entities whose dimensions are in or below the micrometer range, byproducts of the foregoing and mixtures of the foregoing.
  • Non-limiting examples of a physical agent include or comprise radiation, electromagnetic waves (including sunlight), increase or decrease in temperature, shear force, fluid pressure, electrical discharge(s) or a sequence thereof, or trauma.
  • Some agents may not perturb a biological system unless it is present at a threshold concentration or it is in contact with the biological system for a period of time, or a combination of both. Exposure or contact of an agent resulting in a perturbation may be quantified in terms of dosage. Thus, a perturbation can result from a long-term exposure to an agent. The period of exposure can be expressed by units of time, by frequency of exposure, or by the percentage of time within the actual or estimated life span of the subject. A perturbation can also be caused by withholding an agent (as described above) from or limiting supply of an agent to one or more parts of a biological system.
  • a perturbation can be caused by a decreased supply of or a lack of nutrients, water, carbohydrates, proteins, lipids, alkaloids, vitamins, minerals, oxygen, ions, an enzyme, a hormone, a neurotransmitter, an antibody, a cytokine, light, or by restricting movement of certain parts of an organism, or by constraining or requiring exercise.
  • An agent may cause different perturbations depending on which part(s) of the biological system is exposed and the exposure conditions.
  • Non-limiting examples of an agent may include or comprise aerosol generated by heating tobacco, aerosol generated by combusting tobacco, tobacco smoke or cigarette smoke, and any of the gaseous constituents or particulate constituents thereof.
  • examples of an agent include or comprise cadmium, mercury, chromium, nicotine, tobacco-specific nitrosamines and their metabolites (4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), N'-nitrosonornicotine (NNN), N-nitrosoanatabine (NAT), N-nitrosoanabasine (NAB), 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL)), and any product used for nicotine replacement therapy.
  • An exposure regimen for an agent or complex stimulus should reflect the range and circumstances of exposure in everyday settings.
  • a set of standard exposure regimens can be designed to be applied systematically to equally well-defined experimental systems. Each assay could be designed to collect time and dose-dependent data to capture both early and late events and ensure a representative dose range is covered.
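A systematic set of exposure regimens covering both dose and time can be laid out as a full-factorial grid. The sketch below pairs every dose level with every time point and adds a zero-dose control at each time point. The function name, the dictionary layout and the choice of a full-factorial design are illustrative assumptions. Units such as hours or concentration are left to the caller.

```python
from itertools import product


def exposure_regimens(doses, timepoints, include_control=True):
    """Hypothetical full-factorial design: every dose level at every time point.

    A zero-dose control is added at each time point when include_control is True.
    Early and late time points can then be compared at matched doses.
    """
    levels = sorted(set(doses) | ({0.0} if include_control else set()))
    return [{"dose": d, "time": t} for t, d in product(timepoints, levels)]
```

Calling `exposure_regimens([1.0, 10.0], [4, 24])` yields six conditions: a control, a low dose and a high dose at each of the two time points.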
  • high-throughput system-wide measurements for gene expression, protein expression or turnover, microRNA expression or turnover, post-translational modifications, protein modifications, translocations, antibody production, metabolite profiles, or a combination of two or more of the foregoing are generated under various conditions, including the respective controls.
  • Functional outcome measurements are desirable in the methods described herein as they can generally serve as anchors for the assessment and represent clear steps in a disease etiology.
  • the term “sample” refers to any biological sample that is isolated from a subject or an experimental system (e.g., cell, tissue, organ, or whole animal).
  • a sample can include or comprise, without limitation, a single cell or multiple cells, cellular fraction, tissue biopsy, resected tissue, tissue extract, tissue, tissue culture extract, tissue culture medium, exhaled gases, whole blood, platelets, serum, plasma, erythrocytes, leucocytes, lymphocytes, neutrophils, macrophages, B cells or a subset thereof, T cells or a subset thereof, a subset of hematopoietic cells, endothelial cells, synovial fluid, lymphatic fluid, ascites fluid, interstitial fluid, bone marrow, cerebrospinal fluid, pleural effusions, tumor infiltrates, saliva, mucus, sputum, semen, sweat, urine, or any other bodily fluids.
  • Samples can be obtained from a subject by means including but
  • the system 100 can generate a network perturbation amplitude (NPA) value, which is a quantitative measure of changes in the status of biological entities in a network in response to a treatment condition.
  • the system 100 comprises one or more computerized network models that are relevant to the health condition, disease, or biological outcome of interest.
  • One or more of these network models are based on prior biological knowledge and can be uploaded from an external source and curated within the system 100.
  • the models can also be generated de novo within the system 100 based on measurements.
  • Measurable elements are causally integrated into biological network models through the use of prior knowledge. Described below are the types of data that represent changes in a biological system of interest that can be used to generate or refine a network model, or that represent a response to a perturbation.
  • the systems response profile (SRP) engine 110 receives biological data.
  • the SRP engine 110 may receive this data from a variety of different sources, and the data itself may be of a variety of different types.
  • the biological data used by the SRP engine 110 may be drawn from the literature, databases (including data from preclinical, clinical and post-clinical trials of pharmaceutical products or medical devices), and genome databases (genomic sequences and expression data, e.g., Gene Expression Omnibus by National Center for Biotechnology Information or ArrayExpress by European Bioinformatics Institute (Parkinson et al. 2010, Nucl. Acids Res., doi: 10.1093/nar/gkq1040, PubMed ID 21071405)).
  • the data may include or comprise raw data from one or more different sources, such as in vitro, ex vivo or in vivo experiments using one or more species that are specifically designed for studying the effect of particular treatment conditions or exposure to particular agents.
  • In vitro experimental systems may include or comprise tissue cultures or organotypical cultures (three-dimensional cultures) that represent key aspects of human disease.
  • the agent dosage and exposure regimens for these experiments may substantially reflect the range and circumstances of exposures that may be anticipated for humans during normal use or activity conditions, or during special use or activity conditions.
  • Experimental parameters and test conditions may be selected as desired to reflect the nature of the agent and the exposure conditions, molecules and pathways of the biological system in question, cell types and tissues involved, the outcome of interest, and aspects of disease etiology.
  • Particular animal-model-derived molecules, cells or tissues may be matched with particular human molecule, cell or tissue cultures to improve translatability of animal-based findings.
  • the data received by SRP engine 110, many of which are generated by high-throughput experimental techniques, include or comprise, but are not limited to, data relating to nucleic acid (e.g., absolute or relative quantities of specific DNA or RNA species, changes in DNA sequence, RNA sequence, changes in tertiary structure, or methylation pattern as determined by sequencing, hybridization (particularly to nucleic acids on a microarray), quantitative polymerase chain reaction, or other techniques known in the art), protein/peptide (e.g., absolute or relative quantities of protein, specific fragments of a protein, peptides, changes in secondary or tertiary structure, or posttranslational modifications as determined by methods known in the art) and functional activities (e.g., enzymatic activities, proteolytic activities, transcriptional regulatory activities, transport activities, binding affinities to certain binding partners) under certain conditions, among others.
  • Modifications including posttranslational modifications of protein or peptide can include or comprise, but are not limited to, methylation, acetylation, farnesylation, biotinylation, stearoylation, formylation, myristoylation, palmitoylation, geranylgeranylation, pegylation, phosphorylation, sulphation, glycosylation, sugar modification, lipidation, lipid modification, ubiquitination, sumoylation, disulphide bonding, cysteinylation, oxidation, glutathionylation, carboxylation, glucuronidation, and deamidation.
  • a protein can be modified posttranslationally by a series of reactions such as Amadori reactions, Schiff base reactions, and Maillard reactions resulting in glycated protein products.
  • the data may also include or comprise measured functional outcomes, such as but not limited to those at a cellular level, including cell proliferation, developmental fate, and cell death, and those at a physiological level, including lung capacity, blood pressure, and exercise proficiency.
  • the data may also include or comprise a measure of disease activity or severity, such as but not limited to tumor metastasis, tumor remission, loss of a function, and life expectancy at a certain stage of disease.
  • Disease activity can be measured by a clinical assessment, the result of which is a value or a set of values that can be obtained from evaluation of a sample (or population of samples) from a subject or subjects under defined conditions.
  • a clinical assessment can also be based on the responses provided by a subject to an interview or a questionnaire.
  • This data may have been generated expressly for use in determining a systems response profile, or may have been produced in previous experiments or published in the literature.
  • the data includes or comprises information relating to a molecule, biological structure, physiological condition, genetic trait, or phenotype.
  • the data includes or comprises a description of the condition, location, amount, activity, or substructure of a molecule, biological structure, physiological condition, genetic trait, or phenotype.
  • the data may include or comprise raw or processed data obtained from assays performed on samples obtained from human subjects or observations on the human subjects, exposed to an agent.
  • the systems response profile (SRP) engine 110 generates systems response profiles (SRPs) based on the biological data received at step 212.
  • This step may include or comprise one or more of background correction, normalization, fold-change calculation, significance determination and identification of a differential response (e.g., differentially expressed genes).
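The fold-change, significance and differential-response steps can be sketched for replicate measurements as follows. Each entity receives a log2 fold-change and a Welch t-statistic, and is flagged as differential when |t| exceeds a threshold. Background correction and normalization are assumed to have been done already, apart from a pseudocount that guards against log(0). Several choices here are assumptions rather than the patent's procedure: the function names, the pseudocount of 1, the use of Welch's statistic, and thresholding on |t| instead of on adjusted p-values. Each group needs at least two replicates.

```python
import math
from statistics import fmean, variance


def srp_row(treated, control, pseudocount=1.0):
    """log2 fold-change and Welch t-statistic for one entity.

    treated, control: replicate measurements (at least two each).
    """
    lt = [math.log2(x + pseudocount) for x in treated]
    lc = [math.log2(x + pseudocount) for x in control]
    lfc = fmean(lt) - fmean(lc)
    se = math.sqrt(variance(lt) / len(lt) + variance(lc) / len(lc))
    t = lfc / se if se > 0 else float("nan")
    return lfc, t


def systems_response_profile(treated, control, threshold=2.0):
    """Map entity -> (log2 fold-change, t-statistic, differential flag).

    Only entities measured under both conditions are profiled.
    """
    profile = {}
    for entity in treated.keys() & control.keys():
        lfc, t = srp_row(treated[entity], control[entity])
        profile[entity] = (lfc, t, abs(t) >= threshold)  # NaN compares False
    return profile
```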
  • SRPs are representations that express the degree to which one or more measured entities within a biological system (e.g., a molecule, a nucleic acid, a peptide, a protein, a cell, etc.) are individually changed in response to a perturbation applied to the biological system (e.g., an exposure to an agent).
  • the SRP engine 110 collects a set of measurements for a given set of parameters (e.g., treatment or perturbation conditions) applied to a given experimental system (a "system-treatment" pair).
  • FIG. 3 illustrates two SRPs: SRP 302 that includes or comprises biological activity data for N different biological entities undergoing a first treatment 306 with varying parameters (e.g., dose and time of exposure to a first treatment agent), and an analogous SRP 304 that includes or comprises biological activity data for the N different biological entities undergoing a second treatment 308.
  • the data included or comprised in an SRP may be raw experimental data, processed experimental data (e.g., filtered to remove outliers, marked with confidence estimates, averaged over a number of trials), data generated by a computational biological model, or data taken from the scientific literature.
  • An SRP may represent data in any number of ways, such as an absolute value, an absolute change, a fold-change, a logarithmic change, a function, or a table.
  • the SRP engine 110 passes the SRPs to the network modeling engine 112.
  • SRPs derived in the previous step represent the experimental data from which the magnitude of network perturbation will be determined.
  • that determination also relies on biological network models, which are the substrate for computation and analysis.
  • This analysis requires development of a detailed network model of the mechanisms and pathways relevant to a feature of the biological system.
  • Such a framework provides a layer of mechanistic understanding beyond examination of gene lists that have been used in more classical gene expression analysis.
  • a network model of a biological system is a mathematical construct that is representative of a dynamic biological system and that is built by assembling quantitative information about various basic properties of the biological system.
  • Construction of such a network is an iterative process. Delineation of boundaries of the network is guided by literature investigation of mechanisms and pathways relevant to the process of interest (e.g., cell proliferation in the lung). Causal relationships describing these pathways are extracted from prior knowledge to nucleate a network.
  • the literature-based network can be verified using high-throughput data sets that contain the relevant phenotypic endpoints.
  • SRP engine 110 can be used to analyze the data sets, the results of which can be used to confirm, refine, or generate network models.
  • the network modeling engine 112 uses the systems response profiles from the SRP engine 110 with a network model based on the mechanism(s) or pathway(s) underlying a feature of a biological system of interest.
  • the network modeling engine 112 is used to identify networks already generated based on SRPs.
  • the network modeling engine 112 may include or comprise components for receiving updates and changes to models.
  • the network modeling engine 112 may also iterate the process of network generation, incorporating new data and generating additional or refined network models.
  • the network modeling engine 112 may also facilitate the merging of one or more datasets or the merging of one or more networks.
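Merging networks can be sketched as a union of edge sets that also records which input network supports each edge. That support record is useful later when deciding which edges to trust. In this sketch, representing an edge as a `(source, relation, target)` tuple and naming the function `merge_networks` are illustrative assumptions.

```python
def merge_networks(*networks):
    """Union of edge sets from several networks.

    Each network is an iterable of (source, relation, target) tuples.
    Returns edge -> set of indices of the input networks that contain it.
    """
    support = {}
    for idx, edges in enumerate(networks):
        for edge in edges:
            support.setdefault(edge, set()).add(idx)
    return support
```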
  • the set of networks drawn from a database may be manually supplemented by additional nodes, edges, or entirely new networks (e.g., by mining the text of literature for description of additional genes directly regulated by a particular biological entity). These networks contain features that may enable process scoring. Network topology is maintained; networks of causal relationships can be traced from any point in the network to a measurable entity. Further, the models are dynamic and the assumptions used to build them can be modified or restated and enable adaptability to different tissue contexts and species. This allows for iterative testing and improvement as new knowledge becomes available.
  • the network modeling engine 112 may remove nodes or edges that have low confidence or which are the subject of conflicting experimental results in the scientific literature.
  • the network modeling engine 112 may also include or comprise additional nodes or edges that may be inferred using supervised or unsupervised learning methods (e.g., metric learning, matrix completion, pattern recognition).
  • a biological system is modeled as a mathematical graph consisting of vertices (or nodes) and edges that connect the nodes.
  • FIG. 4 illustrates a simple network 400 with 9 nodes (including nodes 402 and 404) and edges (406 and 408).
  • the nodes can represent biological entities or processes within a biological system, such as, but not limited to, compounds, DNA, RNA, genes, proteins, peptides, antibodies, cells, tissues, organs and cellular or molecular processes.
  • the biological entities are not necessarily limited to those biological entities for which treatment or control data are received or available.
  • the nodes representing the biological entities can include or comprise the plurality of biological entities and may include or comprise one or more further biological entities.
  • At least some of the nodes are scorable and the score may represent the activity level of the node(s). Many of the nodes represent biological entities of which the activity levels can be measured. However, in some implementations, it is not necessary for the computerized method to receive data for all such measurable nodes. Thus, the nodes are scorable and/or measurable. In certain implementations, most of the nodes are measurable. A measurable node may contain or comprise measured data.
  • the edges in the graph can represent various relationships between the nodes.
  • edges may represent a "binds to" relation, an "is expressed in" relation, an "are co-regulated based on expression profiling" relation, an "inhibits" relation, a "co-occur in a manuscript" relation, or a "share structural element" relation.
  • these types of relationships describe a relationship between a pair of nodes.
  • the nodes in the graph can also represent relationships between nodes.
  • a relationship between two nodes that represent chemicals may represent a reaction. This reaction may be a node in a relationship between the reaction and a chemical that inhibits the reaction.
  • a graph may be undirected, meaning that there is no distinction between the two vertices associated with each edge.
  • the edges of a graph may be directed from one vertex to another.
  • transcriptional regulatory networks and metabolic networks may be modeled as a directed graph.
  • in a transcriptional regulatory network, for example, nodes would represent genes, with edges denoting the transcriptional relationships between them.
  • protein-protein interaction networks describe direct physical interactions between the proteins in an organism's proteome and there is often no direction associated with the interactions in such networks. Thus, these networks may be modeled as undirected graphs. Certain networks may have both directed and undirected edges.
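The distinction between directed and undirected edges can be illustrated with a minimal Python sketch. The `Network` class and its method names are hypothetical illustrations, not part of system 100; directed edges stand in for causal or transcriptional relations, and undirected edges for protein-protein interactions.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical graph that can hold both directed and undirected edges."""

    nodes: set = field(default_factory=set)
    # Directed edges: (source, target) -> relation label, e.g. "increases".
    directed: dict = field(default_factory=dict)
    # Undirected edges keyed by frozenset, so {a, b} and {b, a} are the same edge.
    undirected: dict = field(default_factory=dict)

    def add_directed(self, src, dst, relation):
        self.nodes.update((src, dst))
        self.directed[(src, dst)] = relation

    def add_undirected(self, a, b, relation):
        self.nodes.update((a, b))
        self.undirected[frozenset((a, b))] = relation

    def neighbors(self, node):
        """Targets of outgoing directed edges plus partners on undirected edges."""
        out = {dst for (src, dst) in self.directed if src == node}
        for pair in self.undirected:
            if node in pair:
                out |= pair - {node}
        return out


net = Network()
net.add_directed("TF_A", "GENE_B", "increases")     # transcriptional regulation
net.add_undirected("PROT_X", "PROT_Y", "binds to")  # physical interaction
```

Because the undirected key is a `frozenset`, the edge is found from either endpoint, whereas `GENE_B` has no neighbors because its only edge points into it.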
  • the entities and relationships (i.e., the nodes and edges) that make up a graph may be stored as a web of interrelated nodes in a database in system 100.
  • the knowledge represented within the database may be of various different types, drawn from various different sources.
  • certain data may represent a genomic database, including information on genes, and relations between them.
  • a node may represent an oncogene
  • another node connected to the oncogene node may represent a gene that inhibits the oncogene.
  • the data may represent proteins, and relations between them, diseases and their interrelations, and various disease states.
  • the computational models may represent a web of relations between nodes representing knowledge in, e.g., a DNA dataset, an RNA dataset, a protein dataset, an antibody dataset, a cell dataset, a tissue dataset, an organ dataset, a medical dataset, an epidemiology dataset, a chemistry dataset, a toxicology dataset, a patient dataset, and a population dataset.
  • a dataset is a collection of numerical values resulting from evaluation of a sample (or a group of samples) under defined conditions. Datasets can be obtained, for example, by experimentally measuring quantifiable entities of the sample, or alternatively from a service provider such as a laboratory or a clinical research organization, or from a public or proprietary database.
  • Datasets may contain data and biological entities represented by nodes, and the nodes in each of the datasets may be related to other nodes in the same dataset, or in other datasets.
  • the network modeling engine 112 may generate computational models that represent information ranging from genetic information (e.g., in DNA, RNA, protein or antibody datasets) to medical information (in medical datasets), to information on individual patients (in patient datasets) and on entire populations (in epidemiology datasets).
  • a database could further include or comprise medical record data, structure/activity relationship data, information on infectious pathology, information on clinical trials, exposure pattern data, data relating to the history of use of a product, and any other type of life science-related information.
  • the network modeling engine 112 may generate one or more network models representing, for example, the regulatory interaction between genes, interaction between proteins or complex bio-chemical interactions within a cell or tissue.
  • the networks generated by the network modeling engine 112 may include or comprise static and dynamic models.
  • the network modeling engine 112 may employ any applicable mathematical schemes to represent the system, such as hyper-graphs and weighted bipartite graphs, in which two types of nodes are used to represent reactions and compounds.
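A weighted bipartite representation of reactions and compounds can be sketched as follows. The reaction identifiers and the use of signed stoichiometric coefficients as edge weights are assumptions for illustration (the two reactions shown are the first two steps of glycolysis); the text does not specify what the weights encode.

```python
# Reaction nodes and compound nodes form the two disjoint node types.
# Edge weight = signed stoichiometric coefficient (assumption):
# negative for a consumed compound, positive for a produced one.
REACTIONS = {
    "R1": {"glucose": -1.0, "ATP": -1.0, "G6P": 1.0, "ADP": 1.0},  # hexokinase step
    "R2": {"G6P": -1.0, "F6P": 1.0},                               # isomerase step
}


def bipartite_edges(reactions):
    """Yield (reaction, compound, weight); every edge joins a reaction to a compound."""
    for rxn, stoich in reactions.items():
        for compound, weight in stoich.items():
            yield rxn, compound, weight


def reactions_involving(reactions, compound):
    """Reaction nodes adjacent to a compound node."""
    return sorted({rxn for rxn, cpd, _ in bipartite_edges(reactions) if cpd == compound})


def producers(reactions, compound):
    """Reactions joined to the compound by a positive-weight (production) edge."""
    return sorted(r for r, c, w in bipartite_edges(reactions) if c == compound and w > 0)
```

A hypergraph would store each reaction as a single hyperedge over its compounds; the bipartite form encodes the same information with ordinary pairwise edges.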
  • the network modeling engine 112 may also use other inference techniques to generate network models, such as an analysis based on over-representation of functionally-related genes within the differentially expressed genes, Bayesian network analysis, a graphical Gaussian model technique or a gene relevance network technique, to identify a relevant biological network based on a set of experimental data (e.g., gene expression, metabolite concentrations, cell response, etc.).
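One common way to carry out the over-representation analysis mentioned above is a one-sided hypergeometric test; the text does not say which statistic is used, so the following Python sketch is an assumption. It computes the probability of seeing at least `overlap` genes from a functional set of `set_size` genes among `n_de` differentially expressed genes drawn from `total_genes` measured genes.

```python
from math import comb


def overrepresentation_p(total_genes, set_size, n_de, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    total_genes: number of measured genes (population size)
    set_size:    genes annotated to the functional set
    n_de:        number of differentially expressed (DE) genes
    overlap:     DE genes that belong to the functional set
    """
    upper = min(set_size, n_de)
    hits = sum(
        comb(set_size, k) * comb(total_genes - set_size, n_de - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(total_genes, n_de)
```

For example, with 10 measured genes, a 5-gene functional set and 5 DE genes that all fall in the set, `overrepresentation_p(10, 5, 5, 5)` is 1/252, about 0.004.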
  • the biological system may be represented by a plurality of network models, including computational causal network models.
  • the network model is based on mechanisms and pathways that underlie the functional features of a biological system.
  • the network modeling engine 112 may generate or contain a model representative of an outcome regarding a feature of the biological system that is relevant to the study of the long-term health risks or health benefits of agents. Accordingly, the network modeling engine 112 may generate or contain a network model for various mechanisms of cellular function, particularly those that relate or contribute to a feature of interest in the biological system, including but not limited to cellular proliferation, cellular stress, cellular regeneration, apoptosis, DNA damage/repair or inflammatory response.
  • the network modeling engine 112 may contain or generate computational models that are relevant to acute systemic toxicity, carcinogenicity, dermal penetration, cardiovascular disease, pulmonary disease, ecotoxicity, eye irritation/corrosion, genotoxicity, immunotoxicity, neurotoxicity, pharmacokinetics, drug metabolism, organ toxicity, reproductive and developmental toxicity, skin irritation/corrosion or skin sensitization.
  • the network modeling engine 112 may contain or generate computational models for the status of nucleic acids (DNA, RNA, SNP, siRNA, miRNA, RNAi), proteins, peptides, antibodies, cells, tissues, organs, and any other biological entity, and their respective interactions.
  • computational network models can be used to represent the status of the immune system and the functioning of various types of white blood cells during an immune response or an inflammatory reaction.
  • computational network models could be used to represent the performance of the cardiovascular system and the functioning and metabolism of endothelial cells.
  • the network is drawn from a database of causal biological knowledge.
  • This database may be generated by performing experimental studies of different biological mechanisms to extract relationships between mechanisms (e.g., activation or inhibition relationships), some of which may be causal relationships, and may be combined with a commercially-available database such as the Genstruct Technology Platform or the Selventa Knowledgebase, curated by Selventa Inc. of Cambridge, Massachusetts, USA.
  • the network modeling engine 112 may identify a network that links the perturbations 102 and the measurables 104.
  • the network modeling engine 112 extracts causal relationships between biological entities using the systems response profiles from the SRP engine 110 and networks previously generated in the literature.
  • the database may be further processed to remove logical inconsistencies and generate new biological knowledge by applying homologous reasoning between different sets of biological entities, among other processing steps.
  • the network model extracted from the database is based on reverse causal reasoning (RCR), an automated reasoning technique that processes networks of causal relationships to formulate mechanism hypotheses, and then evaluates those mechanism hypotheses against datasets of differential measurements.
  • Each mechanism hypothesis links a biological entity to measurable quantities that it can influence.
  • At least one mechanism hypothesis may be formulated, such as a plurality of mechanism hypotheses.
  • measurable quantities can include or comprise an increase or decrease in concentration, number or relative abundance of a biological entity, activation or inhibition of a biological entity, or changes in the structure, function or logical of a biological entity, among others.
  • RCR uses a directed network of experimentally-observed causal interactions between biological entities as a substrate for computation.
  • the directed network may be expressed in Biological Expression Language™ (BEL™), a syntax for recording the inter-relationships between biological entities.
  • the RCR computation specifies certain constraints for network model generation, such as but not limited to path length (the maximum number of edges connecting an upstream node and downstream nodes), and possible causal paths that connect the upstream node to downstream nodes.
  • the output of RCR is a set of mechanism hypotheses that represent upstream controllers of the differences in experimental measurements, ranked by statistics that evaluate relevance and accuracy.
  • the mechanism hypotheses output can be assembled into causal chains and larger networks to interpret the dataset at a higher level of interconnected mechanisms and pathways.
  • One type of mechanism hypothesis comprises a set of causal relationships that exist between a node representing a potential cause (the upstream node or controller) and nodes representing the measured quantities (the downstream nodes).
  • the mechanism hypothesis can be used to make predictions: for example, if the abundance of an entity represented by an upstream node increases, the downstream nodes linked by causal increase relationships would be inferred to increase, and the downstream nodes linked by causal decrease relationships would be inferred to decrease.
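The prediction step of a mechanism hypothesis, together with the path-length constraint mentioned above, can be sketched in Python. This is a simplified illustration, not the RCR algorithm itself: it propagates edge signs (+1 for increases, -1 for decreases) from an upstream node along causal paths of at most `max_len` edges, drops downstream nodes reached with conflicting signs, and counts agreements with observed directions of change. The node names are hypothetical, and the ranking statistics that RCR uses are not reproduced.

```python
from collections import deque


def predicted_signs(edges, upstream, measured, max_len):
    """Predicted direction of each measured node when `upstream` increases.

    edges:    dict (source, target) -> +1 ("increases") or -1 ("decreases")
    measured: set of downstream nodes with measurements
    max_len:  maximum number of edges on a causal path
    Signs multiply along a path; nodes reachable with both signs are dropped.
    """
    adjacency = {}
    for (src, dst), sign in edges.items():
        adjacency.setdefault(src, []).append((dst, sign))

    reached = {}                       # measured node -> set of path signs
    visited = {(upstream, 1)}          # (node, sign) states already queued
    queue = deque([(upstream, 1, 0)])  # breadth-first: each state met at its shortest depth
    while queue:
        node, sign, depth = queue.popleft()
        if depth == max_len:
            continue
        for nxt, edge_sign in adjacency.get(node, []):
            state = (nxt, sign * edge_sign)
            if state in visited:
                continue
            visited.add(state)
            queue.append((nxt, state[1], depth + 1))
            if nxt in measured:
                reached.setdefault(nxt, set()).add(state[1])
    return {n: signs.pop() for n, signs in reached.items() if len(signs) == 1}


def concordance(predicted, observed):
    """(agreeing, disagreeing) counts of predicted versus observed signs."""
    agree = sum(1 for n, s in predicted.items() if observed.get(n) == s)
    disagree = sum(1 for n, s in predicted.items() if observed.get(n) == -s)
    return agree, disagree


EDGES = {
    ("UP", "TF"): 1,   # upstream node increases a transcription factor
    ("TF", "G1"): 1,
    ("TF", "G2"): 1,
    ("G2", "TF"): -1,  # negative feedback from G2 back onto TF
    ("UP", "G3"): -1,
}
```

With `max_len=2` the sketch predicts G1 and G2 up and G3 down; raising `max_len` to 4 lets the feedback path through G2 reach G1 and G2 with the opposite sign, so both become ambiguous and are dropped.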
  • a mechanism hypothesis represents the relationships between a set of measured data, for example, gene expression data, and a biological entity that is a known controller of those genes. Additionally, these relationships include or comprise the sign (positive or negative) of influence between the upstream entity and the differential expression of the downstream genes.
  • the downstream genes of a hypothesis are drawn from a database of literature-curated causal biological knowledge.
  • the causal relationships of a mechanism hypothesis that link the upstream entity to downstream genes, in the form of a computable causal network model, are the substrate for the calculation of network changes by the NPA scoring methods.
  • the biological system may be represented by at least one mechanism hypothesis, such as a plurality of mechanism hypotheses.
  • the at least one computational causal network model may comprise a plurality of mechanism hypotheses.
  • a scorable complex causal network model of biological entities can be transformed into a single causal network model by collecting the individual mechanism hypothesis representing entities in the model and regrouping the connections of all the downstream genes to a single upstream process representing the whole complex causal network model; this in essence is a flattening of the underlying graph structure.
  • the activity changes of the biological entities described by the network model can be assessed via combination of its individual mechanism hypotheses, such that the underlying gene expression measurements contribute to the network as a whole.
  • a reference node is first selected from a starting, typically complex, causal network model.
  • the reference node can be any entity in the network whose level or activity is positively related to the activity of the network as a whole (as opposed to, for example, an inhibitor whose activity may be negatively related to the network activity).
  • the causal relationship between each node in the model and the reference node is determined. This can be done by first requiring that the model be "causally consistent”.
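  • next, the signs of regulation of the downstream measurable entities (in this example, gene expressions) are adjusted according to each model node's causal relationship with the reference node.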
  • the signs of the downstream gene expressions for a model node that has a positive causal relationship with the reference node are maintained.
  • the signs of the downstream gene expressions for a model node with a negative causal relationship with the reference node (i.e., that node is expected to be negatively regulated when the reference node increases) are inverted.
  • All the downstream gene expressions and their signs are then assembled into a single mechanism hypothesis, and downstream gene expressions with contradictory signs (from multiple model nodes) are omitted from the mechanism hypothesis.
  • where a node participates in a negative feedback loop, the connection between the negative feedback loop and that node can be removed from the model to obtain causal consistency in a manner that is congruent with known facts.
  • Variations on the approach described above are discussed in U.S. Patent Application Publication Nos. 2007/0225956 and 2009/0099784, which are incorporated by reference herein in their entirety.
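The flattening step described above can be sketched in Python, assuming the causally consistent model has already been reduced to two inputs: each model node's sign relative to the reference node, and each node's own mechanism hypothesis (downstream genes with signs). The function and node names are hypothetical.

```python
def flatten_to_single_hypothesis(relation_to_reference, downstream):
    """Merge per-node mechanism hypotheses into one hypothesis for the reference node.

    relation_to_reference: model node -> +1 (rises with the reference node) or -1
    downstream:            model node -> {gene: +1 or -1} for that node's hypothesis
    Genes whose adjusted signs disagree across model nodes are omitted.
    """
    adjusted = {}
    for node, genes in downstream.items():
        relation = relation_to_reference[node]
        for gene, sign in genes.items():
            # Keep the sign for positively related nodes; invert it for negatively related ones.
            adjusted.setdefault(gene, set()).add(relation * sign)
    return {gene: signs.pop() for gene, signs in adjusted.items() if len(signs) == 1}


RELATION = {"NODE_A": 1, "NODE_B": -1}
DOWNSTREAM = {
    "NODE_A": {"G1": 1, "G2": 1},
    "NODE_B": {"G1": -1, "G2": 1, "G3": 1},
}
```

Here G1 is kept as +1 (both nodes agree after inversion), G3 becomes -1, and G2 is omitted because its adjusted signs contradict each other.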
  • An exemplary causal network model is described in Westra JW, Schlage WK, Frushour BP, Gebel S, Catlett NL, Han W, Eddy SF, Hengstermann A, Matthews AL, Mathis C, et al: Construction of a Computable Cell Proliferation Network Focused on Non-Diseased Lung Cells. BMC Syst Biol 2011, 5:105, which is incorporated by reference herein in its entirety.
  • the system 100 may contain or generate a computerized model for the mechanism of cell proliferation when the cells have been exposed to cigarette smoke.
  • the system 100 may also contain or generate one or more network models representative of the various health conditions relevant to cigarette smoke exposure, including but not limited to cancer, pulmonary diseases and cardiovascular diseases.
  • these network models are based on at least one of the perturbations applied (e.g., exposure to an agent), the responses under various conditions, the measurable quantities of interest, the outcome being studied (e.g., cell proliferation, cellular stress, inflammation, DNA repair), experimental data, clinical data, epidemiological data, and literature.
  • the network modeling engine 112 may be configured for generating a network model of cellular stress.
  • the network modeling engine 112 may receive networks describing relevant mechanisms involved in the stress response known from literature databases.
  • the network modeling engine 112 may select one or more networks based on the biological mechanisms known to operate in response to stresses in pulmonary and cardiovascular contexts.
  • the network modeling engine 112 identifies one or more functional units within a biological system and builds a larger network model by combining smaller networks based on their functionality.
  • the network modeling engine 112 may consider functional units relating to responses to oxidative, genotoxic, hypoxic, osmotic, xenobiotic, and shear stresses.
  • the network components for a cellular stress model may include or comprise xenobiotic metabolism response, genotoxic stress, endothelial shear stress, hypoxic response, osmotic stress and oxidative stress.
  • the network modeling engine 112 may also receive content from computational analysis of publicly available transcriptomic data from stress relevant experiments performed in a particular group of cells.
  • the network modeling engine 112 may include or comprise one or more rules. Such rules may include or comprise rules for selecting network content, types of nodes, and the like.
  • the network modeling engine 112 may select one or more data sets from experimental data database 106, including a combination of in vitro and in vivo experimental results.
  • the network modeling engine 112 may utilize the experimental data to verify nodes and edges identified in the literature.
  • the network modeling engine 112 may select data sets from experiments based on how well each experiment represents physiologically-relevant stress in non-diseased lung or cardiovascular tissue. The selection of data sets may be based on the availability of phenotypic stress endpoint data, the statistical rigor of the gene expression profiling experiments, and the relevance of the experimental context to normal non-diseased lung or cardiovascular biology, for example.
  • the network modeling engine 112 may further process and refine those networks. For example, in some implementations, multiple biological entities and their connections may be grouped and represented by a new node or nodes (e.g., using clustering or other techniques).
  • the network modeling engine 112 may further include or comprise descriptive information regarding the nodes and edges in the identified networks.
  • a node may be described by its associated biological entity, an indication of whether or not the associated biological entity is a measurable quantity, or any other descriptor of the biological entity.
  • Some of the nodes are scorable and the score may represent the activity level of the node(s).
  • Many of the nodes represent biological entities of which the activity levels can be measured. However, in some implementations, it is not necessary for the computerized method to receive data for all such measurable nodes. Thus, the nodes are scorable and/or measurable. In certain implementations, most of the nodes are measurable.
  • a measurable node may contain or comprise measured data.
  • An edge may be described by the type of relationship it represents (e.g., a causal relationship such as an up-regulation or a down-regulation, a correlation, a conditional dependence or independence), the strength of that relationship, or a statistical confidence in that relationship, for example.
  • each node that represents a measurable entity is associated with an expected direction of activity change (i.e., an increase or decrease) in response to the treatment. For example, when a bronchial epithelial cell is exposed to an agent such as tumor necrosis factor (TNF), the activity of a particular gene may increase.
  • the network modeling engine 112 may identify an expected direction of change, in response to a particular perturbation, for each of the measurable entities.
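  • where two pathways in the network imply opposite expected directions of change for the same measurable entity, the two pathways may be examined in more detail to determine the net direction of change, or measurements of that particular entity may be discarded.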
  • direction values for the nodes may represent the expected direction of change between the control data and the treatment data.
  • direction values for the nodes may represent the expected change in value between the control data and the treatment data.
  • direction values for the nodes may represent the expected increase or decrease in value between the control data and the treatment data.
  • the change is representative of the change after treatment.
  • the computational methods and systems provided herein translate SRPs into NPA scores.
  • Experimental measurements that are identified as downstream effects of a perturbation within a network model are aggregated into a network-specific response score.
  • the network scoring engine 114 generates NPA scores for each perturbation using the networks identified at step 214 by the network modeling engine 112 and the SRPs generated at step 212 by the SRP engine 110.
  • NPA scoring applies one or more defined algorithm(s) to an experimental dataset consisting of a series of treatment versus control comparisons, where the experimental data is filtered to represent a particular scope of biology (for example, a particular set of gene expression relationships) in the context of a defined biological network model.
  • a NPA score quantifies a biological response to a treatment (represented by the SRPs) in the context of the underlying relationships between the biological entities (represented by the identified networks).
  • the network scoring engine 114 includes or comprises hardware and software components for generating NPA scores for each of the networks contained in or identified by the network modeling engine 112.
  • the network scoring engine 114 may be configured to implement any of a number of scoring techniques. Such techniques include those that generate scalar-valued scores. Such techniques also include those that generate vector-valued scores. Vector-valued scores are indicative of the magnitude and topological distribution of the response of the network to the perturbation.
  • a strength score is a scalar-valued score that is a mean of the activity observations.
  • a strength score is a mean of the activity observations for different entities represented in the SRP. The strength of a network response is calculated in accordance with:
$$\mathrm{Strength} = \frac{1}{N}\sum_{i=1}^{N} d_i\,\beta_i$$
  • d_i represents the expected direction of activity change for the entity associated with node i
  • β_i represents the log of the fold-change (i.e., the number describing how much a quantity changes going from an initial to a final value) of activity between the treatment and control conditions for the entity associated with node i
  • N is the number of nodes with associated measured biological entities.
  • a positive strength score indicates that the SRP is matched to the expected activity change derived from the identified networks, while a negative strength score indicates that the SRP is unmatched to the expected activity change.
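The strength score is simple enough to compute directly. The Python sketch below assumes the mean form Strength = (1/N) Σ d_i β_i, with directions d_i of +1 or -1 and log fold-changes β_i supplied as parallel lists; the function name is hypothetical.

```python
def strength(directions, log_fold_changes):
    """Mean of direction-signed log fold-changes over the N measured nodes."""
    if not directions or len(directions) != len(log_fold_changes):
        raise ValueError("expected two non-empty lists of equal length")
    signed = (d * b for d, b in zip(directions, log_fold_changes))
    return sum(signed) / len(directions)
```

For instance, `strength([1, 1], [-1.0, -3.0])` returns -2.0: both entities moved opposite to their expected directions, so the score is negative.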
  • the score may be generated by a geometric perturbation index scoring technique, a probabilistic perturbation index scoring technique, or an expected perturbation index scoring technique.
  • One scoring technique is the Geometric Perturbation Index (GPI) scoring technique.
  • FIG. 5 is a flow diagram 500 of a GPI scoring technique that may be implemented by the network scoring engine 114.
  • at step 502, the network scoring engine 114 assembles a fold-change vector β.
  • a fold-change is a number describing how much a measurable changes going from an initial value to a final value under different conditions, such as between the perturbation and control conditions.
  • This fold-change vector has N components, corresponding to the number of nodes in the network with associated measured biological entities.
  • the ith component of the fold-change vector, β_i, represents the logarithm (e.g., base 2) of the fold-change of the activity of the ith measured biological entity between the perturbation and control conditions (i.e., the logarithm of the ratio of the activity under the perturbation condition to the activity under the control condition).
  • in some implementations, β_i represents the fold-change in activity between the perturbation and control conditions without a logarithm operation; in such implementations, a value of one for β_i indicates that no change in activity was observed between the perturbation and control conditions. It will be understood that fold-changes are simply one possible approach to quantifying an activity for use with the network scoring techniques described herein, and any other convention for expressing changes in measurables may be used.
  • the step of generating the score may comprise a linear or a non-linear combination of the activity measures, the weight values, and the direction values; and a normalization of the combination by a scale factor.
  • the combination may be an arithmetic combination, and the scale factor may be the square root of the number of biological entities for which measured data are received.
  • the scores are not scalar-valued scores.
  • the network scoring engine 114 generates a weight vector r.
  • the weight vector r also has N components, one for each of the components of the fold-change vector β.
  • Each component r_i of the weight vector r represents the weight to be given to the ith observed fold-change β_i.
  • the weight represents the known biological significance of the ith measured entity with regard to a feature or an outcome of interest (e.g., a known carcinogen in cancer studies).
  • the weight represents the confidence of the activity measurement of the biological entity associated with the node. By weighting the log-fold-changes by confidence estimates, fold-changes β_i for which confidence is low contribute less to the GPI score. Improved laboratory conditions, increased number of biological replicates, better repeatability, smaller variance, and stronger signals may all contribute to a higher confidence in a particular β_i.
  • One value that may be advantageously used for weighting is the local false non-discovery rate fndr_i (i.e., the probability that a fold-change value β_i represents a departure from the underlying null hypothesis of a zero fold-change, in some cases conditionally on the observed p-value) as described by Strimmer et al. in "A general modular framework for gene set enrichment analysis," BMC Bioinformatics 10:47, 2009 and by Strimmer in "A unified approach to false discovery rate estimation," BMC Bioinformatics 9:303, 2008, each of which is incorporated by reference herein in its entirety.
  • fndr_i is the complement of the local false discovery rate (fndr_i = 1 - fdr_i) and is calculated in accordance with Eq. 1, where:
  • fdr_i is the local false discovery rate (i.e., the probability that a fold-change value β_i does not represent a departure from the underlying null hypothesis of a zero fold-change), V_i is the Benjamini-Hochberg adjustment factor described by Benjamini et al. in "Controlling the false discovery rate: a practical and powerful approach to multiple testing," Journal of the Royal Statistical Society, Series B 57:289, 1995, which is incorporated by reference herein in its entirety, p is the probability of obtaining a fold-change at least as extreme as the fold-change β_i that was actually observed (assuming that the null hypothesis of a zero fold-change is true), and t_df is a t-distribution with df degrees of freedom.
  • the network scoring engine 114 uses the weight vector r to scale the fold-change vector β.
  • the result is a scaled fold-change vector in which each component β_i is multiplied by its associated weight component r_i.
  • One way to achieve such a scaling computationally is to create an N×N diagonal matrix with the weight components r_i on the diagonal, and multiply that matrix by the N×1 vector β, as shown in Eq. 3:
$$\begin{pmatrix} r_1 & & 0 \\ & \ddots & \\ 0 & & r_N \end{pmatrix}\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_N \end{pmatrix} = \begin{pmatrix} r_1\beta_1 \\ \vdots \\ r_N\beta_N \end{pmatrix}$$
  • the network scoring engine 114 identifies the expected directions of change for each component in the fold-change vector β.
  • the network scoring engine 114 may do so by querying the network modeling engine 112 to retrieve the expected directions of change from the causal biological network models.
  • the network scoring engine 114 can then assemble these expected directions of change into an N-component vector d, where the ith component of the vector d, d_i, represents the expected direction of change (e.g., +1 for increased activity and -1 for decreased activity) for the ith measured biological entity.
  • the network scoring engine 114 combines the components of the scaled fold-change vector (generated at step 506) with the expected directions of change for each component (identified at step 508).
  • this combination is an arithmetic combination, wherein each scaled fold-change r_iβ_i is multiplied by its corresponding expected direction of change d_i, and the results are summed over all N biological entities.
  • this implementation of step 510 can be represented by
$$\sum_{i=1}^{N} d_i\, r_i\, \beta_i$$
  • the vectors d, r and β may be combined in any linear or non-linear manner.
  • the network scoring engine 114 normalizes the combination of step 510.
  • the normalization consists of dividing by a predetermined scale factor.
  • One such scale factor is the square root of N, the number of biological entities.
  • the GPI score can be represented by
$$\mathrm{GPI} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} d_i\, r_i\, \beta_i$$
  • viewing the mechanism hypothesis as a vector of expected directions of change in the N-dimensional space of downstream gene expressions, the observed effect of perturbation on the downstream gene expressions is also a vector in this space.
  • the amplitude of the perturbation in the causal network model can be quantified by projecting the differential log2 expression vector onto the hypothesis unit vector.
  • the downstream measurements of a causal network model come from a generic model.
  • a positive GPI score indicates an upregulation of the process described by the mechanism hypotheses
  • a zero GPI score indicates that the process is unchanged along the directions of the mechanism hypotheses
  • a negative GPI score indicates that the process is downregulated.
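The GPI steps of FIG. 5 can be sketched with NumPy, assuming the normalized form GPI = (1/√N) Σ d_i r_i β_i. The function name is hypothetical; the diagonal-matrix product mirrors the scaling step, although `r * beta` would give the same result without building an N×N matrix.

```python
import numpy as np


def gpi(beta, r, d):
    """Geometric Perturbation Index sketch: (1/sqrt(N)) * sum_i d_i * r_i * beta_i.

    beta: log2 fold-changes of the N measured entities
    r:    weights, e.g. confidence estimates such as fndr_i
    d:    expected directions of change (+1 or -1)
    """
    beta, r, d = (np.asarray(x, dtype=float) for x in (beta, r, d))
    if not (beta.shape == r.shape == d.shape) or beta.ndim != 1:
        raise ValueError("beta, r and d must be 1-D arrays of equal length")
    scaled = np.diag(r) @ beta             # scaling step: diag(r) times beta
    combined = float(d @ scaled)           # combination step: sum_i d_i * r_i * beta_i
    return combined / np.sqrt(beta.size)   # normalization by sqrt(N)
```

With unit weights (all r_i = 1), the score equals the projection of β onto the unit vector d/√N; for β = (1, -2, 0.5, 0) and d = (1, -1, 1, 1) both give 3.5/2 = 1.75.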
  • FIG. 6 is a flow diagram 600 of a Probabilistic Perturbation Index (PPI) scoring technique that may be implemented by the network scoring engine 114.
  • each SRP represents the activity (or change in activity) of a measured biological entity under a treatment condition.
  • Each SRP, then, is associated with a number of measured activities, one for each measured biological entity.
  • the PPI is a quantification of the probability that the biological mechanisms represented by the networks of interest are activated given the observed SRPs.
  • the network scoring engine 114 assembles a fold-change vector β.
  • This fold-change vector, representing the observed fold-changes in the activity of the N measured biological entities, may be assembled as described above with reference to step 502 of the Geometric Perturbation Index (GPI) scoring technique illustrated in FIG. 5.
  • the network scoring engine 114 generates a range for the fold-change density.
  • the range for the fold-change density represents an approximation of the set of values that the fold-change values can take in the biological system under the treatment conditions, and may be approximated by the range [-W,W], where W is the theoretical expected largest absolute value of a log2 fold-change. By choosing W this way, all observed fold-changes will fall in the range [-W,W].
  • for example, W may be set to the maximum expected signal of a gene chip (e.g., 16 in log2 scale).
  • the network scoring engine 114 identifies the expected directions of change for each component in the fold-change vector β. This step may be performed as described above with reference to step 508 of the GPI scoring technique illustrated in FIG. 5, resulting in a set of expected directions of change d_i that correspond to the observed fold-changes β_i.
  • the network scoring engine 114 generates a positive activation metric.
  • a positive activation metric represents the degree to which the SRPs indicate that the observed activation/inhibition of biological entities is consistent with the expected directions of change represented by the d_i. Consistent behavior is referred to as "positive activation" herein.
  • One positive activation metric that may be used is the probability that a network or networks is positively activated. Such a probability, referred to as PPI+, may be calculated in accordance with the following expression:
  • Here, P(PositivelyActivated) is obtained by integrating the conditional probability P(PositivelyActivated | fold-change) over the positive fold-change values (Eq. 6).
  • The conditional probability of positive activation at the ith observed fold-change is P(PositivelyActivated | fold-change_i) = fndr_i, where fndr_i is the false non-discovery rate discussed above with reference to Eq. 1.
  • the network scoring engine 114 is configured to numerically integrate the expression of Eq. 6 using a set of bins representing the fold-change values between 0 and W.
  • One set of bins that may be used are the bins between consecutive ordered fold-changes, where the parenthesized subscripts (i) represent the values taken in order from smallest fold-change to largest fold-change.
  • the network scoring engine 114 calculates an approximation to the positive activation metric PPI+ according to:
  • the network scoring engine 114 generates a negative activation metric.
  • a negative activation metric represents the degree to which the SRPs indicate that the observed activation/inhibition of biological entities is inconsistent with the expected directions of change represented by the d_i. Inconsistent behavior is referred to as "negative activation" herein.
  • One negative activation metric that may be used is the probability that a network or networks is negatively activated. Such a probability, referred to as PPI−, may be calculated in accordance with the following expression:
  • The conditional probability of negative activation at the ith observed fold-change, P(NegativelyActivated | fold-change_i), is likewise expressed in terms of the false non-discovery rate fndr_i.
  • the network scoring engine 114 is configured to numerically integrate the expression of Eq. 9 using a set of bins representing the fold-change values between -W and 0.
  • One set of bins that may be used are the bins between consecutive ordered fold-changes, where the parenthesized subscripts (i) represent the values taken in order from smallest fold-change to largest fold-change. In such implementations, the network scoring engine 114 calculates an approximation to the negative activation metric PPI− according to:
  • the network scoring engine combines the positive activation metric (generated at step 608) and the negative activation metric (generated at step 610) to generate a composite metric, referred to as the Probabilistic Perturbation Index or PPI.
  • the combination of step 612 can be any linear or non-linear combination.
  • the PPI is a weighted linear combination of the positive activation metric and the negative activation metric.
  • the network scoring engine 114 may be configured to generate a PPI in accordance with:
  • the network scoring engine 114 may be configured to compute the PPI of Eq. 12 by calculating the L1 norm of the vector whose ith component is defined according to:
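  • The binned integration of the positive and negative activation metrics, and their combination into a PPI, can be sketched as below. This is a minimal, hypothetical illustration rather than the exact formulas of Eqs. 6 to 12, which are not reproduced here. It assumes that (i) each observed log2 fold-change is multiplied by its expected direction d_i, so that changes consistent with the mechanism hypotheses are positive; (ii) the conditional probability of activation is a step function equal to fndr_(i) on the bin [x_(i), x_(i+1)] between consecutive sorted values, zero below the smallest value, with the last bin closed at W; (iii) each integral is divided by W so that the metric lies in [0, 1]; and (iv) the PPI is the weighted linear combination w_pos·PPI+ minus w_neg·PPI−. The function and parameter names are illustrative only.

```python
from typing import Sequence


def _step_integral(values: Sequence[float], fndr: Sequence[float], w: float) -> float:
    """Rectangle-rule integral over [0, w] of a step function, divided by w.

    Hypothetical convention: the step function equals fndr_(j) on
    [x_(j), x_(j+1)) for the sorted values x_(j), is 0 below the smallest
    value, and the last bin ends at w.
    """
    pts = sorted(zip(values, fndr))
    total = 0.0
    for j, (x, p) in enumerate(pts):
        right = pts[j + 1][0] if j + 1 < len(pts) else w
        total += p * (right - x)  # bin width times the fndr value on that bin
    return total / w


def ppi_scores(fold_changes, directions, fndr, w=16.0, w_pos=1.0, w_neg=1.0):
    """Return (PPI+, PPI-, PPI) under the illustrative assumptions above."""
    # Direction-adjusted fold-changes, clipped to the assumed range [-W, W].
    signed = [max(-w, min(w, d * fc)) for d, fc in zip(directions, fold_changes)]
    pos = [(s, p) for s, p in zip(signed, fndr) if s > 0]   # consistent changes
    neg = [(-s, p) for s, p in zip(signed, fndr) if s < 0]  # inconsistent changes
    ppi_pos = _step_integral([s for s, _ in pos], [p for _, p in pos], w)
    ppi_neg = _step_integral([s for s, _ in neg], [p for _, p in neg], w)
    return ppi_pos, ppi_neg, w_pos * ppi_pos - w_neg * ppi_neg
```

For example, `ppi_scores([2.0, -1.0, 4.0], [1, 1, -1], [0.9, 0.5, 0.8], w=8.0)` gives one consistent and two inconsistent changes, so PPI+ and PPI− are both non-zero and the combined score is their weighted difference.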
  • FIG. 7 is a flow diagram 700 of an Expected Perturbation Index (EPI) scoring technique that may be implemented by the network scoring engine 114.
  • each SRP represents the activity (or change in activity) of a measured biological entity under a treatment condition.
  • Each SRP, then, is associated with a number of measured activities, one for each measured biological entity.
  • the EPI is a quantification of the average activity change over all biological entities represented by the SRP.
  • the measured activities represented in an SRP may be regarded as random draws from a distribution of measured activities, with the EPI representing the expected value of that distribution. If each of the fold-changes is drawn from a distribution with density ρ, then the expected value of that distribution is $\int_{-W}^{W} x\,\rho(x)\,dx$ (Eq. 15), where x denotes a fold-change value.
  • the network scoring engine 114 may be configured to execute the steps described below to approximate the EPI value based on the observed activities and other information drawn from the system 100.
  • the network scoring engine 114 assembles a fold-change vector ⁇ .
  • This fold-change vector, representing the observed fold-changes in the activity of the N measured biological entities, may be assembled as described above with reference to step 502 of the Geometric Perturbation Index (GPI) scoring technique illustrated in FIG. 5 or step 602 of the Probabilistic Perturbation Index (PPI) scoring technique illustrated in FIG. 6.
  • the network scoring engine 114 generates a range for the fold-change density. The network scoring engine 114 may generate the range for the fold-change density as described above with reference to step 604 of the PPI scoring technique illustrated in FIG. 6.
  • the network scoring engine 114 identifies the expected directions of change for each component in the fold-change vector. This step may be performed as described above with reference to step 508 of the GPI scoring technique illustrated in FIG. 5, resulting in a set of expected directions of change d_i that correspond to the observed fold-changes.
  • the network scoring engine 114 generates an approximate fold-change density. If each of the fold-changes is drawn from an underlying distribution, then that distribution can be approximately represented by:
  • the network scoring engine 114 generates the approximate expected value of the approximate fold-change density, resulting in an EPI score.
  • the network scoring engine 114 applies a computational interpolation technique (e.g., linear or nonlinear interpolation techniques) to generate an approximate continuous distribution from the distribution of Eq. 16, then calculates the expected value of that distribution using the formula of Eq. 15.
  • the network scoring engine 114 is configured to use the discrete distribution of Eq. 16 as a rectangular approximation to the continuous distribution, and calculate the EPI in accordance with:
  • n+ is the number of entities whose activity was expected to increase in response to the treatment (per step 706) and n− is the number of entities whose activity was expected to decrease in response to the treatment (per step 706).
  • In this approximation, high-magnitude fold-changes contribute more than lower ones, providing a measure of activity with high specificity.
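  • The interpolation route described above (approximating a continuous fold-change distribution and then taking its expected value, as in Eq. 15) can be sketched as follows. This is a hypothetical illustration, not the rectangle approximation of Eq. 17, and it does not use the n+ and n− counts. It assumes that the direction-adjusted values d_i·fc_i are sorted and that the empirical cumulative distribution is linearly interpolated between them, so each gap between consecutive values carries probability mass 1/(n − 1) spread uniformly; the expected value of each uniform piece is then its midpoint. The function name is illustrative.

```python
from typing import Sequence


def epi_interpolated(fold_changes: Sequence[float], directions: Sequence[int]) -> float:
    """Expected value of a piecewise-uniform density built from the data.

    Illustrative assumption: the empirical CDF of the direction-adjusted
    fold-changes d_i * fc_i is linearly interpolated between sorted values,
    so each gap [x_(j), x_(j+1)] holds mass 1 / (n - 1) uniformly.
    """
    xs = sorted(d * fc for d, fc in zip(directions, fold_changes))
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one fold-change")
    if n == 1:
        return float(xs[0])
    mass = 1.0 / (n - 1)
    # The mean of a uniform piece is its midpoint; sum mass * midpoint.
    return sum(mass * (xs[j] + xs[j + 1]) / 2.0 for j in range(n - 1))
```

With this construction, an entity whose change opposes its expected direction enters as a negative value and pulls the score down.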
  • the network scoring engine 114 may also be configured to determine confidence intervals around the network scores. These confidence intervals may be used by clinicians and researchers to evaluate the experimental results reflected in the network scores and may be used by other components of the system 100 in further data processing steps (e.g., by the aggregation engine 116).
  • the network scoring engine 114 uses a computational bootstrapping technique, such as a parametric or non-parametric bootstrapping technique, to approximate the distributions of the computed metrics.
  • many such bootstrapping techniques are known in the art.
  • a non-parametric technique may be advantageously employed.
  • parametric techniques may be advantageously employed.
  • the fold-changes are assumed to arise from a normal distribution under the null hypothesis, with mean zero and sample variance s_i estimated with a given number of degrees of freedom.
  • the network scoring engine may generate these quantities, as well as t-statistics and moderated t-statistics representative of the fold-changes, by using a statistical estimation and test procedure, such as the linear model approach of the "limma" R package, which is commonly used in the analysis of differential gene expression and is described by Smyth in "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments," Statistical Applications in Genetics and Molecular Biology, 3:3, 2004, incorporated in its entirety by reference herein.
  • For example, to determine confidence intervals for EPI scores (as discussed above with reference to FIG. 7), the network scoring engine 114 may be configured to implement a parametric bootstrapping technique to approximate the distribution of the fold-changes, assuming that they arise from an underlying normal distribution.
  • the network scoring engine 114 may additionally apply the bias-corrected percentile method described by Efron in "The jackknife, the bootstrap, and other resampling plans," SIAM, 1982 and DiCiccio et al. in "A review of bootstrap confidence intervals," Journal of the Royal Statistical Society, 50:338, 1988, each of which is incorporated by reference in its entirety herein.
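  • A parametric bootstrap combined with Efron's bias-corrected (BC) percentile interval can be sketched as follows. The sketch assumes, for illustration, that each fold-change is resampled independently from a normal distribution centred on its observed value with standard error se_i (for example, taken from a limma fit), and that the network score is any function of the fold-change vector supplied by the caller; the function name and defaults are hypothetical. Only the Python standard library is used, with `statistics.NormalDist` providing the normal quantiles.

```python
import random
from statistics import NormalDist
from typing import Callable, Sequence, Tuple


def bc_percentile_ci(
    fold_changes: Sequence[float],
    std_errors: Sequence[float],
    score: Callable[[Sequence[float]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Parametric bootstrap confidence interval, bias-corrected percentile method.

    Assumption: each fold-change is redrawn from Normal(observed_i, se_i).
    """
    rng = random.Random(seed)
    observed = score(fold_changes)
    boots = sorted(
        score([rng.gauss(fc, se) for fc, se in zip(fold_changes, std_errors)])
        for _ in range(n_boot)
    )
    std_normal = NormalDist()
    # Bias correction z0 from the share of bootstrap scores below the observed score,
    # kept strictly inside (0, 1) so that inv_cdf is defined.
    frac_below = sum(b < observed for b in boots) / n_boot
    frac_below = min(max(frac_below, 0.5 / n_boot), 1 - 0.5 / n_boot)
    z0 = std_normal.inv_cdf(frac_below)
    # Adjusted percentiles: Phi(2*z0 + z_{alpha/2}) and Phi(2*z0 + z_{1-alpha/2}).
    p_lo = std_normal.cdf(2 * z0 + std_normal.inv_cdf(alpha / 2))
    p_hi = std_normal.cdf(2 * z0 + std_normal.inv_cdf(1 - alpha / 2))

    def quantile(p: float) -> float:
        return boots[min(n_boot - 1, max(0, int(p * n_boot)))]

    return quantile(p_lo), quantile(p_hi)
```

For instance, passing `statistics.fmean` as `score` yields an interval for the plain mean of the fold-changes, a simple stand-in for an EPI-like score.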
  • the network scoring engine 114 may employ an analytical approach to determine the confidence intervals, instead of or in combination with a bootstrapping technique.
  • the particular techniques implemented by the network scoring engine 114 to analytically determine confidence intervals will depend on the particular network scoring technique used and the assumptions on the underlying statistical distributions for the fold-changes.
  • when the network scoring engine 114 is configured to calculate strength scores (in accordance with Eq. 1), it treats the strength score as a random variable consisting of a weighted sum of independent, approximately normal random variables. As a result, the strength score is approximately normally distributed, with zero mean and a variance $S_{\mathrm{strength}}^2$ calculated in accordance with Eq. 18.
  • the network scoring engine 114 can use the variance $S_{\mathrm{strength}}^2$ to derive a t-statistic in accordance with Eq. 19.
  • the network scoring engine 114 may generate a $(1-\alpha)$-confidence interval for the strength score in accordance with $\mathrm{Strength} \pm t_{\alpha/2}\, S_{\mathrm{strength}}$ (Eq. 20).
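  • The analytical interval for the strength score can be sketched for a score that is a weighted sum of fold-changes. This is a minimal illustration under stated assumptions: the strength is taken as Σ_i w_i·fc_i with caller-supplied weights (Eq. 1 itself is not reproduced here), the fold-changes are independent and approximately normal with standard errors se_i, so that the variance is Σ_i w_i²·se_i², and the critical value t_{α/2} is supplied by the caller, for example from `scipy.stats.t.ppf(1 - alpha / 2, df)`. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def strength_confidence_interval(
    fold_changes: Sequence[float],
    weights: Sequence[float],
    std_errors: Sequence[float],
    t_crit: float,
) -> Tuple[float, float, Tuple[float, float]]:
    """Return (strength, t_statistic, (lower, upper)).

    Illustrative assumptions: strength = sum_i w_i * fc_i, with independent,
    approximately normal fold-changes, so Var = sum_i (w_i * se_i) ** 2.
    """
    strength = sum(w * fc for w, fc in zip(weights, fold_changes))
    variance = sum((w * se) ** 2 for w, se in zip(weights, std_errors))
    if variance <= 0:
        raise ValueError("variance must be positive")
    s_strength = math.sqrt(variance)   # standard deviation S_strength
    half_width = t_crit * s_strength   # t_{alpha/2} * S_strength
    return strength, strength / s_strength, (strength - half_width, strength + half_width)
```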
  • when the network scoring engine 114 is configured to calculate GPI scores (as discussed above with reference to FIG. 5), it may also be configured to calculate a confidence interval for the GPI score in accordance with the steps of the flow diagram 800 of FIG. 8. At step 802, the network scoring engine 114 performs a first-order Taylor expansion of the GPI score, as represented by Eq. 5, as a function of the fold-changes, in accordance with Eq. 21.
  • the network scoring engine 114 assesses whether the coefficients of the fold-change terms in the GPI calculation are themselves functions of the fold-changes. These coefficients include or comprise the expected direction terms d_i and the weights. When these coefficients do not depend on the fold-change values, the first-order term in Eq. 21 is constant with respect to the fold-changes and the network scoring engine 114 proceeds to step 808. However, when the coefficients do depend on the fold-change values, the network scoring engine 114 proceeds to step 806 to approximate the first-order term in Eq. 21.
  • For example, the weight vector may be a function of the fold-changes while the expected direction terms d_i are not.
  • In that case, the first-order term may be represented as:
  • the network scoring engine 114 may use the following expression for the derivative term of Eq. 22:
  • the derivative labeled "term1" in Eq. 23 represents the derivative of the Benjamini-Hochberg adjustment factor, and the integral labeled "term2" represents the p-value for the fold-change of the ith biological entity. Because the Benjamini-Hochberg terms are most relevant when p-values are low, the network scoring engine 114 may be configured to approximate the product of term1 and term2 as zero at step 806. As a result, the network scoring engine 114 may apply the fundamental theorem of calculus and use the following approximation of the derivative term of Eq. 23:
  • the network scoring engine 114 determines the approximate variance of the GPI score using the approximation of the GPI score generated in the preceding steps. If the GPI score has been approximated as an affine function of the fold-changes treated as random variables (as in Eq. 21), the variance of the approximation will be the weighted sum of the variances of the fold-changes, as given by:
  • the network scoring engine 114 evaluates the variance of the GPI score (e.g., as represented by Eq. 27) at the observed fold-change values.
  • the network scoring engine 114 generates a confidence interval for the GPI score in accordance with Eq. 28.
  • Eq. 28 may be adapted as desired to determine the variance of a PPI score at the observed fold-change values.
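  • The linearize-then-propagate procedure of FIG. 8 is an instance of the delta method, which can be sketched generically. In this hypothetical sketch, the GPI (or PPI) formula is passed in as a caller-supplied function, the first-order term is obtained with a central finite-difference gradient instead of the analytical approximation of step 806 (so weights that depend on the fold-changes are handled numerically), the fold-changes are assumed independent with standard errors se_i, and the critical value is supplied by the caller. Eqs. 21 to 28 themselves are not reproduced, and the names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def delta_method_ci(
    score: Callable[[Sequence[float]], float],
    fold_changes: Sequence[float],
    std_errors: Sequence[float],
    crit: float,
    h: float = 1e-6,
) -> Tuple[float, float, float]:
    """Return (score, lower, upper) from a first-order (delta-method) approximation.

    The score is linearized around the observed fold-changes with a central
    finite-difference gradient; for independent fold-changes,
    Var(score) ~= sum_i (d score / d fc_i) ** 2 * se_i ** 2.
    """
    x = list(fold_changes)
    value = score(x)
    grad: List[float] = []
    for i in range(len(x)):
        up = x.copy()
        up[i] += h
        down = x.copy()
        down[i] -= h
        grad.append((score(up) - score(down)) / (2 * h))
    variance = sum((g * se) ** 2 for g, se in zip(grad, std_errors))
    half_width = crit * math.sqrt(variance)
    return value, value - half_width, value + half_width
```

Because the gradient is evaluated at the observed fold-changes, this mirrors the step of evaluating the variance at the observed values before forming the interval.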
  • the network scoring engine 114 may generate vector-valued scores in addition to or instead of the scalar-valued scores described above.
  • One vector-valued score is the vector of fold-changes or absolute changes in activity for each of the measured nodes.
  • the network scoring engine 114 may generate multiple NPA scores. For example, the network scoring engine 114 may generate an NPA score for a particular network, a particular dose of the agent, and a particular exposure time.
  • the process 200 for quantifying the response of a biological network to a perturbation by calculating a network perturbation amplitude (NPA) score has been used to analyze tumor necrosis factor (TNF)-treated normal human bronchial epithelial (NHBE) cells using several causal network models.
  • TNFα: tumor necrosis factor-alpha
  • Normal human bronchial epithelial (NHBE) cells were treated with four different doses of TNFα (0.1, 1, 10 and 100 ng/mL), and total RNA was collected for microarray measurement at four different times after treatment (30 minutes, 2 hours, 4 hours and 24 hours). All treatments were compared to time-matched mock-treated controls to obtain 16 contrasts (4 doses × 4 time points).
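  • The 4 × 4 design yields 16 treatment-versus-control contrasts. The small sketch below enumerates them; the label format is hypothetical and only illustrates that each TNFα treatment is paired with the mock-treated control at the same time point.

```python
from itertools import product

DOSES_NG_PER_ML = [0.1, 1, 10, 100]
TIME_POINTS = ["30min", "2h", "4h", "24h"]

# Each contrast pairs one (dose, time) treatment with the time-matched mock control.
contrasts = [
    (f"TNF_{dose}ng_{time}", f"mock_{time}")
    for dose, time in product(DOSES_NG_PER_ML, TIME_POINTS)
]
```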
  • Normal human bronchial epithelial cells (Lonza Walkersville, Inc.) were cultured in standard growth medium (Clonetics medium, Lonza Walkersville, Inc.). Cells were treated either with TNFα (Sigma) or with a vehicle control (HBSS), and then harvested after the desired perturbation time periods.
  • RNA samples were immediately put on ice and split into three technical replicates, from which total RNA was extracted using the RNeasy Micro Kit (Qiagen). The processed RNA samples were then hybridized to Affymetrix U133 Plus 2.0 microarrays. Cell viability and cell counts were verified for all conditions after 24 hours with the CellTiter-Glo® assay (Promega). NF-κB nuclear translocation was measured using the Cellomics NF-κB Activation HCS Reagent Kit (Thermo Scientific). Data processing and NPA methods were implemented in the R statistical environment.
  • RNA expression data was analyzed using the affy and limma packages of the Bioconductor suite of microarray analysis tools available in the R statistical environment (Gentleman R: Bioinformatics and computational biology solutions using R and Bioconductor. New York: Springer Science+Business Media; 2005; Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5:R80).
  • RMA: Robust Multi-array Average
  • Probe sets were matched to RNA Abundance nodes in the Selventa Knowledgebase using the HG-U133_Plus_2.na30 probe set mappings and the following criteria. First, only “at” or “s_at” probe sets were considered. Second, probe sets that mapped to multiple genes were discarded. Third, when multiple probe sets mapped to the same gene, preference was given to “at” probe sets over “s_at” probe sets. Finally, when there still remained multiple probe sets mapped to the same gene, the probe set with the lowest geometric mean FDR-corrected p-value across all contrasts of interest was selected.
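  • The four probe-set selection criteria can be sketched as a filter-then-rank procedure. This is an illustrative sketch, not the code used in the study: the `ProbeSet` record and function names are hypothetical, "at" is interpreted as a probe-set ID ending in `_at` but not in `_s_at`, `_x_at` or `_a_at`, and the per-contrast FDR-corrected p-values are assumed to have been computed already (for example with limma).

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class ProbeSet:
    probe_id: str                 # Affymetrix probe-set ID, e.g. "1007_s_at"
    genes: List[str]              # genes the probe set maps to
    fdr_pvalues: Sequence[float]  # FDR-corrected p-values, one per contrast


def probe_type(probe_id: str) -> Optional[str]:
    """Return "at" or "s_at", or None for any other probe-set type."""
    if probe_id.endswith("_s_at"):
        return "s_at"
    if probe_id.endswith(("_x_at", "_a_at")):
        return None
    if probe_id.endswith("_at"):
        return "at"
    return None


def geometric_mean(values: Sequence[float]) -> float:
    # Floor at a tiny positive value so a p-value of exactly 0 does not break log().
    return math.exp(sum(math.log(max(v, 1e-300)) for v in values) / len(values))


def select_probe_sets(probes: Sequence[ProbeSet]) -> Dict[str, str]:
    """Map each gene to a single probe-set ID using the four criteria."""
    by_gene: Dict[str, List[ProbeSet]] = defaultdict(list)
    for p in probes:
        if probe_type(p.probe_id) is None:  # 1: keep only "at" / "s_at"
            continue
        if len(set(p.genes)) != 1:  # 2: drop probe sets mapping to several (or no) genes
            continue
        by_gene[p.genes[0]].append(p)

    selected: Dict[str, str] = {}
    for gene, candidates in by_gene.items():
        # 3: "at" sorts before "s_at" (False < True);
        # 4: then the lowest geometric-mean FDR p-value across contrasts wins.
        best = min(
            candidates,
            key=lambda p: (probe_type(p.probe_id) != "at", geometric_mean(p.fdr_pvalues)),
        )
        selected[gene] = best.probe_id
    return selected
```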
  • the Selventa Knowledgebase is a repository containing over 1.5 million nodes (biological concepts and entities) and over 7.5 million edges (assertions about causal and non-causal relationships between nodes).
  • the assertions in the Selventa Knowledgebase are derived from peer-reviewed scientific literature as well as other public and proprietary databases. Specifically, each assertion describes an individual experimental observation from an experiment performed in a human, mouse, or rat species context, either in vitro or in vivo. Assertions also capture information about the referring source (e.g.
  • NF-κB: nuclear factor kappa-light-chain-enhancer of activated B cells
  • CXCL1: chemokine (C-X-C motif) ligand 1 [HeLa cell line; Human; PMID 16414985].
  • the knowledgebase contains causal relationships derived from healthy tissues and disease areas such as inflammation, metabolic diseases, cardiovascular injury, liver injury, and cancer.
  • the GPI, EPI and PPI scoring methods were first investigated using a causal network model created to be a specific measure of NF-κB activation, the NF-κB-direct model.
  • This model is composed of 155 genes (curated from 247 distinct references, some genes being supported by more than one reference) known to be directly regulated by NF-κB (genes whose expression is controlled in an NF-κB-dependent manner and whose promoter sequences are directly bound by NF-κB).
  • Both scoring methods showed the same pattern of response to TNFα, demonstrating a dose-dependent response at all times and a time-dependent response that generally saturated at later times (see FIG. 10a).
  • the EPI method was qualitatively different from GPI in that EPI scores continued to increase from 2 hours to 4 hours to 24 hours, while the GPI score plateaued from 4 hours to 24 hours. Also, the EPI method produced near-zero scores for 0.1 ng/mL TNFα. In general, the EPI method reduced to zero (or near zero) the scores that were relatively low under the other methods. For the EPI method, the lowest dose at all time points except 2 hours was found not to be specific to the NF-κB-direct network model.
  • NF-κB-direct model scores were compared to NF-κB nuclear translocation.
  • Upon activation, NF-κB is transported into the nucleus, where it acts to regulate the expression of many genes. A series of feedback loops then leads to the subsequent translocation of NF-κB back to the cytoplasm, and this oscillatory cycle continues several times.
  • the first oscillation may be the most reliable population-level measure of NF-κB activation. Although the time of the first oscillation depends on dose, 30 minutes after TNFα treatment may be a realistic time to measure NF-κB nuclear translocation for the doses used.
  • FIG. 11 illustrates NF-κB-direct NPA scores at 30 minutes, plotted against NF-κB nuclear translocation at 30 minutes. Error bars in NF-κB nuclear translocation represent the standard deviation of the mean nuclear translocation for three different fields of view of the same population of cells. Interestingly, this dose-dependent relationship was preserved at different times after TNFα treatment (Figure 13).
  • Figure 14 shows the results for transcriptomic data from TNFα-treated NHBE cells, scored using GPI and EPI, for (a) the NF-κB-direct model, (b) a submodel composed of 20 NF-κB-regulated genes reported to be TNFα-responsive in mouse 3T3 fibroblast cells (NFKBIA, CASP4, CCL5, TNFAIP3, CCL2, ZFP36, RIPK2, TNFSF10, NFKBIE, IL6, CCL20, ICAM1, TNFRSF1A, TNFRSF1B, SQSTM1, NRG1, SOD1, IL1RL1, HIF1A, ERBB2) (Tay et al., Single-cell NF-kappaB dynamics reveal digital activation and analogue information processing. Nature 2010, 466:267-271).
  • the IKK/NF-κB signaling model, which is composed of 992 genes (curated from 414 different references) that are known to be modulated by perturbation of proteins in a causal network model of signaling from the IκB kinase (IKK) proteins to NF-κB activation (Figure 9); and the TNF model, which is composed of 1741 genes (curated from 589 different references) that are known to be modulated by treatment of cells with TNFα.
  • the NF-κB-direct model is composed entirely of genes whose expression is directly controlled by a single transcription factor (NF-κB).
  • each of these two models contains genes whose direct transcriptional controller is not necessarily known.
  • the expression of these genes may be controlled by transcription factors not involved in constructing the model.
  • genes in the IKK/NF-κB signaling model are known to be modulated by perturbation of proteins in the IKK/NF-κB signaling causal network model, but some of these genes could be regulated as secondary effects caused by altered expression of a smaller subset of genes that are directly modulated by NF-κB.
  • TNFα is a ligand and therefore does not directly mediate transcription of any genes. Treatment of cells with TNFα results in activation of a myriad of transcription factors, any of which may directly or indirectly (for example, through autocrine signaling) alter the expression of each gene in the TNF model.
  • FIG. 9 illustrates the full causal network model (top), along with a schematic of the basic model architecture (middle).
  • CHUK, IKBKB, and IKBKG act as inhibitors of NFKBIA, NFKBIB, and NFKBIE, which are in turn inhibitors of NFKB1, NFKB2, and RELA.
  • the nodes used in the model are listed under each section. The nodes in bold represent nodes that have downstream gene expression measurables in the knowledgebase, and the number of measurables is given in square brackets (because the same downstream measurable may be found under multiple nodes, these 1227 downstream measurables correspond to 992 unique measurables).
  • CHUK P@S represents CHUK phosphorylated at serine (where the residue is given if known)
  • CHUK P@ST represents CHUK phosphorylated at serine or threonine (the exact residue is unknown)
  • kaof(CHUK) represents the kinase activity of CHUK
  • CHUK:IKBKB represents the complex of CHUK and IKBKB proteins
  • IkappaB kinase complex Hs represents an aggregate of the various IκB kinases (CHUK, IKBKB, and IKBKG) in Homo sapiens (Hs)
  • degradationof(NFKBIA) represents the process of NFKBIA degradation
  • taof(NFKB1) represents the transcriptional activity of NFKB1.
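  • Because the IKK subunits inhibit the IκB proteins, which in turn inhibit the NF-κB subunits, the net effect of IKK activity on NF-κB along such a path is activating: two inhibitions compose to an activation. A minimal sketch of this sign arithmetic follows, assuming each causal edge is labelled +1 (increases) or −1 (decreases) and that a path's net direction is the product of its edge signs. The edge table is a hypothetical toy subset of the FIG. 9 architecture, not the full model.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical toy subset of the FIG. 9 architecture; -1 marks an inhibitory edge.
EDGES: Dict[Tuple[str, str], int] = {
    ("IKBKB", "NFKBIA"): -1,
    ("NFKBIA", "RELA"): -1,
    ("CHUK", "NFKBIE"): -1,
    ("NFKBIE", "NFKB1"): -1,
}


def path_sign(path: List[str]) -> int:
    """Net direction of a causal path: the product of its edge signs."""
    return prod(EDGES[(a, b)] for a, b in zip(path, path[1:]))
```

For example, `path_sign(["IKBKB", "NFKBIA", "RELA"])` is +1, reflecting the double inhibition described above.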
  • the IKK/NF-κB signaling model and TNF model give insight into the behaviors of mechanism hypotheses at different levels of proximity to the measurements.
  • the IKK/NF-κB signaling model is primarily composed of genes that are regulated (either directly or indirectly) by NF-κB (Figure 9), and it produced a pattern of response that is very similar to that of the NF-κB-direct model (Figure 10(b)). This similar pattern of response suggests that there is not a large difference between the population-level behavior of genes that are known to be directly regulated by a transcription factor and the behavior of genes whose direct regulation is unknown.
  • the E2F1-direct model is composed of 80 genes (curated from 54 different references) known to be directly regulated by E2F1 (expression controlled by E2F1 and promoter sequence bound by E2F1).
  • the NPA responses of the four models introduced above were assessed in response to inhibition of cell cycle progression via a CDK inhibitor.
  • FIG. 15 is a block diagram of a distributed computerized system 1500 for quantifying the impact of biological perturbations.
  • the components of the system 1500 are the same as those in the system 100 of FIG. 1, but the arrangement of the system 1500 is such that each component communicates through a network interface 1510.
  • Such an implementation may be appropriate for distributed computing over multiple communication systems, including wireless communication systems that may share access to a common network resource, such as in "cloud computing" paradigms.
  • FIG. 16 is a block diagram of a computing device, such as any of the components of system 100 of FIG. 1 or system 1500 of FIG. 15, for performing processes described with reference to FIGS. 1-10.
  • Each of the components of system 100 including the SRP engine 110, the network modeling engine 112, the network scoring engine 114, the aggregation engine 116 and one or more of the databases including the outcomes database, the perturbations database, and the literature database may be implemented on one or more computing devices 1600.
  • a plurality of the above-components and databases may be included or comprised within one computing device 1600.
  • a component and a database may be implemented across several computing devices 1600.
  • the computing device 1600 comprises at least one communications interface unit, an input/output controller 1610, system memory, and one or more data storage devices.
  • the system memory includes or comprises at least one random access memory (RAM 1602) and at least one read-only memory (ROM 1604). All of these elements are in communication with a central processing unit (CPU 1606) to facilitate the operation of the computing device 1600.
  • the computing device 1600 may be configured in many different ways. For example, the computing device 1600 may be a conventional standalone computer or alternatively, the functions of computing device 1600 may be distributed across multiple computer systems and architectures.
  • the computing device 1600 may be configured to perform some or all of modeling, scoring and aggregating operations. In FIG. 16, the computing device 1600 is linked, via a network or local network, to other servers or systems.
  • the computing device 1600 may be configured in a distributed architecture, wherein databases and processors are housed in separate units or locations. Some such units perform primary processing functions and contain at a minimum a general controller or a processor and a system memory. In such an aspect, each of these units is attached via the communications interface unit 1608 to a communications hub or port (not shown) that serves as a primary communication link with other servers, client or user computers and other related devices.
  • the communications hub or port may have minimal processing capability itself, serving primarily as a communications router.
  • a variety of communications protocols may be part of the system, including, but not limited to: Ethernet, SAP, SAS™, ATP, BLUETOOTH™, GSM and TCP/IP.
  • the CPU 1606 comprises a processor, such as one or more conventional microprocessors and one or more supplementary co-processors such as math co-processors for offloading workload from the CPU 1606.
  • the CPU 1606 is in communication with the communications interface unit 1608 and the input/output controller 1610, through which the CPU 1606 communicates with other devices such as other servers, user terminals, or devices.
  • the communications interface unit 1608 and the input/output controller 1610 may include or comprise multiple communication channels for simultaneous communication with, for example, other processors, servers or client terminals.
  • Devices in communication with each other need not be continually transmitting to each other. On the contrary, such devices need only transmit to each other as necessary, may actually refrain from exchanging data most of the time, and may require several steps to be performed to establish a communication link between the devices.
  • the CPU 1606 is also in communication with the data storage device.
  • the data storage device may comprise an appropriate combination of magnetic, optical or semiconductor memory, and may include or comprise, for example, RAM 1602, ROM 1604, flash drive, an optical disc such as a compact disc or a hard disk or drive.
  • the CPU 1606 and the data storage device each may be, for example, located entirely within a single computer or other computing device; or connected to each other by a communication medium, such as a USB port, serial port cable, a coaxial cable, an Ethernet type cable, a telephone line, a radio frequency transceiver or other similar wireless or wired medium or combination of the foregoing.
  • the CPU 1606 may be connected to the data storage device via the communications interface unit 1608.
  • the CPU 1606 may be configured to perform one or more particular processing functions.
  • the data storage device may store, for example, (i) an operating system 1612 for the computing device 1600; (ii) one or more applications 1614 (e.g., computer program code or a computer program product) adapted to direct the CPU 1606 in accordance with the systems and methods described here, and particularly in accordance with the processes described in detail with regard to the CPU 1606; or (iii) database(s) 1616 adapted to store information required by the program.
  • the database(s) includes or comprises a database storing experimental data, and published literature models.
  • the operating system 1612 and applications 1614 may be stored, for example, in a compressed, uncompiled, and/or encrypted format, and may include or comprise computer program code.
  • the instructions of the program may be read into a main memory of the processor from a computer-readable medium other than the data storage device, such as from the ROM 1604 or from the RAM 1602. While execution of sequences of instructions in the program causes the CPU 1606 to perform the process steps described herein, hard-wired circuitry may be used in place of, or in combination with, software instructions for implementation of the processes of the present invention.
  • the systems and methods described are not limited to any specific combination of hardware and software.
  • Suitable computer program code may be provided for performing one or more functions in relation to modeling, scoring and aggregating as described herein.
  • the program also may include or comprise program elements such as an operating system 1612, a database management system and "device drivers" that allow the processor to interface with computer peripheral devices (e.g., a video display, a keyboard, a computer mouse, etc.) via the input/output controller 1610.
  • Nonvolatile media include or comprise, for example, optical, magnetic, or opto-magnetic disks, or integrated circuit memory, such as flash memory.
  • Volatile media include or comprise dynamic random access memory (DRAM), which typically constitutes the main memory.
  • Computer-readable media include or comprise, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM or EEPROM (electronically erasable programmable read-only memory), a FLASH- EEPROM, any other memory chip or cartridge, or any other non-transitory medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the CPU 1606 (or any other processor of a device described herein) for execution.
  • the instructions may initially be borne on a magnetic disk of a remote computer (not shown).
  • the remote computer can load the instructions into its dynamic memory and send the instructions over an Ethernet connection, cable line, or even telephone line using a modem.
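  • A communications device local to a computing device 1600 (e.g., a server) can receive the data on the respective communications line and place the data on a system bus for the processor.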
  • The system bus carries the data to main memory, from which the processor retrieves and executes the instructions.
  • The instructions received by main memory may optionally be stored in memory either before or after execution by the processor.
  • Instructions may be received via a communication port as electrical, electromagnetic or optical signals, which are exemplary forms of wireless communications or data streams that carry various types of information.
  • Further aspects and embodiments are set forth in the following passages:
  • A computerized method for quantifying the perturbation of a biological system in response to an agent, comprising: receiving, at a first processor, a set of treatment data corresponding to a response of a biological system to an agent, wherein the biological system includes or comprises a plurality of biological entities, each biological entity interacting with at least one other of the biological entities; receiving, at a second processor, a set of control data corresponding to the biological system not exposed to the agent; providing, at a third processor, a computational causal network model that represents the biological system and includes or comprises: nodes representing the biological entities, edges representing relationships between the biological entities, and direction values, for the nodes, representing the expected direction of change between the control data and the treatment data; calculating, with a fourth processor, activity measures, for the nodes, representing a difference between the treatment data and the control data; calculating, with a fifth processor, weight values for the nodes, wherein at least one weight value is different from at least one other weight value; and generating, with a sixth processor, a score for the computational model representative of
  • The computerized method of passage 1, further comprising normalizing the score based on the number of nodes in the respective computational model.
  • The weight values represent a confidence in at least one of the set of treatment data and the set of control data.
  • The weight values include local false non-discovery rates.
  • The computerized method of passage 1, further comprising calculating, with a seventh processor, an approximate distribution of the activity measures over the nodes; calculating, with an eighth processor, an expected value of the approximate distribution; and generating, with a ninth processor, a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the expected value.
  • The computerized method of passage 1, further comprising calculating, with a tenth processor, a positive activation score and a negative activation score based on the activity measures, the positive and negative activation scores being representative of consistency and inconsistency, respectively, between the activity measures and the direction values; and generating, with an eleventh processor, a score for each computational model representative of the perturbation of the subset of the biological system to the agent, wherein the score is based on the positive and negative activation scores.
  • The set of treatment data includes a plurality of sets of treatment data such that each node includes a plurality of fold-change values defined by a first probability distribution and a plurality of weight values defined by a second probability distribution.
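The scoring steps in passage 1 and its dependent passages (activity measures compared against direction values, per-node weights, positive and negative activation scores, and normalization by the number of nodes) can be illustrated with a short Python sketch. It is a hedged illustration, not the method claimed in this application: the activity measure (a difference of mean log values, i.e. a log fold change), the weights (assumed to be given in [0, 1], for example derived from local false non-discovery rates), the net score (positive minus negative activation, divided by the node count) and the names `Node`, `activity_measure`, `activation_scores` and `network_score` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    """One node of a hypothetical causal network model (illustrative only)."""
    name: str
    direction: int              # expected direction of change: +1 (up) or -1 (down)
    treatment: Sequence[float]  # log-scale measurements with the agent
    control: Sequence[float]    # log-scale measurements without the agent
    weight: float = 1.0         # assumed confidence weight, e.g. from a local false non-discovery rate


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def activity_measure(node: Node) -> float:
    """Assumed activity measure: difference of mean log values (a log fold change)."""
    return _mean(node.treatment) - _mean(node.control)


def activation_scores(nodes: Sequence[Node]) -> tuple[float, float]:
    """Weighted positive (direction-consistent) and negative (inconsistent) activation scores."""
    positive = negative = 0.0
    for node in nodes:
        signed = node.direction * activity_measure(node)
        if signed > 0:
            positive += node.weight * signed
        else:
            negative += node.weight * -signed
    return positive, negative


def network_score(nodes: Sequence[Node]) -> float:
    """Assumed net score: (positive - negative) activation, normalized by the node count."""
    if not nodes:
        raise ValueError("the model has no nodes")
    positive, negative = activation_scores(nodes)
    return (positive - negative) / len(nodes)


if __name__ == "__main__":
    nodes = [
        Node("A", +1, [2.0, 3.0], [1.0, 2.0], weight=0.9),  # goes up, as expected
        Node("B", -1, [0.5, 0.5], [1.5, 1.5], weight=0.5),  # goes down, as expected
        Node("C", +1, [1.0, 1.0], [1.5, 1.5], weight=1.0),  # goes down, but expected up
    ]
    print(round(network_score(nodes), 6))  # 0.3
```

Splitting the score into a positive and a negative part keeps the consistent and inconsistent evidence visible separately before they are combined; dividing by the number of nodes makes scores from models of different sizes easier to compare.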

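Where each node carries a distribution of fold-change values and of weight values rather than single numbers, an approximate distribution of the resulting score and its expected value can be sketched by Monte Carlo sampling. Everything below is an assumption for illustration rather than the procedure of this application: the normal fold-change distribution, the Beta weight distribution, the independence between them, the weighted node-normalized score, and the hypothetical names `UncertainNode`, `sample_score`, `score_distribution` and `analytic_expected_score`.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass


@dataclass
class UncertainNode:
    """Hypothetical node whose fold change and weight are random variables."""
    direction: int   # expected direction of change: +1 or -1
    fc_mean: float   # mean of the assumed normal fold-change distribution
    fc_sd: float     # standard deviation of that distribution
    w_alpha: float   # assumed Beta(w_alpha, w_beta) distribution for the weight
    w_beta: float


def sample_score(nodes: list[UncertainNode], rng: random.Random) -> float:
    """Draw one fold change and one weight per node; return the node-normalized score."""
    total = 0.0
    for node in nodes:
        fold_change = rng.gauss(node.fc_mean, node.fc_sd)
        weight = rng.betavariate(node.w_alpha, node.w_beta)
        total += weight * node.direction * fold_change
    return total / len(nodes)


def score_distribution(nodes: list[UncertainNode], draws: int = 20_000,
                       seed: int = 0) -> tuple[float, tuple[float, float]]:
    """Monte Carlo approximation: (expected value, central 95% interval) of the score."""
    rng = random.Random(seed)
    samples = [sample_score(nodes, rng) for _ in range(draws)]
    cuts = statistics.quantiles(samples, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return statistics.fmean(samples), (cuts[0], cuts[-1])


def analytic_expected_score(nodes: list[UncertainNode]) -> float:
    """Closed form under independence: mean of direction * E[fold change] * E[weight]."""
    return sum(
        n.direction * n.fc_mean * n.w_alpha / (n.w_alpha + n.w_beta) for n in nodes
    ) / len(nodes)


if __name__ == "__main__":
    nodes = [
        UncertainNode(+1, 1.0, 0.3, 8.0, 2.0),   # E[weight] = 0.8
        UncertainNode(-1, -0.5, 0.2, 5.0, 5.0),  # E[weight] = 0.5
    ]
    expected, (lo, hi) = score_distribution(nodes)
    print(f"Monte Carlo expected score: {expected:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
    print(f"Closed-form expected score: {analytic_expected_score(nodes):.3f}")  # 0.525
```

Under the independence assumption made here the expected value also has a closed form, so `analytic_expected_score` serves as a check on the sampled estimate; the sampled interval is what the closed form cannot provide.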
Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physiology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Automation & Control Theory (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
PCT/EP2012/061035 2011-06-10 2012-06-11 Systems and methods for network-based biological activity assessment Ceased WO2012168483A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP12729448.6A EP2718880A1 (en) 2011-06-10 2012-06-11 Systems and methods for network-based biological activity assessment
JP2014514108A JP6138768B2 (ja) 2011-06-10 2012-06-11 Systems and methods for network-based biological activity assessment
US14/124,826 US20140172398A1 (en) 2011-06-10 2012-06-11 Systems and methods for network-based biological assessment
CN201280028435.6A CN103827896B (zh) 2011-06-10 2012-06-11 Systems and methods for network-based biological activity assessment

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201161495824P 2011-06-10 2011-06-10
US61/495,824 2011-06-10
US201161525700P 2011-08-19 2011-08-19
US61/525,700 2011-08-19
EP11195417.8A EP2608122A1 (en) 2011-12-22 2011-12-22 Systems and methods for quantifying the impact of biological perturbations
EP11195417.8 2011-12-22

Publications (1)

Publication Number Publication Date
WO2012168483A1 (en) 2012-12-13

Family

ID=47295520

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2012/061035 Ceased WO2012168483A1 (en) 2011-06-10 2012-06-11 Systems and methods for network-based biological activity assessment
PCT/EP2012/061033 Ceased WO2012168481A1 (en) 2011-06-10 2012-06-11 Systems and methods for quantifying the impact of biological perturbations

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/061033 Ceased WO2012168481A1 (en) 2011-06-10 2012-06-11 Systems and methods for quantifying the impact of biological perturbations

Country Status (5)

Country Link
US (3) US20140172398A1
EP (3) EP2608122A1
JP (5) JP6138768B2
CN (4) CN103765448B
WO (2) WO2012168483A1

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016508269A (ja) * 2012-12-28 2016-03-17 セルベンタ インコーポレイテッド Quantitative assessment of biological impact using mechanistic network models
JP2016520907A (ja) * 2013-04-23 2016-07-14 フィリップ モリス プロダクツ エス アー Systems and methods for using mechanistic network models in systems toxicology
US9558318B2 (en) 2011-06-10 2017-01-31 Philip Morris Products S.A. Systems and methods for quantifying the impact of biological perturbations
US10842444B2 (en) 2013-09-13 2020-11-24 Philip Morris Products S.A. Systems and methods for evaluating perturbation of xenobiotic metabolism

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8818920B2 (en) * 2012-03-09 2014-08-26 Bank Of America Corporation Incremental effect modeling by area index maximization
CN110096494B (zh) * 2012-10-22 2024-04-26 起元科技有限公司 Profiling data with source tracking
US8554712B1 (en) * 2012-12-17 2013-10-08 Arrapoi, Inc. Simplified method of predicting a time-dependent response of a component of a system to an input into the system
US9253044B1 (en) * 2013-01-04 2016-02-02 Selventa, Inc. Quantitative assessment of biological impact using overlap methods
EP3033721A1 (en) * 2013-08-12 2016-06-22 Philip Morris Products S.A. Systems and methods for crowd-verification of biological networks
US12020823B2 (en) * 2013-11-01 2024-06-25 H. Lee Moffitt Cancer Center And Research Institute, Inc. Integrated virtual patient framework
US9430739B2 (en) 2013-12-19 2016-08-30 Praedicat, Inc. Determining general causation from processing scientific articles
GB201405243D0 (en) 2014-03-24 2014-05-07 Synthace Ltd System and apparatus 1
SG11201610035RA (en) * 2014-06-30 2017-01-27 Evolving Machine Intelligence Pty Ltd A system and method for modelling system behaviour
US10309956B2 (en) 2014-07-10 2019-06-04 R.J. Reynolds Tobacco Company Process for assessing risk of substance administration
AR101678A1 (es) * 2014-09-11 2017-01-04 Sony Corp Information processing device, information processing method, and non-transitory computer-readable program storage medium
US20160092653A1 (en) * 2014-09-30 2016-03-31 Koninklijke Philips N.V. NUTRITIONAL INTAKE VIEWER (NutriWeb)
EP3226750A4 (en) * 2014-12-05 2018-07-04 Lifecycle Technologies Pty Ltd Method and system for improving a physiological response
US9762393B2 (en) * 2015-03-19 2017-09-12 Conduent Business Services, Llc One-to-many matching with application to efficient privacy-preserving re-identification
GB201511587D0 (en) * 2015-07-02 2015-08-19 Ge Healthcare Bio Sciences Ab A method and a system for determining a concentration range for a sample by means of a calibration curve
US10529253B2 (en) * 2016-08-30 2020-01-07 Bernard De Bono Method for organizing information and generating images of biological structures as well as related resources and the images and materials so generated
WO2018069891A2 (en) * 2016-10-13 2018-04-19 University Of Florida Research Foundation, Inc. Method and apparatus for improved determination of node influence in a network
US20190362216A1 (en) * 2017-01-27 2019-11-28 Ohuku Llc Method and System for Simulating, Predicting, Interpreting, Comparing, or Visualizing Complex Data
WO2018203349A1 (en) * 2017-05-01 2018-11-08 Parag Kulkarni A system and method for reverse hypothesis machine learning
US20200332364A1 (en) * 2017-05-12 2020-10-22 Laboratory Corporation Of America Holdings Compositions and methods for detection of diseases related to exposure to inhaled carcinogens
US10657179B2 (en) * 2017-09-01 2020-05-19 X Development Llc Bipartite graph structure
US11024403B2 (en) * 2018-01-22 2021-06-01 X Development Llc Method for analyzing and optimizing metabolic networks
US11068540B2 (en) 2018-01-25 2021-07-20 Ab Initio Technology Llc Techniques for integrating validation results in data profiling and related systems and methods
US11309058B2 (en) 2018-03-30 2022-04-19 X Development Llc Modeling the chemical composition of a biological cell wall
CN108614536B (zh) * 2018-06-11 2020-10-27 云南中烟工业有限责任公司 Method for constructing a complex network of key factors in the cigarette primary processing technology
US10961921B2 (en) 2018-09-19 2021-03-30 Pratt & Whitney Canada Corp. Model-based control system and method for a turboprop engine
US11521710B2 (en) * 2018-10-31 2022-12-06 Tempus Labs, Inc. User interface, system, and method for cohort analysis
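CN109521172A (zh) * 2018-11-14 2019-03-26 苏州新派特信息科技有限公司 Method for simulating the bioturbation effect of using planorbid snails to control red worms
CN109712667A (zh) * 2018-12-28 2019-05-03 广东省心血管病研究所 Control method for constructing an in vitro model simulating bone marrow mesenchymal stem cell transplantation
CN113632174B (zh) 2019-01-23 2025-03-25 密歇根大学董事会 Pharmacogenomic decision support for modulators of NMDA, glycine and AMPA receptors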
US10585990B1 (en) 2019-03-15 2020-03-10 Praedicat, Inc. Live updating visualization of causation scores based on scientific article metadata
EP3799057A1 (en) 2019-09-25 2021-03-31 Koninklijke Philips N.V. Prediction tool for patient immune response to a therapy
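CN110729022B (zh) * 2019-10-24 2023-06-23 江西中烟工业有限责任公司 Method for establishing a rat model of early liver injury from passive smoking, and method for screening related genes
CN111223520B (zh) * 2019-11-20 2023-09-12 云南省烟草农业科学研究院 Genomic selection model for predicting tobacco nicotine content, and application thereof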
US10748091B1 (en) 2020-01-16 2020-08-18 Applied Underwriters, Inc. Forecasting digital reservoir controller
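CN111755065B (zh) * 2020-06-15 2024-05-17 重庆邮电大学 Method for accelerating protein conformation prediction based on virtual network mapping and cloud parallel computing
TWI746381B (zh) 2021-02-25 2021-11-11 長庚醫療財團法人高雄長庚紀念醫院 Method for analyzing nystagmus sensing data using deep learning, and nystagmus sensing analysis system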
US20250013803A1 (en) * 2021-10-05 2025-01-09 Vishal Gupta A system for entity based stagewise formal specification of processes and a method therefor
US20230260600A1 (en) * 2022-02-16 2023-08-17 Stokely-Van Camp, Inc. High Efficacy Functional Ingredient Blends
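CN114821823B (zh) * 2022-04-12 2023-07-25 马上消费金融股份有限公司 Methods and apparatus for image processing, face anti-spoofing model training and liveness detection
JP2023161401A (ja) * 2022-04-25 2023-11-07 国立研究開発法人農業・食品産業技術総合研究機構 Trait prediction method, trait prediction model generation method, trait prediction device, trait prediction model generation device, and trait prediction system
CN116453585B (zh) * 2023-02-23 2025-08-12 中南大学 Method, apparatus, terminal device and medium for predicting mRNA-drug associations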

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070225956A1 (en) 2006-03-27 2007-09-27 Dexter Roydon Pratt Causal analysis in complex biological systems
US20090099784A1 (en) 2007-09-26 2009-04-16 Ladd William M Software assisted methods for probing the biochemical basis of biological states

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5640494A (en) * 1991-03-28 1997-06-17 The University Of Sydney Neural network with training by perturbation
US6132969A (en) * 1998-06-19 2000-10-17 Rosetta Inpharmatics, Inc. Methods for testing biological network models
IL134994A0 (en) * 2000-03-09 2001-05-20 Yeda Res & Dev Coupled two way clustering analysis of data
US7623969B2 (en) * 2002-01-31 2009-11-24 The Institute For Systems Biology Gene discovery for the system assignment of gene function
US20070016390A1 (en) * 2002-03-06 2007-01-18 Bernardo Diego D Systems and methods for reverse engineering models of biological networks
WO2005055113A2 (en) * 2003-11-26 2005-06-16 Genstruct, Inc. System, method and apparatus for causal implication analysis in biological networks
US7376520B2 (en) * 2005-03-16 2008-05-20 Lam Research Corporation System and method for gas flow verification
US20080195322A1 (en) * 2007-02-12 2008-08-14 The Board Of Regents Of The University Of Texas System Quantification of the Effects of Perturbations on Biological Samples
US8518649B2 (en) * 2007-04-04 2013-08-27 Šárka O. Southern Systems and methods for analyzing persistent homeostatic perturbations
US8068994B2 (en) * 2007-07-27 2011-11-29 Wayne State University Method for analyzing biological networks
US20110119259A1 (en) * 2008-04-24 2011-05-19 Trustees Of Boston University Network biology approach for identifying targets for combination therapies
US8577619B2 (en) * 2008-05-27 2013-11-05 Sloan Kettering Institute For Cancer Research Models for combinatorial perturbations of living biological systems
EP2342664A1 (en) * 2008-09-03 2011-07-13 Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. Computer implemented model of biological networks
US20100299289A1 (en) * 2009-05-20 2010-11-25 The George Washington University System and method for obtaining information about biological networks using a logic based approach
EP2514361A1 (en) * 2011-04-22 2012-10-24 Université Catholique De Louvain In vivo quantification of a variation of oxygenation in a tissue by using a magnetic resonance imaging technique
EP2608122A1 (en) 2011-12-22 2013-06-26 Philip Morris Products S.A. Systems and methods for quantifying the impact of biological perturbations
US20140214336A1 (en) 2011-09-09 2014-07-31 Philip Morris Products S.A. Systems and methods for network-based biological activity assessment
JP6397894B2 (ja) 2013-04-23 2018-09-26 フィリップ モリス プロダクツ エス アー Systems and methods for using mechanistic network models in systems toxicology

Non-Patent Citations (19)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Reverse Causal Reasoning Methods - White Paper", 4 February 2011 (2011-02-04), XP002681944, Retrieved from the Internet <URL:http://www.selventa.com/attachments/white_papers/reverse-causal-reasoning.pdf> [retrieved on 20120817] *
BENJAMINI ET AL.: "Controlling the false discovery rate: a practical and powerful approach to multiple testing", JOURNAL OF THE ROYAL STATISTICAL SOCIETY, vol. 57, 1995, pages 289
BERKOFSKY-FESSLER ET AL.: "Preclinical biomarkers for a cyclin-dependent kinase inhibitor translate to candidate pharmacodynamic biomarkers in phase I patients", MOL CANCER THER, vol. 8, 2009, pages 2517 - 2525
DICICCIO ET AL.: "A review of bootstrap confidence intervals", JOURNAL OF THE ROYAL STATISTICAL SOCIETY, vol. 50, 1988, pages 338
EFRON: "The jackknife, the bootstrap, and other resampling plans", SIAM, 1982
GENTLEMAN R: "Bioinformatics and computational biology solutions using R and Bioconductor", 2005, SPRINGER SCIENCE+BUSINESS MEDIA
GENTLEMAN RC; CAREY VJ; BATES DM; BOLSTAD B; DETTLING M; DUDOIT S; ELLIS B; GAUTIER L; GE Y; GENTRY J ET AL.: "Bioconductor: open software development for computational biology and bioinformatics", GENOME BIOL, vol. 5, 2004, pages R80, XP021012842, DOI: doi:10.1186/gb-2004-5-10-r80
IRIZARRY ET AL.: "Exploration, normalization, and summaries of high density oligonucleotide array probe level data", BIOSTATISTICS, vol. 4, 2003, pages 249 - 264, XP002466228, DOI: doi:10.1093/biostatistics/4.2.249
JURJEN W WESTRA ET AL: "Construction of a computable cell proliferation network focused on non-diseased lung cells", BMC SYSTEMS BIOLOGY, vol. 5, no. 1, 2 July 2011 (2011-07-02), pages 105 - 105, XP055029252, ISSN: 1752-0509, DOI: 10.1186/1752-0509-5-105 *
PATRICIA GIMENEZ: "Local Influence Analysis Based on the Perturbation Manifold in functional Measurement Error Models", 29 May 2009 (2009-05-29), XP002677352, Retrieved from the Internet <URL:http://www.matematica.uns.edu.ar/XCongresoMonteiro/actas.HTM> [retrieved on 20120607] *
SATTERTHWAITE: "An approximate distribution of estimates of variance components", BIOMETRICS, vol. 2, 1946, pages 110
SMYTH: "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments", STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, vol. 3, 2004, pages 3
STRIMMER ET AL.: "A general modular framework for gene set enrichment analysis", BMC BIOINFORMATICS, vol. 10, 2009, pages 47, XP021047309, DOI: doi:10.1186/1471-2105-10-47
STRIMMER KORBINIAN: "A unified approach to false discovery rate estimation", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 9, no. 1, 9 July 2008 (2008-07-09), pages 303, XP021031888, ISSN: 1471-2105 *
STRIMMER: "A unified approach to false discovery rate estimation", BMC BIOINFORMATICS, vol. 9, 2008, pages 303, XP021031888
TAY ET AL.: "Single-cell NF-kappaB dynamics reveal digital activation and analogue information processing", NATURE, vol. 466, 2010, pages 267 - 271
WALTER K SCHLAGE ET AL: "A computable cellular stress network model for non-diseased pulmonary and cardiovascular tissue", BMC SYSTEMS BIOLOGY, vol. 5, no. 1, 19 October 2011 (2011-10-19), pages 168 - 168, XP055029253, ISSN: 1752-0509, DOI: 10.1186/1752-0509-5-168 *
WELCH: "The generalization of 'Student's' problem when several different population variances are involved", BIOMETRIKA, vol. 34, 1947, pages 28
WESTRA JW; SCHLAGE WK; FRUSHOUR BP; GEBEL S; CATLETT NL; HAN W; EDDY SF; HENGSTERMANN A; MATTHEWS AL; MATHIS C ET AL.: "Construction of a Computable Cell Proliferation Network Focused on Non-Diseased Lung Cells", BMC SYST BIOL, vol. 5, 2011, pages 105, XP021105994, DOI: doi:10.1186/1752-0509-5-105

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9558318B2 (en) 2011-06-10 2017-01-31 Philip Morris Products S.A. Systems and methods for quantifying the impact of biological perturbations
US10916350B2 (en) 2011-06-10 2021-02-09 Philip Morris Products S.A. Systems and methods for quantifying the impact of biological perturbations
JP2016508269A (ja) * 2012-12-28 2016-03-17 セルベンタ インコーポレイテッド Quantitative assessment of biological impact using mechanistic network models
EP2939164A4 (en) * 2012-12-28 2016-09-21 Selventa Inc QUANTITATIVE ASSESSMENT OF BIOLOGICAL IMPACT BY MECHANISTIC NETWORK MODELS
US10878312B2 (en) 2012-12-28 2020-12-29 Selventa, Inc. Quantitative assessment of biological impact by scoring directed tree graphs of causally inconsistent biological networks
JP2016520907A (ja) * 2013-04-23 2016-07-14 フィリップ モリス プロダクツ エス アー Systems and methods for using mechanistic network models in systems toxicology
US10842444B2 (en) 2013-09-13 2020-11-24 Philip Morris Products S.A. Systems and methods for evaluating perturbation of xenobiotic metabolism

Also Published As

Publication number Publication date
EP2718880A1 (en) 2014-04-16
CN106940758B (zh) 2019-10-11
EP2608122A1 (en) 2013-06-26
JP6335260B2 (ja) 2018-05-30
CN103827896A (zh) 2014-05-28
CN106934253B (zh) 2020-07-17
JP2014522531A (ja) 2014-09-04
CN103765448A (zh) 2014-04-30
WO2012168481A1 (en) 2012-12-13
JP2014522530A (ja) 2014-09-04
US20140114987A1 (en) 2014-04-24
CN106940758A (zh) 2017-07-11
US10916350B2 (en) 2021-02-09
JP6138767B2 (ja) 2017-05-31
US20170235914A1 (en) 2017-08-17
CN103765448B (zh) 2017-05-17
CN106934253A (zh) 2017-07-07
CN103827896B (zh) 2017-04-26
JP2017073161A (ja) 2017-04-13
EP2718879A1 (en) 2014-04-16
JP6138768B2 (ja) 2017-05-31
JP2017073160A (ja) 2017-04-13
JP2018120617A (ja) 2018-08-02
US20140172398A1 (en) 2014-06-19
US9558318B2 (en) 2017-01-31
JP6336020B2 (ja) 2018-06-06

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 12729448

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014514108

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 2012729448

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 14124826

Country of ref document: US