US20140074408A1 - Identifying biological response pathways - Google Patents


Info

Publication number
US20140074408A1
Authority
US
United States
Prior art keywords
cellular response
nodes
identifying
molecules
participating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/077,733
Inventor
Ernest Fraenkel
Shao-Shan Carol Huang
David R. Karger
Laura Riva
Esti Yeger-Lotem
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Massachusetts Institute of Technology
Original Assignee
Massachusetts Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Massachusetts Institute of Technology
Priority to US14/077,733
Assigned to MASSACHUSETTS INSTITUTE OF TECHNOLOGY. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HUANG, SHAO-SHAN CAROL, YEGER-LOTEM, ESTI, RIVA, LAURA, FRAENKEL, ERNEST, KARGER, DAVID R.
Publication of US20140074408A1
Current legal status: Abandoned

Classifications

    • G06F19/10
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • This disclosure relates to systems and methods for identification of biological response pathways and networks, including, for example, pathways for signaling events.
  • Biological signal transduction involves biochemical, biophysical, and/or biomechanical processes by which a cell converts one type of signal into another.
  • In the course of such transduction, a cell typically senses and responds to an external stimulus (such as a hormone). This often initiates a sequence of biochemical reactions associated with various types of molecules present in the cell. Examples of such molecules include receptors, second messengers, enzymes, transcription factors, DNAs, and mRNAs.
  • the task of finding the relevant signaling pathways and the network of molecular interactions responsible for a particular signaling event can be a difficult one.
  • the invention features a method for updating a computer-readable data storage medium with the aid of a particular microprocessor tied to the computer-readable data storage medium, the data storage medium containing interaction data, the interaction data being representative of interactions within a cell.
  • a method includes receiving a global measurement of activity in a cell; causing the microprocessor to retrieve, from a database stored on a computer-readable medium, the interaction data, the interaction data including data representative of a subset of interactions within the cell, each of the interactions within that subset being consistent with the global measurement; causing the microprocessor to determine an aggregate cost for each of the interactions; causing the microprocessor to determine which of the interactions from the subset of interactions has a minimum aggregate cost; causing the microprocessor to provide output representative of the minimum cost interaction; and causing the output to be stored in the computer-readable data storage medium.
  • Some practices include representing the data from the database as an interactome.
  • the interactome has nodes representing molecules, and edges connecting pairs of the nodes. Each edge represents an interaction between molecules represented by the nodes.
  • node cost representing an anticipated performance of a molecule associated with the node during a signaling event
  • edge cost representing a reliability of an interaction between molecules connected by the edge.
  • causing the microprocessor to determine which of the signaling pathways from the subset of signaling pathways has a minimum aggregate cost comprises solving a PCST problem associated with the interactome to identify the pathways.
  • the invention includes a method for operating a machine for identifying a mechanism associated with a cellular response with the aid of a digital computer.
  • cellular responses include, but are by no means limited to signaling events, metabolic events, and phenotypic responses to a stimulus or stimuli.
  • Such a method includes identifying molecules participating in the cellular response; causing the computer to access a database containing information characterizing molecular interactions; and causing the computer to determine pathways connecting the identified molecules participating in the cellular response.
  • the pathways include: a plurality of nodes, each node representing a molecule, and a plurality of edges, each edge connecting a respective pair of nodes and representing an interaction between a respective pair of molecules represented by the respective pair of nodes.
  • the plurality of nodes includes a subset of nodes that represent molecules identified as participating in the cellular response.
  • the method further includes causing the computer to solve an optimization problem that includes determining a subset of the molecules and interactions having a minimum aggregate cost.
  • causing the microprocessor to determine pathways connecting the identified molecules participating in the cellular response includes: numerically processing data representing the network of potential interactions to determine a sub-network of nodes and edges representative of a response pathway between the input and the output.
  • causing the microprocessor to solve an optimization problem includes: associating each node that participates in the cellular response with a penalty value; associating each edge with a cost value; forming an objective function based on the penalty values and the cost values; and identifying the sub-network of nodes and edges that minimizes a value of the objective function.
  • causing the microprocessor to solve an optimization problem includes: identifying one subset of the originally identified nodes as an input subset containing input nodes and a separate subset of the originally identified nodes as an output subset containing output nodes; identifying a source node representing a source of flow; identifying a destination node representing a destination of flow; associating a quantity of flow with the source of flow; associating each edge with a cost value; and forming an objective function for the optimization problem based on the cost values of the edges connecting the input and output nodes and the quantity of flow traversing these edges from the source node to the destination node.
  • Additional practices include those in which identifying molecules participating in a cellular response includes identifying one or more proteins from a group consisting of phosphorylated proteins; proteins encoded by a gene that, when deleted, causes a change in an organism's phenotype; and proteins that are present in an amount that changes during a cellular response.
  • the cellular response is a signaling event and the destination node represents a target gene of the signaling event.
  • Alternative practices include identifying the destination node according to measurements of differential gene expression associated with the signaling event.
  • pathways further include one or more intermediate nodes between the source node and the destination node
  • pathways further include one or more intermediate nodes between the nodes participating in the cellular response
  • A variety of molecules participating in the cellular response can be identified. Among these are proteins, mRNAs, DNA sequences, and protein-protein complexes.
  • each edge is associated with a value that represents a degree of interaction between respective molecules represented by the pair of nodes connected by the edge.
  • the invention includes a tangible computer-readable medium having encoded thereon software for carrying out any of the foregoing methods.
  • Another aspect of the invention includes a data processing system configured to execute any of the foregoing methods.
  • Such configuration can be achieved by programming a general purpose computer, thereby transforming that computer into a new special purpose machine that is structurally different from a computer without such programming.
  • configuration can be achieved by constructing an application specific integrated circuit for carrying out the foregoing methods.
  • FIG. 1 shows a system for carrying out the method disclosed herein
  • FIGS. 2 and 3 are flowcharts showing procedures carried out by the system shown in FIG. 1 .
  • One method for identifying signaling pathways and for measuring characteristics of biological networks includes a computational approach that couples mathematical modeling with experimental data.
  • a network of biophysical processes for example, can be modeled with a set of coupled differential equations, each equation describing the reaction kinetics of the constituents (e.g., molecules) of a process.
  • the parameters used in one differential equation may depend on the dynamic characteristics (e.g., the concentration) of other substances or processes within the cell.
  • a disadvantage to the foregoing computational method is that modeling large networks of highly-crossed interactions may require extensive knowledge about the connectivity of the network and the kinetic parameters of individual interactions. These may not always be available in many systems.
  • Another computational approach uses statistical learning methods to extract relationships between molecules and interactions based on a dataset of formerly identified signaling events/networks.
  • a disadvantage of the foregoing method is that it may not be suitable in those applications in which the dataset is small and appears in high dimensions.
  • the relationships extracted by statistical learning are probabilistic in nature, and may not reflect the important mechanistic information of molecular interactions. Further, in cases where experimental observations are influenced by hidden variables, learning these variables can be difficult.
  • One embodiment of the systems and methods described herein uses a constraint optimization framework for identification of cellular signaling networks, as described in detail in Appendix A of U.S. Provisional Application 61/114,783.
  • global measurements of a cell associated with a particular signaling event are obtained and integrated into a mathematical model of molecular networks to identify one or more sequences of interactions involved in the signaling pathway relevant to this event.
  • the signaling pathway may include a cascade of molecules from cell-surface receptors, proteins, enzymes, transcription factors, genetic sequences, and possibly other molecules.
  • Global measurements of the cell may include, for example, phosphoproteomic data from mass spectrometry and transcriptional profiling by microarray.
  • the mathematical model of molecular networks can be formed, for example, using experimentally determined protein-protein and/or protein-DNA interactions from biological databases such as BioGRID and MIPS, in conjunction with the experimental evidence for each interaction.
  • One way to model a network is to use an interactome graph having a set of nodes connected by edges. Each node represents a molecule. An edge connecting a pair of nodes represents the interaction of a pair of molecules corresponding to those nodes.
  • Each node can be weighted, for example, based on an anticipated importance of this node involved in a particular event. Additionally, each edge can also be weighted, for example, based on the reliability of the interaction represented by the edge.
  • a detailed description of the formulation of a graph is provided in Appendix B of U.S. Provisional Application 61/114,783.
  • the network is partitioned into highly coherent sub-networks that are functionally relevant to the biological processes associated with this response. Also, most of the connected proteins in each sub-network form complexes of defined functions. Further, a set of intermediate nodes that are not identified in the global measurements are revealed in the reconstructed network. These intermediate nodes are associated with genes implicated in mating defects and alteration in mating gene reporter expression. This suggests that the constraints imposed by the global measurements provide valuable information to guide the selection of important players that contribute to the response.
  • the reconstructed pheromone signaling network resembles the known pathway.
  • Other yeast MAPK pathways such as the PKC pathway and the filamentous growth pathway are also identified in the network.
  • phosphorylated proteins appear highly informative in selecting interacting transcription factors. This is useful in understanding the condition-specific combinatorial control by transcription factors.
  • Appendices A-G of U.S. Provisional Application 61/114,783 provide examples of potential features and implementations for various embodiments and portions of embodiments.
  • the techniques described herein can be implemented as software tangibly embodied in an information carrier, e.g., in a machine-readable storage device for execution by, or to control the operation of data processing apparatus, e.g., a programmable processor, a computer, or multiple computers.
  • Such software can be expressed in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • FIG. 1 shows a particular computer-readable data storage medium 12 tied to a microprocessor 14 via a data communication path 16 .
  • the data storage medium stores information 18 representative of signaling pathways.
  • An input device 20 in communication with a processing element provides a way to control the microprocessor 14
  • an output device 22 in communication with the microprocessor 14 provides tangible output for inspection, or a pathway for communicating with the data storage medium 12 to which the microprocessor 14 is tied.
  • the microprocessor 14 causes transformations to various electronic components within it, including transistors, diodes and resistors. Ultimately, the microprocessor 14 causes a physically measurable transformation of matter within the data storage medium 12 to which it is tied. This transformation is physically measurable since if it were not, there would be no way to read the data once it had been written.
  • Such software can be tied to a particular computer or to multiple particular computers at one site or distributed across multiple sites and interconnected by a communication network. Accordingly, such software can be deployed at or executed by a particular computer or on multiple particular computers at one site or distributed across multiple sites and interconnected by a communication network.
  • the system receives global measurement data representing activity within a cell (step 24 ).
  • the system then retrieves data representing the various interactions within the cell (step 28 ). Some of the stored data is consistent with the global measurement, and some of it is not. In some cases, both kinds of data are retrieved, and the two kinds of data are classified after retrieval. In other cases, only the data that is consistent with the global measurement is retrieved.
  • the system determines the aggregate costs of the individual interactions (step 30 ) and identifies, or determines, which interaction has the minimum aggregate cost (step 32 ). Finally, the system provides output representing the minimum cost interaction (step 34 ) and stores data representing that interaction in a computer-readable data storage medium (step 36 ).
  • an alternative method includes first identifying those molecules that participate in a cellular response (step 40 ). Then, data characterizing interactions between molecules is retrieved (step 42 ). This data includes nodes representing the participating molecules and edges that connect these nodes. Weights associated with the edges represent the extent of interaction between molecules connected by that edge. Once this data is retrieved, one can determine which subset of molecules and interactions have a minimum aggregate cost (step 46 ). Data identifying such molecules and interactions can then be output (step 50 ) and stored on a computer-readable medium (step 52 ).
  • Functions can be distributed over a number of different components or, for example, centralized on a single server.
  • a researcher may use a web-based interface to operate a program configured for identifying signaling pathways for cellular events.
  • Data representing the results of the operation may be presented to the researcher in a printed form or in an electronic form (e.g., displayed on a computer screen).
  • the techniques described herein can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer (e.g., interact with a user interface element, for example, by clicking a button on such a pointing device).
  • feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Data Mining & Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for operating a machine for identifying a mechanism associated with a cellular response includes identifying molecules participating in the cellular response, accessing a database containing information characterizing molecular interactions, determining pathways connecting the identified molecules participating in the response, and solving an optimization problem. The pathways comprise nodes, each representing a molecule, and edges, each connecting a pair of nodes and representing an interaction between a respective pair of molecules represented by the pair of nodes. The nodes include a subset that represents molecules identified as participating in the cellular response. Solving an optimization problem comprises determining a subset of the molecules and interactions having a minimum aggregate cost, associating each participating node with a penalty value, associating each edge with a cost value, forming an objective function based on the penalty and cost values, and identifying the sub-network of nodes and edges that minimizes the objective function.

Description

    RELATED APPLICATIONS
  • This application is a divisional of U.S. application Ser. No. 12/618,915, filed Nov. 16, 2009, which claims the benefit of the priority date of U.S. Provisional Application 61/114,783, filed on Nov. 14, 2008, the contents of which are herein incorporated by reference in their entirety.
  • STATEMENT AS TO FEDERALLY SPONSORED RESEARCH
  • This invention was made with Government support under Grant Nos. P01 CA042063 and U54 CA112967, awarded by the NIH and Grant No. CCF-0635286, awarded by the NSF. The Government has certain rights in this invention.
  • FIELD OF INVENTION
  • This disclosure relates to systems and methods for identification of biological response pathways and networks, including, for example, pathways for signaling events.
  • BACKGROUND
  • Biological signal transduction involves biochemical, biophysical, and/or biomechanical processes by which a cell converts one type of signal into another. In the course of such transduction, a cell typically senses and responds to an external stimulus (such as a hormone). This often initiates a sequence of biochemical reactions associated with various types of molecules present in the cell. Examples of such molecules include receptors, second messengers, enzymes, transcription factors, DNAs, and mRNAs.
  • The task of finding the relevant signaling pathways and the network of molecular interactions responsible for a particular signaling event can be a difficult one.
  • SUMMARY
  • In one aspect, the invention features a method for updating a computer-readable data storage medium with the aid of a particular microprocessor tied to the computer-readable data storage medium, the data storage medium containing interaction data, the interaction data being representative of interactions within a cell. Such a method includes receiving a global measurement of activity in a cell; causing the microprocessor to retrieve, from a database stored on a computer-readable medium, the interaction data, the interaction data including data representative of a subset of interactions within the cell, each of the interactions within that subset being consistent with the global measurement; causing the microprocessor to determine an aggregate cost for each of the interactions; causing the microprocessor to determine which of the interactions from the subset of interactions has a minimum aggregate cost; causing the microprocessor to provide output representative of the minimum cost interaction; and causing the output to be stored in the computer-readable data storage medium.
  • Some practices include representing the data from the database as an interactome. The interactome has nodes representing molecules, and edges connecting pairs of the nodes. Each edge represents an interaction between molecules represented by the nodes. Among these practices are those in which each node is weighted by a “node cost” representing an anticipated performance of a molecule associated with the node during a signaling event, and each edge is weighted by an “edge cost” representing a reliability of an interaction between molecules connected by the edge. Also among these practices are those in which causing the microprocessor to determine which of the signaling pathways from the subset of signaling pathways has a minimum aggregate cost comprises solving a Prize Collecting Steiner Tree (PCST) problem associated with the interactome to identify the pathways.
  • In another aspect, the invention includes a method for operating a machine for identifying a mechanism associated with a cellular response with the aid of a digital computer. These cellular responses include, but are by no means limited to, signaling events, metabolic events, and phenotypic responses to a stimulus or stimuli.
  • Such a method includes identifying molecules participating in the cellular response; causing the computer to access a database containing information characterizing molecular interactions; and causing the computer to determine pathways connecting the identified molecules participating in the cellular response. The pathways include: a plurality of nodes, each node representing a molecule, and a plurality of edges, each edge connecting a respective pair of nodes and representing an interaction between a respective pair of molecules represented by the respective pair of nodes. The plurality of nodes includes a subset of nodes that represent molecules identified as participating in the cellular response. The method further includes causing the computer to solve an optimization problem that includes determining a subset of the molecules and interactions having a minimum aggregate cost.
  • In at least one practice, causing the microprocessor to determine pathways connecting the identified molecules participating in the cellular response includes: numerically processing data representing the network of potential interactions to determine a sub-network of nodes and edges representative of a response pathway between the input and the output.
  • In another practice, causing the microprocessor to solve an optimization problem includes: associating each node that participates in the cellular response with a penalty value; associating each edge with a cost value; forming an objective function based on the penalty values and the cost values; and identifying the sub-network of nodes and edges that minimizes a value of the objective function.
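One common way to write such an objective, given here as an assumption rather than as the formulation of the appendices, is the prize-collecting Steiner tree form. Let $G = (V, E)$ be the network of potential interactions, $c_e \ge 0$ the cost value of edge $e$, and $p_v \ge 0$ the penalty value of node $v$ (taken to be zero for nodes not identified as participating in the response). The symbols $G$, $F$, $c_e$, and $p_v$ are notation introduced for this sketch. The optimization then selects a connected sub-network $F = (V_F, E_F)$ of $G$:

```latex
\min_{\substack{F = (V_F, E_F) \subseteq G \\ F \text{ connected}}}
\quad \sum_{e \in E_F} c_e \;+\; \sum_{v \in V \setminus V_F} p_v
```

The first sum charges for every interaction that is included, and the second charges a penalty for every participating molecule that is left out, so a node is connected only when its penalty outweighs the cost of the edges needed to reach it.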
  • In still another practice, causing the microprocessor to solve an optimization problem includes: identifying one subset of the originally identified nodes as an input subset containing input nodes and a separate subset of the originally identified nodes as an output subset containing output nodes; identifying a source node representing a source of flow; identifying a destination node representing a destination of flow; associating a quantity of flow with the source of flow; associating each edge with a cost value; and forming an objective function for the optimization problem based on the cost values of the edges connecting the input and output nodes and the quantity of flow traversing these edges from the source node to the destination node.
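A minimal sketch of one possible reading of this flow formulation follows: a fixed quantity of flow is routed from a source node S, through the input nodes and the interaction network, to the output nodes and on to a destination node T, and the objective is the sum over edges of cost times flow. The arc capacities, the fixed demand, the node names, and the cost values are all assumptions made for illustration, and the solver is a textbook successive-shortest-path routine, not necessarily the technique used in the appendices.

```python
INF = float("inf")

def min_cost_flow(edges, source, sink, demand):
    """Route `demand` units from source to sink at minimum total cost.

    edges: list of (u, v, capacity, cost) directed arcs with integer capacity.
    Returns (total_cost, {(u, v): flow}). Uses successive shortest paths,
    with Bellman-Ford because residual arcs carry negative costs.
    """
    to, cap, cost, adj = [], [], [], {}
    for u, v, capacity, weight in edges:
        # Arc 2k is the forward arc; arc 2k+1 is its residual twin.
        for a, b, c, w in ((u, v, capacity, weight), (v, u, 0, -weight)):
            adj.setdefault(a, []).append(len(to))
            adj.setdefault(b, [])
            to.append(b)
            cap.append(c)
            cost.append(w)
    adj.setdefault(source, [])
    adj.setdefault(sink, [])

    sent, total = 0, 0.0
    while sent < demand:
        # Cheapest path from the source over arcs with spare capacity.
        dist = {v: INF for v in adj}
        dist[source] = 0.0
        via = {}
        for _ in range(len(adj) - 1):
            changed = False
            for u in adj:
                if dist[u] == INF:
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        via[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == INF:
            raise ValueError("the requested flow cannot be routed")

        # Find the bottleneck along the path, then push flow along it.
        push, v = demand - sent, sink
        while v != source:
            push = min(push, cap[via[v]])
            v = to[via[v] ^ 1]  # tail of the arc that reached v
        v = sink
        while v != source:
            e = via[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        sent += push
        total += push * dist[sink]

    # Flow on each forward arc equals the capacity gained by its twin.
    flow = {(u, v): cap[2 * k + 1] for k, (u, v, _, _) in enumerate(edges)}
    return total, flow

# Hypothetical network: S feeds two input nodes, one output node drains to T.
arcs = [
    ("S", "in1", 1, 0.0), ("S", "in2", 1, 0.0),
    ("in1", "mid", 1, 1.0), ("in2", "mid", 1, 2.0),
    ("mid", "out1", 2, 1.0), ("in2", "out1", 1, 5.0),
    ("out1", "T", 2, 0.0),
]
total_cost, flow = min_cost_flow(arcs, "S", "T", demand=2)
```

In this toy network both units of flow pass through the intermediate node `mid`, because the direct arc from `in2` to `out1` is more expensive; the arcs that carry flow form the selected sub-network.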
  • Additional practices include those in which identifying molecules participating in a cellular response includes identifying one or more proteins from a group consisting of phosphorylated proteins; proteins encoded by a gene that, when deleted, causes a change in an organism's phenotype; and proteins that are present in an amount that changes during a cellular response.
  • In some practices, the cellular response is a signaling event and the destination node represents a target gene of the signaling event.
  • Alternative practices include identifying the destination node according to measurements of differential gene expression associated with the signaling event.
  • Also among the alternative practices are those in which the pathways further include one or more intermediate nodes between the source node and the destination node, and those in which the pathways further include one or more intermediate nodes between the nodes participating in the cellular response.
  • A variety of molecules participating in the cellular response can be identified. Among these are proteins, mRNAs, DNA sequences, and protein-protein complexes.
  • In some embodiments, each edge is associated with a value that represents a degree of interaction between respective molecules represented by the pair of nodes connected by the edge.
  • In yet another aspect, the invention includes a tangible computer-readable medium having encoded thereon software for carrying out any of the foregoing methods.
  • Another aspect of the invention includes a data processing system configured to execute any of the foregoing methods. Such configuration can be achieved by programming a general purpose computer, thereby transforming that computer into a new special purpose machine that is structurally different from a computer without such programming. Or, configuration can be achieved by constructing an application specific integrated circuit for carrying out the foregoing methods.
  • Other features and advantages of the invention are apparent from the following description, from the claims, and from the attached figures in which:
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 shows a system for carrying out the method disclosed herein; and
  • FIGS. 2 and 3 are flowcharts showing procedures carried out by the system shown in FIG. 1.
  • DETAILED DESCRIPTION
  • One method for identifying signaling pathways and for measuring characteristics of biological networks includes a computational approach that couples mathematical modeling with experimental data.
  • One computational approach applies differential equations to model detailed biophysical processes for making quantitative predictions. A network of biophysical processes, for example, can be modeled with a set of coupled differential equations, each equation describing the reaction kinetics of the constituents (e.g., molecules) of a process. The parameters used in one differential equation may depend on the dynamic characteristics (e.g., the concentration) of other substances or processes within the cell.
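As a toy illustration of this kind of model, the sketch below integrates two coupled rate equations for a reversible conversion between species A and B, dA/dt = -kf*A + kr*B and dB/dt = kf*A - kr*B, using forward Euler steps. The reaction, the rate constants, and the function name are assumptions chosen for illustration and are not taken from the disclosure.

```python
def simulate_reversible_conversion(a0, b0, k_forward, k_reverse, dt, steps):
    """Forward-Euler integration of A <-> B with mass-action kinetics.

    Returns a list of (time, A, B) tuples, starting with the initial state.
    """
    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for i in range(1, steps + 1):
        # Net conversion rate from A to B; each equation depends on both species.
        flux = k_forward * a - k_reverse * b
        a -= flux * dt
        b += flux * dt
        trajectory.append((i * dt, a, b))
    return trajectory

# Hypothetical rate constants; concentrations are in arbitrary units.
trajectory = simulate_reversible_conversion(
    a0=1.0, b0=0.0, k_forward=2.0, k_reverse=1.0, dt=0.001, steps=10000
)
t_end, a_end, b_end = trajectory[-1]
```

Even this two-species model needs two kinetic parameters and an initial state, which hints at why parameter requirements grow quickly for large, densely connected networks.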
  • A disadvantage of the foregoing computational method is that modeling large networks of highly-crossed interactions may require extensive knowledge about the connectivity of the network and the kinetic parameters of individual interactions. For many systems, this information is not available.
  • Another computational approach uses statistical learning methods to extract relationships between molecules and interactions based on a dataset of formerly identified signaling events/networks.
  • A disadvantage of the foregoing method is that it may not be suitable in those applications in which the dataset is small and appears in high dimensions. In addition, the relationships extracted by statistical learning are probabilistic in nature, and may not reflect the important mechanistic information of molecular interactions. Further, in cases where experimental observations are influenced by hidden variables, learning these variables can be difficult.
  • One embodiment of the systems and methods described herein uses a constraint optimization framework for identification of cellular signaling networks, as described in detail in Appendix A of U.S. Provisional Application 61/114,783.
  • As described in Appendix A, global measurements of a cell associated with a particular signaling event (e.g., the mating response of baker's yeast Saccharomyces cerevisiae to pheromone) are obtained and integrated into a mathematical model of molecular networks to identify one or more sequences of interactions involved in the signaling pathway relevant to this event. Here, the signaling pathway may include a cascade of molecules from cell-surface receptors, proteins, enzymes, transcription factors, genetic sequences, and possibly other molecules. Global measurements of the cell may include, for example, phosphoproteomic data from mass spectrometry and transcriptional profiling by microarray.
  • The mathematical model of molecular networks can be formed, for example, using experimentally determined protein-protein and/or protein-DNA interactions from biological databases such as BioGRID and MIPS, in conjunction with the experimental evidence for each interaction.
  • One way to model a network is to use an interactome graph having a set of nodes connected by edges. Each node represents a molecule. An edge connecting a pair of nodes represents the interaction of a pair of molecules corresponding to those nodes.
  • Each node can be weighted, for example, based on the anticipated importance of the node in a particular event. Additionally, each edge can also be weighted, for example, based on the reliability of the interaction represented by the edge. A detailed description of the formulation of a graph is provided in Appendix B of U.S. Provisional Application 61/114,783.
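  • A minimal sketch of such a weighted graph follows. The class name, the method names, and the convention of turning a reliability in (0, 1] into a cost of -log(reliability) are assumptions made for illustration; the formulation actually used is the one set out in Appendix B.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Interactome:
    """Undirected graph: nodes are molecules, edges are interactions."""
    node_weight: dict = field(default_factory=dict)  # molecule -> importance
    edge_weight: dict = field(default_factory=dict)  # frozenset({u, v}) -> reliability

    def add_molecule(self, name, importance=0.0):
        self.node_weight[name] = importance

    def add_interaction(self, u, v, reliability):
        if u == v:
            raise ValueError("self-interactions are not modeled in this sketch")
        if not 0.0 < reliability <= 1.0:
            raise ValueError("reliability must be in (0, 1]")
        # Molecules seen only through an interaction get zero importance.
        for molecule in (u, v):
            self.node_weight.setdefault(molecule, 0.0)
        self.edge_weight[frozenset((u, v))] = reliability

    def neighbors(self, name):
        return sorted(next(iter(e - {name})) for e in self.edge_weight if name in e)

    def edge_cost(self, u, v):
        # Assumed convention: less reliable interactions cost more.
        return -math.log(self.edge_weight[frozenset((u, v))])


if __name__ == "__main__":
    g = Interactome()
    g.add_molecule("receptor", importance=2.0)    # hypothetical names and weights
    g.add_interaction("receptor", "kinase", reliability=0.9)
    g.add_interaction("kinase", "tf", reliability=0.5)
    print(g.neighbors("kinase"), round(g.edge_cost("kinase", "tf"), 3))
```

  • Storing each edge under an unordered pair keeps the interaction symmetric, so a lookup gives the same result regardless of the order in which the two molecules are named.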
  • One way to identify a sequence or sequences of interactions relevant to the signaling event uses a constrained optimization approach described in detail in Appendix C of U.S. Provisional Application 61/114,783. Briefly, using a Prize Collecting Steiner Tree (PCST) model, the global measurements of the cell associated with the signaling event are imposed as constraints of the optimization process, and the solution of this process reveals the set of interactions that best satisfy the constraints. One optimization technique suitable for use here is described by Ljubic, et al., in An Algorithmic Framework for the Exact Solution of the Prize-Collecting Steiner Tree Problem, published in Mathematical Programming, Volume 105, Numbers 2-3, February 2006, the contents of which are incorporated herein by reference.
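  • The PCST objective can be illustrated on a toy graph. The sketch below assumes the common formulation in which the objective is the sum of the costs of the selected edges plus the sum of the prizes of the nodes left out, and it solves the problem by exhaustive enumeration. This brute-force search is a stand-in that is practical only for a handful of nodes; it is not the exact technique of Ljubic, et al. The function names, node labels, prizes, and costs are hypothetical.

```python
from itertools import combinations


def induced_mst(nodes, edge_cost):
    """Kruskal's algorithm on the subgraph induced by `nodes`.

    Returns (total_cost, edges), or None if the induced subgraph is disconnected.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, chosen = 0.0, []
    candidates = sorted((c, u, v) for (u, v), c in edge_cost.items()
                        if u in parent and v in parent)
    for c, u, v in candidates:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += c
            chosen.append((u, v))
    return (total, chosen) if len(chosen) == len(nodes) - 1 else None


def solve_pcst_brute_force(prize, edge_cost):
    """Minimize (cost of tree edges) + (prizes of excluded nodes).

    `prize` must list every node; unmeasured nodes carry a prize of 0.0.
    Returns (objective, node_set, edge_list).
    """
    nodes = sorted(prize)
    best = None
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            tree = induced_mst(subset, edge_cost)
            if tree is None:
                continue
            cost, edges = tree
            penalty = sum(prize[n] for n in nodes if n not in subset)
            if best is None or cost + penalty < best[0]:
                best = (cost + penalty, set(subset), edges)
    return best


if __name__ == "__main__":
    prize = {"R": 5.0, "X": 0.0, "Y": 0.0, "T": 4.0, "Z": 1.0}
    edge_cost = {("R", "X"): 1.0, ("X", "T"): 1.0, ("R", "Y"): 3.0,
                 ("Y", "T"): 3.0, ("T", "Z"): 2.5}
    print(solve_pcst_brute_force(prize, edge_cost))
```

  • In this toy instance the unmeasured node X is retained because it offers the cheapest connection between R and T, while Z is dropped because the cost of reaching it exceeds its prize.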
  • One example of using the above described techniques to identify the signaling pathway of the yeast pheromone response is illustrated in detail in Appendix B of U.S. Provisional Application 61/114,783. The reconstructed network of interactions relevant to this yeast pheromone response provides many features and advantages, some of which are described in detail below.
  • At the global level, the network is partitioned into highly coherent sub-networks that are functionally relevant to the biological processes associated with this response. Also, most of the connected proteins in each sub-network form complexes of defined functions. Further, a set of intermediate nodes that are not identified in the global measurements are revealed in the reconstructed network. These intermediate nodes are associated with genes implicated in mating defects and alterations in mating gene reporter expression. This suggests that the constraints imposed by the global measurements provide valuable information to guide the selection of important players that contribute to the response.
  • At the local level, the reconstructed pheromone signaling network resembles the known pathway. Other yeast MAPK pathways such as the PKC pathway and the filamentous growth pathway are also identified in the network.
  • At the transcription level, phosphorylated proteins appear highly informative in selecting interacting transcription factors. This is useful in understanding the condition-specific combinatorial control by transcription factors.
  • Appendices A-G of U.S. Provisional Application 61/114,783 provide examples of potential features and implementations for various embodiments and portions of embodiments.
  • The techniques described herein can be implemented as software tangibly embodied in an information carrier, e.g., in a machine-readable storage device for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. Such software can be expressed in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • FIG. 1 shows a particular computer-readable data storage medium 12 tied to a microprocessor 14 via a data communication path 16. The data storage medium stores information 18 representative of signaling pathways.
  • An input device 20 in communication with a processing element provides a way to control the microprocessor 14, and an output device 22 in communication with the microprocessor 14 provides tangible output for inspection, or a pathway for communicating with the data storage medium 12 to which the microprocessor 14 is tied.
  • In operation, the microprocessor 14 causes transformations to various electronic components within it, including transistors, diodes and resistors. Ultimately, the microprocessor 14 causes a physically measurable transformation of matter within the data storage medium 12 to which it is tied. This transformation is physically measurable since if it were not, there would be no way to read the data once it had been written.
  • Such software can be tied to a particular computer or to multiple particular computers at one site or distributed across multiple sites and interconnected by a communication network. Accordingly, such software can be deployed at or executed by a particular computer or on multiple particular computers at one site or distributed across multiple sites and interconnected by a communication network.
  • Referring to FIG. 2, the system receives global measurement data representing activity within a cell (step 24). The system then retrieves data representing the various interactions within the cell (step 28). Some of the stored data is consistent with the global measurement, and some of it is not. In some cases, both kinds of data are retrieved, and the two kinds of data are classified after retrieval. In other cases, only the data that is consistent with the global measurement is retrieved.
  • The system determines the aggregate costs of the individual interactions (step 30) and identifies, or determines, which interaction has the minimum aggregate cost (step 32). Finally, the system provides output representing the minimum cost interaction (step 34) and stores data representing that interaction in a computer-readable data storage medium (step 36).
  • Referring now to FIG. 3, an alternative method includes first identifying those molecules that participate in a cellular response (step 40). Then, data characterizing interactions between molecules is retrieved (step 42). This data includes nodes representing the participating molecules and edges that connect these nodes. Weights associated with the edges represent the extent of interaction between molecules connected by that edge. Once this data is retrieved, one can determine which subset of molecules and interactions has a minimum aggregate cost (step 46). Data identifying such molecules and interactions can then be output (step 50) and stored on a computer-readable medium (step 52).
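  • The flow of FIG. 3 can be sketched for a deliberately restricted special case: exactly two participating molecules, both of which must be retained, with every edge already carrying a cost. Under those assumptions the minimum-aggregate-cost subset is the cheapest path between the two molecules, which Dijkstra's algorithm finds. The function names, the JSON record layout, and the molecule labels are hypothetical, and the general case calls for the optimization described above rather than a shortest-path search.

```python
import heapq
import json
import os
import tempfile


def cheapest_connection(edges, source, target):
    """Dijkstra over an undirected list of (u, v, cost) edges.

    Returns (total_cost, [node, ...]) or None if target is unreachable.
    """
    adj = {}
    for u, v, cost in edges:
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    best, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in adj.get(node, []):
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt], prev[nxt] = cand, node
                heapq.heappush(heap, (cand, nxt))
    return None


def run_pipeline(participants, edges, out_path):
    source, target = participants                        # step 40 (given)
    result = cheapest_connection(edges, source, target)  # steps 42 and 46
    if result is None:
        raise ValueError("participating molecules are not connected")
    cost, path = result
    record = {"molecules": path,
              "interactions": [list(pair) for pair in zip(path, path[1:])],
              "aggregate_cost": cost}
    print(json.dumps(record))                            # step 50: output
    with open(out_path, "w", encoding="utf-8") as fh:    # step 52: store
        json.dump(record, fh)
    return record


if __name__ == "__main__":
    edges = [("M1", "M2", 1.0), ("M2", "M3", 1.0), ("M1", "M3", 5.0)]
    out = os.path.join(tempfile.mkdtemp(), "pathway.json")
    run_pipeline(("M1", "M3"), edges, out)
```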
  • Functions can be distributed over a number of different components or, alternatively, centralized on a single server. For example, a researcher may use a web-based interface to operate a program configured for identifying signaling pathways for cellular events. Data representing the results of the operation may be presented to the researcher in a printed form or in an electronic form (e.g., displayed on a computer screen).
  • To provide for interaction with a user, the techniques described herein can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer (e.g., interact with a user interface element, for example, by clicking a button on such a pointing device). Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • While the methods disclosed herein can be implemented on a general purpose digital computer, it is also possible to implement the methods on an application specific integrated circuit. In addition, it is possible to
  • It is to be understood that the enclosed appendices and the foregoing description are intended to illustrate and not to limit the scope of the invention.

Claims (14)

Having described the invention, and a preferred embodiment thereof, what is claimed as new and secured by Letters Patent is:
1. A method for operating a machine for identifying a mechanism associated with a cellular response with the aid of a digital computer, said method comprising identifying molecules participating in said cellular response, causing said computer to access a database containing information characterizing molecular interactions, causing said computer to determine pathways connecting said identified molecules participating in said cellular response, and causing said computer to solve an optimization problem, wherein said pathways comprise a plurality of nodes, each node representing a molecule, and a plurality of edges, each edge connecting a respective pair of nodes and representing an interaction between a respective pair of molecules represented by the respective pair of nodes, wherein said plurality of nodes includes a subset of nodes that represent molecules identified as participating in the cellular response, and wherein causing said computer to solve an optimization problem comprises determining a subset of the molecules and interactions having a minimum aggregate cost, associating each node that participates in the cellular response with a penalty value, associating each edge with a cost value, forming an objective function based on the penalty values and the cost values, and identifying the sub-network of nodes and edges that minimizes a value of the objective function.
2. The method of claim 1, wherein said pathways further include one or more intermediate nodes between said nodes participating in said cellular response.
3. The method of claim 1, wherein causing said computer to determine pathways connecting said identified molecules participating in said cellular response includes numerically processing data representing said network of potential interactions to determine a sub-network of nodes and edges representative of a response pathway between said input and said output.
4. The method of claim 1, wherein identifying molecules participating in a cellular response comprises identifying phosphorylated proteins.
5. The method of claim 1, wherein identifying molecules participating in a cellular response comprises identifying proteins encoded by a gene that, when deleted, causes a change in an organism's phenotype.
6. The method of claim 1, wherein identifying molecules participating in a cellular response comprises identifying proteins that are present in an amount that changes during a cellular response.
7. The method of claim 1, wherein identifying molecules participating in said cellular response comprises identifying proteins.
8. The method of claim 1, wherein identifying molecules participating in said cellular response comprises identifying mRNAs.
9. The method of claim 1, wherein identifying molecules participating in said cellular response comprises identifying DNA sequences.
10. The method of claim 1, wherein identifying molecules participating in said cellular response comprises identifying protein-protein complexes.
11. The method of claim 1, wherein each edge is associated with a value that represents a degree of interaction between respective molecules represented by said pair of nodes connected by said edge.
12. The method of claim 1, wherein said cellular response is a signaling event.
13. The method of claim 1, wherein said cellular response is a metabolic event.
14. The method of claim 1, wherein said cellular response is a phenotypic response to a stimulus.
US14/077,733 2008-11-14 2013-11-12 Identifying biological response pathways Abandoned US20140074408A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/077,733 US20140074408A1 (en) 2008-11-14 2013-11-12 Identifying biological response pathways

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11478308P 2008-11-14 2008-11-14
US12/618,915 US8612160B2 (en) 2008-11-14 2009-11-16 Identifying biological response pathways
US14/077,733 US20140074408A1 (en) 2008-11-14 2013-11-12 Identifying biological response pathways

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/618,915 Division US8612160B2 (en) 2008-11-14 2009-11-16 Identifying biological response pathways

Publications (1)

Publication Number Publication Date
US20140074408A1 true US20140074408A1 (en) 2014-03-13

Family

ID=42785290

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/618,915 Active 2031-07-29 US8612160B2 (en) 2008-11-14 2009-11-16 Identifying biological response pathways
US14/077,733 Abandoned US20140074408A1 (en) 2008-11-14 2013-11-12 Identifying biological response pathways

Country Status (1)

Country Link
US (2) US8612160B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10446259B2 (en) 2015-08-10 2019-10-15 Massachusetts Institute Of Technology Systems, apparatus, and methods for analyzing and predicting cellular pathways
US11163888B2 (en) 2019-02-15 2021-11-02 Oracle International Corporation Detecting second-order security vulnerabilities via modelling information flow through persistent storage

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030124548A1 (en) * 2001-03-13 2003-07-03 Christos Hatzis Method for association of genomic and proteomic pathways associated with physiological or pathophysiological processes
US20040088116A1 (en) * 2002-11-04 2004-05-06 Gene Network Sciences, Inc. Methods and systems for creating and using comprehensive and data-driven simulations of biological systems for pharmacological and industrial applications
US20040204925A1 (en) * 2002-01-22 2004-10-14 Uri Alon Method for analyzing data to identify network motifs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Scott et al. (Molecular and Cellular Proteomics 4.5 (2005) pages 683-692) *

Also Published As

Publication number Publication date
US20100250143A1 (en) 2010-09-30
US8612160B2 (en) 2013-12-17

Legal Events

Date Code Title Description
AS Assignment

Owner name: MASSACHUSETTS INSTITUTE OF TECHNOLOGY, MASSACHUSET

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FRAENKEL, ERNEST;HUANG, SHAO-SHAN CAROL;KARGER, DAVID R.;AND OTHERS;SIGNING DATES FROM 20090622 TO 20100202;REEL/FRAME:031584/0883

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION