WO2023231202A9 - Method and apparatus for constructing a digital cell model, medium, device, and system - Google Patents

Publication number
WO2023231202A9
WO2023231202A9 PCT/CN2022/115811 CN2022115811W WO2023231202A9 WO 2023231202 A9 WO2023231202 A9 WO 2023231202A9 CN 2022115811 W CN2022115811 W CN 2022115811W WO 2023231202 A9 WO2023231202 A9 WO 2023231202A9
Authority
WO
WIPO (PCT)
Prior art keywords
biochemical
cell model
digital cell
information
initial
Prior art date
Application number
PCT/CN2022/115811
Other languages
English (en)
French (fr)
Other versions
WO2023231202A1 (zh)
Inventor
姜树嘉
李国亮
李林峰
胡健
闫峻
Original Assignee
医渡云(北京)技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210616258.9A external-priority patent/CN117198381A/zh
Priority claimed from CN202210613650.8A external-priority patent/CN117198380A/zh
Application filed by 医渡云(北京)技术有限公司 filed Critical 医渡云(北京)技术有限公司
Publication of WO2023231202A1 publication Critical patent/WO2023231202A1/zh
Publication of WO2023231202A9 publication Critical patent/WO2023231202A9/zh

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Definitions

  • Embodiments of the present disclosure relate to the technical field of digital cell models, and specifically, to a method for constructing a digital cell model and its device, storage media, electronic equipment, and digital cell systems.
  • the purpose of this disclosure is to overcome the above-mentioned shortcomings of the prior art and provide a method for constructing a digital cell model and its device, storage medium, electronic equipment, and digital cell system.
  • a method for constructing a digital cell model including:
  • the digital cell model includes a biochemical component pool and multiple signaling pathway units;
  • the biochemical component pool includes multiple pieces of biochemical component information, and the biochemical component information includes concentration and/or location information of biochemical components;
  • the signal pathway unit is used to simulate the signal pathway of biological cells;
  • the signal pathway unit includes at least one biochemical reaction module, and the biochemical reaction module is used to simulate, using a set of biochemical process equations, the biochemical processes occurring in the signaling pathway unit;
  • a target digital cell model is determined based on the current initial digital cell model.
  • a device for constructing a digital cell model including:
  • a building module configured to construct an initial digital cell model based on biochemical information
  • the digital cell model includes a biochemical component pool and a plurality of signaling pathway units
  • the biochemical component pool includes a plurality of biochemical component information
  • Biochemical component information includes concentration and/or location information of biochemical components
  • the signal pathway unit is used to simulate the signal pathway of biological cells
  • the signal pathway unit includes at least one biochemical reaction module, and the biochemical reaction module is used to simulate, using a set of biochemical process equations, the biochemical process occurring in the signaling pathway unit;
  • a simulation module configured to iteratively simulate the initial digital cell model to simulate the biochemical processes occurring in biological cells
  • the judgment module is configured to determine a target digital cell model based on the current initial digital cell model after the digital cell model reaches a steady state.
  • a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the above-mentioned method for constructing a digital cell model is implemented.
  • an electronic device including:
  • the processor is configured to execute the above-mentioned digital cell model construction method by executing the executable instructions.
  • a digital cell system including the above-mentioned digital cell model construction device, and further comprising:
  • Digital Cell Engine used to run digital cell models
  • Data analysis engine used to construct a multi-omics database of mutant cells based on the multi-omics data of mutant cells
  • the drug efficacy analysis engine is used to predict drug efficacy based on the running results of the digital cell engine.
  • Figure 1 schematically shows a flow chart of a method for constructing a digital cell model according to an exemplary embodiment of the present disclosure.
  • Figure 2 schematically illustrates an operational logic architecture diagram of a digital cell model according to an example embodiment of the present disclosure.
  • FIG. 3 schematically shows a flow chart of a method for constructing a digital cell model according to an exemplary embodiment of the present disclosure.
  • Figure 4 schematically shows a flow chart of a method for constructing a cell phenotype index according to an exemplary embodiment of the present disclosure.
  • Figure 5 schematically illustrates a biochemical process definition information according to an example embodiment of the present disclosure.
  • FIG. 6 schematically shows a schematic diagram of obtaining biochemical component information according to a machine learning algorithm according to an example embodiment of the present disclosure.
  • Figure 7 schematically illustrates a schematic diagram of obtaining a mathematical model of a biochemical process according to a machine learning algorithm according to an example embodiment of the present disclosure.
  • FIG. 8 schematically shows a structural diagram of a device for constructing a digital cell model according to an exemplary embodiment of the present disclosure.
  • Figure 9 schematically shows a structural diagram of a digital cell system according to an example embodiment of the present disclosure.
  • FIG. 10 schematically shows a structural diagram of an electronic device according to an example embodiment of the present disclosure.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments.
  • the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
  • numerous specific details are provided to provide a thorough understanding of embodiments of the disclosure.
  • those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details described, or other methods, components, devices, steps, etc. may be adopted.
  • well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the disclosure.
  • the embodiment of the present disclosure provides a method for constructing a digital cell model. See Figure 1.
  • the method for constructing a digital cell model may at least include the following steps:
  • Step S110 construct an initial digital cell model based on biochemical information;
  • the digital cell model includes a biochemical component pool and multiple signal pathway units;
  • the biochemical component pool includes multiple pieces of biochemical component information, and the biochemical component information includes concentration and/or location information of biochemical components;
  • the signal pathway unit is used to simulate the signal pathway of biological cells;
  • the signal pathway unit includes at least one biochemical reaction module, and the biochemical reaction module is used to simulate, using a set of biochemical process equations, the biochemical processes occurring in the signaling pathway unit;
  • Step S120 perform iterative simulation on the initial digital cell model to simulate the biochemical processes occurring in biological cells
  • Step S130 After the digital cell model reaches a steady state, determine a target digital cell model based on the current initial digital cell model.
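Steps S110 to S130 can be sketched as a simple loop. The following is a minimal illustration only, not the patented implementation: the names `construct_target_model`, `simulate_once`, and `is_steady`, the dictionary representation of the pool, and the iteration budget are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional

Pool = Dict[str, float]  # hypothetical: component ID -> concentration


def construct_target_model(
    initial_pool: Pool,
    simulate_once: Callable[[Pool], None],
    is_steady: Callable[[List[Pool]], bool],
    max_iterations: int = 10_000,
) -> Optional[Pool]:
    """Iterate the model (S120) and return its state once steady (S130)."""
    pool = dict(initial_pool)            # state of the initial model from S110
    history: List[Pool] = [dict(pool)]   # one snapshot per simulation
    for _ in range(max_iterations):
        simulate_once(pool)              # one simulation updates the pool in place
        history.append(dict(pool))
        if is_steady(history):
            return pool                  # basis for the target digital cell model
    return None                          # no steady state within the budget
```

Returning `None` when the budget is exhausted is one possible way to signal that the initial model should be updated and simulated again.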
  • biochemical component information refers to the information of each biochemical component participating in the biochemical process. These biochemical components can serve as reaction substrates, reaction products, catalysts, or carriers, or participate in the biochemical process in other roles. From a chemical perspective, the types of biochemical components can include, but are not limited to, monomeric proteins, multimeric proteins, variant proteins, glycoproteins, polypeptides, amino acids, DNA, RNA, nucleotides, polysaccharides and other types of components that participate in biochemical processes. In embodiments of the present disclosure, the biochemical component information at least includes the concentration of the biochemical component. In some embodiments, the biochemical component information may also include location information of the biochemical component.
  • different biochemical components can be assigned different IDs or names, such as different numbers, and the IDs or names assigned to the biochemical components are used to distinguish different biochemical components.
  • the ID or name of the biochemical component can also be added to the biochemical component information.
  • the same biochemical component can play different roles in different biochemical processes.
  • for example, in an enzymatic reaction process (an exemplary biochemical reaction process), a protein can serve as the enzyme; in a protein synthesis process (another exemplary biochemical reaction process), that same protein may be synthesized and thus serve as a reaction product.
  • the same substance (such as a substance with a given chemical structure) can be divided into two different biochemical components because of its location or certain other differences (such as differences in charge characteristics, differences in dissociation levels, differences in spatial characteristics, etc.); for example, the two components are assigned different numbers.
  • at least two biochemical components are the same chemical substance and are located at different locations within the cell.
  • for example, in a protein synthesis process, the synthesized protein can be used as a biochemical component; in a protein targeted delivery process, the synthesized protein can be transported to a specific location to perform its function. In this example, the newly synthesized protein (that is, the protein before targeted delivery) can be regarded as one biochemical component, and the protein delivered to the specific location can be regarded as another biochemical component.
  • the concentration of each biochemical component is recorded in the biochemical component pool; the concentration of the biochemical component can be a mass concentration (such as g/mL, μg/mL, etc.), a molar concentration (such as μmol/L, mol/L, etc.), or can be characterized in other units.
  • the concentration of the biochemical component may be expressed in μmol/L.
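One way to picture the biochemical component pool is as a table keyed by component ID. The sketch below is illustrative only: the class names `ComponentInfo` and `ComponentPool`, the example IDs, and the choice to clamp concentrations at zero are assumptions, not details given in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ComponentInfo:
    component_id: str               # ID or name that distinguishes components
    concentration: float            # assumed unit: micromol/L
    location: Optional[str] = None  # optional location information


class ComponentPool:
    """Hypothetical container for the information of all biochemical components."""

    def __init__(self) -> None:
        self._items: Dict[str, ComponentInfo] = {}

    def add(self, info: ComponentInfo) -> None:
        if info.concentration < 0:
            raise ValueError("concentration must be non-negative")
        self._items[info.component_id] = info

    def concentration(self, component_id: str) -> float:
        return self._items[component_id].concentration

    def apply_change(self, component_id: str, delta: float) -> None:
        # Assumption: concentrations are clamped so they never go below zero.
        info = self._items[component_id]
        info.concentration = max(0.0, info.concentration + delta)
```

Under this representation, the same chemical substance at two locations is stored as two entries with different IDs, so a transport process lowers one entry and raises the other.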
  • a biochemical process refers to a change process of a biochemical component. It may be a change of the component (chemical substance) itself (for example, changing from one chemical substance to another), a change in the spatial distribution of components (such as the transport of chemical substances within cells), or another process that leads to changes in the type, distribution, function, etc. of chemical substances.
  • the biochemical process at least includes a biochemical reaction process and a biochemical transport process.
  • Biochemical processes result in changes in the concentrations of at least some of the biochemical components involved.
  • in a biochemical reaction process, biochemical components serving as reaction substrates can be consumed, and biochemical components serving as reaction products can be generated; when only the biochemical reaction process is considered, the biochemical process causes the concentration of the reaction substrate to decrease and the concentration of the reaction products to increase.
  • the biochemical components before transport can be consumed during the biochemical transport process, and the biochemical components after transport can be generated during the biochemical transport process; when only considering the biochemical transport process, This biochemical process will cause the concentration of biochemical components before transport to decrease and the concentration of biochemical components after transport to increase.
  • a biochemical reaction module for simulating the biochemical process can be constructed according to the biochemical process, and the biochemical reaction module can include a set of biochemical process equations.
  • the biochemical reaction module is used to simulate the biochemical process occurring in the signaling pathway unit using a set of biochemical process equations.
  • a set of biochemical process equations can be used to simulate changes in the concentration of a biochemical component after a time step caused by a biochemical process.
  • the same biochemical process equation set can include multiple concentration change functions. Each concentration change function describes the concentration change of one biochemical component in the biochemical process; that is, the concentration change of each biochemical component in the biochemical process is represented by one concentration change function.
  • the concentration change function of each biochemical component in the biochemical process equation set and the relationship between the concentration change functions can reflect the role of each biochemical component in the biochemical process (such as reaction substrate, reaction product, enzyme etc.), mass conservation in biochemical processes, signal dependence in biochemical processes, etc.
  • the biochemical process equation set to simulate the biochemical process at least includes:
  • c(cat) represents the concentration of the enzyme in the biochemical process; the subsequent solution is only performed when the concentration of the enzyme is not zero, which reflects the dependence of the biochemical process on the enzyme; Before and after the biochemical process, the concentration change of c(cat) is 0, which means that the enzyme plays a catalytic role in the biochemical process without itself being consumed.
  • Δc(Ant) represents the change in the concentration of the reaction substrate after a time step t during the biochemical process.
  • Δc(Product) represents the change in the concentration of the reaction product after a time step t during the biochemical process.
  • v(Product) represents the rate of change of the concentration of the reaction product in the biochemical process.
  • t represents a time step.
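The equation set itself is not reproduced in this text, so the following sketch shows only one form that is consistent with the symbol definitions above. The 1:1 stoichiometry, the rate law v(Product) = k · c(cat) · c(Ant), and the names `enzymatic_step` and `rate_constant` are assumptions made for illustration.

```python
from typing import Dict


def enzymatic_step(
    pool: Dict[str, float],
    substrate: str,
    product: str,
    enzyme: str,
    rate_constant: float,
    t: float,
) -> Dict[str, float]:
    """Return the concentration change of each component after one time step t."""
    # The process is only solved when the enzyme concentration is not zero.
    if pool[enzyme] == 0.0:
        return {substrate: 0.0, product: 0.0, enzyme: 0.0}
    # Assumed rate law: v(Product) = k * c(cat) * c(Ant).
    v_product = rate_constant * pool[enzyme] * pool[substrate]
    # Assumed 1:1 stoichiometry; never consume more substrate than is present.
    delta_product = min(v_product * t, pool[substrate])
    # The enzyme is a catalyst, so its concentration change is zero.
    return {substrate: -delta_product, product: delta_product, enzyme: 0.0}
```

In this form the substrate loss equals the product gain, which reflects mass conservation, and the early return reflects the dependence of the process on the enzyme.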
  • the signaling pathways are interrelated and together form an intracellular signaling network.
  • each required signal pathway can be obtained by searching medical literature, biological literature, performing link prediction, or other methods; and then constructing a signal pathway unit for describing the signal pathway.
  • the signaling pathway includes one or more biochemical processes.
  • a set of biochemical process equations that describe a biochemical process can be used to construct a signaling pathway unit that describes a signaling pathway.
  • each biochemical process in the signaling pathway described by the signaling pathway unit and the biochemical components involved in each biochemical process can be obtained.
  • the signaling pathway unit can be divided into at least a signal transduction module, a protein transport module, a cell cycle module, a programmed cell death module and a gene expression module.
  • the signaling pathway unit is used to simulate at least one of intracellular signaling, protein transport, cell cycle, and programmed cell death, or to simulate intracellular gene expression. It can be understood that the signaling pathway unit of the embodiment of the present disclosure can also be used to simulate signaling processes or biological processes in other biological cells.
  • when simulating gene expression, the biochemical reaction module in the signaling pathway unit can use a different mathematical paradigm from the one used when simulating processes such as signal transduction, programmed cell death, and cell cycle.
  • a visualization engine can be constructed.
  • the visualization engine analyzes each biochemical reaction module in each signal pathway unit, thereby obtaining the relationships between the biochemical processes (as process nodes) and the relationships between the biochemical processes and the biochemical components (as component nodes). By outputting or displaying these relationships, the intracellular signaling network simulated by the digital cell model can be output or displayed.
  • using the visualization engine, the relationships among the process nodes and between the process nodes and the component nodes in the cell simulated by the digital cell model can be presented in the form of a graph, in particular as a cell signaling network diagram.
  • the visualization engine can also dynamically present the changing status of at least some of the biochemical components.
  • a component node (i.e., a biochemical component) can be displayed in different colors depending on whether its concentration increases or decreases; for example, if the concentration of a component node increases, the component node appears in red in the cell signaling network diagram; if the concentration of a component node decreases, the component node appears in green.
  • the component node can be displayed as a different volume according to the increase or decrease in concentration of the component node (ie, a biochemical component).
  • the greater the change in the concentration of a component node the larger the volume of the point presented by the component node; the smaller the change in concentration of the component node, the smaller the volume of the point presented by the component node.
  • concentration of the component node decreases slightly during the iteration.
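The color and size rules for component nodes can be sketched as a small mapping function. This is an illustration under assumptions: the function name `node_style`, the neutral gray for an unchanged node, and the linear size scaling are not specified in the text.

```python
from typing import Tuple


def node_style(
    previous: float,
    current: float,
    base_size: float = 10.0,
    scale: float = 50.0,
) -> Tuple[str, float]:
    """Map a concentration change to a (color, size) pair for a component node."""
    change = current - previous
    if change > 0:
        color = "red"      # concentration increased
    elif change < 0:
        color = "green"    # concentration decreased
    else:
        color = "gray"     # assumption: neutral color when unchanged
    # Assumption: node size grows linearly with the magnitude of the change.
    size = base_size + scale * abs(change)
    return color, size
```

For example, `node_style(1.0, 1.5)` yields a red node that is larger than the green node produced by `node_style(1.0, 0.75)`, because the first change is larger in magnitude.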
  • the visualization engine can also present the changes in the concentration of one or more biochemical components during the iterative simulation process, such as displaying the real-time concentration of a selected biochemical component or drawing its concentration change curve.
  • part of the signal pathways simulated by the digital cell model are parallel, that is, there is no obvious sequence or time dependence between the two.
  • these parallel signal paths can be treated as a signal path group.
  • the signaling pathway units describing these parallel signaling pathways are regarded as parallel signaling pathway units in the digital cell model and are labeled with the same unit process sequence.
  • these signal path units with the same unit process sequence can be regarded as a signal path unit group, and signal path unit groups with different unit process sequences can be cascaded in sequence.
  • each signaling pathway unit can be marked with a process sequence, and a set of biochemical process equations with the same biochemical process sequence can form a signaling pathway unit group.
  • the digital cell model can logically include multiple cascaded signal pathway unit groups, namely signal pathway unit group A, signal pathway unit group B, ..., signal pathway unit group M.
  • the signal path unit group M is used to represent the last stage signal path unit group.
  • each signal pathway unit group is solved in sequence to simulate the overall orderliness and directionality of signal transmission in the biochemical processes of biological cells.
  • the signal path unit group A in this example has multiple signal path units, including, for example, a signal path unit a1, a signal path unit a2, and a signal path unit a3.
  • this exemplary signal path unit group B has one signal path unit, that is, signal path unit b1.
  • within the same signal path unit group, the signal path units may be unordered.
  • the order in which individual signal path elements are solved can be randomly determined. In this way, the parallel nature of biochemical processes in biological cells can be simulated.
  • executing each signal path unit group (i.e., each signal path unit) once constitutes one simulation of the digital cell model, and the results of each simulation are used as the basis for the next simulation.
  • each signal path unit can be solved sequentially according to the process sequence.
  • although the digital cell model includes a large number of biochemical process equations, these equations are not solved simultaneously; they are solved sequentially according to the process order of the signal pathway units.
  • each signal pathway unit is executed in unit process order.
  • the biochemical reaction modules in the signaling pathway unit have a module process sequence. When any one of the signaling pathway units is executed, each biochemical reaction module in the signaling pathway unit is executed according to the module process sequence.
  • the biochemical process equation set in the biochemical reaction module can be solved.
  • when solved, the biochemical process equation set references the current concentrations of the biochemical components in the biochemical component pool; after any one of the biochemical reaction modules is completed, the information of each biochemical component in the biochemical component pool is updated according to the solution results, in particular the concentration of each biochemical component.
  • when solving signal path units, if multiple signal path units have the same unit process order, these signal path units may be regarded, as a whole, as being solved in parallel. Specifically, in one example, in each simulation, the execution order among the signal path units with the same unit process order is random; over a large number of iterations, these signal path units can be regarded as parallel. This can simulate the parallel characteristics between signal pathway units in the same signal pathway unit group and improve the robustness of the digital cell model.
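The scheduling rule (groups in unit process order, random order within a group) can be sketched as follows. The tuple layout `(order, name, solver)` and the function name `simulate_once` are assumptions introduced for this example.

```python
import random
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Pool = Dict[str, float]
# Hypothetical unit description: (unit process order, name, solver acting on the pool)
Unit = Tuple[int, str, Callable[[Pool], None]]


def simulate_once(pool: Pool, units: List[Unit], rng: random.Random) -> List[str]:
    """Run one simulation and return the names of the units in execution order."""
    executed: List[str] = []
    ordered = sorted(units, key=lambda u: u[0])      # cascade groups by process order
    for _, group in groupby(ordered, key=lambda u: u[0]):
        members = list(group)
        rng.shuffle(members)                         # random order inside one group
        for _, name, solve in members:
            solve(pool)                              # each unit reads and updates the shared pool
            executed.append(name)
    return executed
```

With units a1, a2, and a3 in group 1 and b1 in group 2, b1 always runs last, while the order of a1, a2, and a3 varies between simulations.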
  • At least one signaling pathway unit includes a plurality of signaling pathway subunits, and any one of the signaling pathway subunits includes a biochemical reaction module or a plurality of biochemical reaction modules with a process sequence.
  • the solution order of the multiple signal path sub-units can be randomly determined, and the multiple signal path sub-units can be solved sequentially according to the solution order.
  • each signal pathway sub-unit can simulate a part of the signal pathway; this part can include one biochemical process or multiple biochemical processes cascaded in sequence, and each biochemical process is simulated through a set of biochemical process equations.
  • each signal path subunit may also have a lower-level submodule.
  • each biochemical process shares the biochemical component pool, that is, the biochemical process equation set corresponding to each biochemical process needs to be solved based on the biochemical component pool and the solution results need to be reflected in the biochemical component pool.
  • when a biochemical process equation set is solved, the current concentration of each relevant biochemical component in the biochemical component pool is used; the concentrations of the biochemical components involved in that equation set are then updated in the biochemical component pool based on the calculation results. This can simulate the consumption and production of biochemical components by biochemical processes and the dependence of biochemical processes on the surrounding biochemical components.
  • the solution results and the biochemical component pool before the update can be used to determine the updated biochemical component pool.
  • the biochemical component pool is updated according to the solution results of each biochemical process equation set to simulate the dynamic change process of the biochemical components of the digital cell model during cell activities.
  • the data of the biochemical component pool at a specific stage of each simulation can be recorded, for example, the data of the biochemical component pool after the end of each simulation (ie, the concentration of each biochemical component).
  • these data can be used as historical data of the biochemical component pool for evaluation of the iterative simulation process of the digital cell model.
  • the historical data of the biochemical component pool also includes the number of times each data was generated during the simulation process.
  • the historical data of the biochemical component pool may include the simulation times of the digital cell model and the biochemical component pool data corresponding to the simulation times.
  • the digital cell model can be evaluated to determine whether the digital cell model has reached a steady state during the iterative simulation process.
  • the status of the digital cell model can be evaluated based on the process evaluation model library to determine whether the digital cell model has reached a steady state or a failure state, and then whether the iterative simulation process of the digital cell model has reached the end condition.
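The text does not state the evaluation criteria, so the sketch below uses stand-in rules purely for illustration: a state is treated as a failure if any concentration is negative or non-finite, and as steady if the pool changes by less than a relative tolerance over a window of consecutive simulations. The function name `evaluate_state`, the window, and the tolerance are all assumptions.

```python
import math
from typing import Dict, List


def evaluate_state(
    history: List[Dict[str, float]],
    window: int = 3,
    tolerance: float = 1e-6,
) -> str:
    """Classify the model as 'steady', 'failure', or 'running'.

    history[i] is the pool data recorded after the i-th simulation.
    """
    latest = history[-1]
    # Assumed failure rule: a negative or non-finite concentration.
    if any(c < 0 or not math.isfinite(c) for c in latest.values()):
        return "failure"
    if len(history) <= window:
        return "running"                 # not enough history to judge yet
    recent = history[-(window + 1):]
    for before, after in zip(recent, recent[1:]):
        for key, value in after.items():
            # Assumed steady rule: small relative change for every component.
            if abs(value - before[key]) > tolerance * max(1.0, abs(before[key])):
                return "running"
    return "steady"
```

Because the history is indexed by simulation count, the same records can also be used to plot how each concentration evolved before the steady state was reached.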
  • if the digital cell model is evaluated as being in a failure state, the initial digital cell model can be updated and the iterative simulation and evaluation performed again; this cycle continues until the digital cell model is evaluated as being in a steady state during the iterative simulation process.
  • when the digital cell model is evaluated as being in a steady state during the iterative simulation process, the current initial digital cell model has been able to reflect some cell activity patterns of biological cells during the iterative simulation. This shows that the initial biochemical component pool and the signaling pathway unit groups of the current initial digital cell model can, to a certain extent, effectively simulate the biochemical component distribution and the cell activity patterns of biological cells.
  • the target digital cell model determined based on the current initial digital cell model can more effectively simulate at least part of the cellular activities of biological cells.
  • according to the construction method of the digital cell model of the present disclosure, whether the biochemical component pool and the signaling pathway units of the initial digital cell model are suitable can be judged from the evaluation results of the initial digital cell model during the iterative simulation process. When the initial digital cell model is evaluated as being in a failure state during the iterative simulation process, it can be judged that the current initial digital cell model is inappropriate in terms of the types and concentrations of biochemical components, the simulation of the various biochemical processes, etc. The initial digital cell model can then be reacquired by changing at least one of the biochemical component pool and the signal pathway units; that is, the initial digital cell model is updated by updating at least one of the biochemical component pool and the signal pathway units.
  • the digital cell model construction method of the present disclosure can overcome the current lack of accurate cell biology knowledge: through the cyclic process of updating the initial digital cell model, iterative simulation, and process evaluation, a more appropriate biochemical component pool and signal pathway units are determined, so as to obtain a digital cell model that can reach a steady state and effectively simulate at least part of the cell activities.
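The cycle of updating, simulating, and evaluating can be sketched as an outer loop. The names `build_until_steady`, `run_and_evaluate`, and `update_model`, as well as the cap on the number of updates, are assumptions made for this illustration.

```python
from typing import Callable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def build_until_steady(
    initial_model: Model,
    run_and_evaluate: Callable[[Model], str],
    update_model: Callable[[Model], Model],
    max_updates: int = 100,
) -> Tuple[Optional[Model], int]:
    """Repeat simulate-and-evaluate, updating the initial model after each failure.

    Returns the model that reached a steady state and the number of updates used,
    or (None, max_updates) if no candidate reached a steady state.
    """
    model = initial_model
    for updates in range(max_updates + 1):
        if run_and_evaluate(model) == "steady":
            return model, updates
        if updates < max_updates:
            # Change the component pool and/or pathway units and try again.
            model = update_model(model)
    return None, max_updates
```

The cap on updates is a practical safeguard added here; the text itself describes the cycle as continuing until a steady state is reached.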
  • the biochemical information on which the construction is based includes at least one of signaling pathway information, protein network information, gene network information, biomarker information, information related to biochemical processes, etc.
  • digital cell models can also be constructed based on more biochemical information. Further, at least part of this biochemical information is obtained through knowledge extraction from the public literature.
  • the initial digital cell model is updated when the digital cell model reaches a failure state.
  • at least one of the initial biochemical component pool and the multiple signal pathway units can be updated, provided that the biochemical component pool and the multiple signal pathway units of the new initial digital cell model differ, as a whole, from those of the initial digital cell model before the update.
  • the initial digital cell model can be updated by updating the initial biochemical component pool (the biochemical component pool of the initial digital cell model before any iterative simulation is performed). For example, one or more new pieces of biochemical component information can be added to the initial biochemical component pool before the update, one or more pieces of biochemical component information can be deleted from it, or the original biochemical component information can be changed.
  • a new initial biochemical component pool can also be generated (that is, the initial biochemical component pool is regenerated); after the new initial biochemical component pool is determined to be different from the one before the update, the new initial biochemical component pool replaces the initial biochemical component pool before the update.
  • the initial digital cell model can be updated by updating one or more signaling pathway units. For example, at least one of the following strategies can be used: adjusting the signal path unit group to which a signal path unit belongs, adding a new signal path unit to the current signal path units, deleting a signal path unit from the current signal path units, updating the type or number of biochemical process equation sets in a current signal path unit, updating the values of the constants of any biochemical process equation set in a current signal path unit, and other strategies.
  • a new signal path unit can also be generated (that is, the signal path unit is regenerated); after it is determined that the new signal path unit is different from the signal path unit before the update, the new signal path unit replaces the signal path unit before the update.
  • Each time the initial digital cell model is updated, the strategy of updating the initial biochemical component pool can be adopted alone, the strategy of updating one or more signaling pathway units can be adopted alone, or both strategies can be adopted at the same time.
  • The strategies adopted in any two updates of the initial digital cell model may be the same or different.
  • For example, in one update, the strategy of changing the concentration of biochemical components can be used alone to update the initial digital cell model; in a subsequent update, four strategies can be used together: adding new biochemical component information, deleting biochemical component information, adding a new biochemical process equation set, and deleting a biochemical process equation set.
  • When biochemical component information is added to the biochemical component pool, it is often necessary to add a biochemical process equation set related to the newly added biochemical component information, which leads to the adjustment of at least one signaling pathway unit.
  • Likewise, when biochemical component information is deleted from the biochemical component pool, it is often necessary to delete the biochemical process equation set related to the deleted biochemical component information, which leads to the adjustment of at least one signaling pathway unit.
  • The strategy used each time the initial digital cell model is updated can be determined according to preset rules, such as using the same strategy multiple times in a row, or looping according to a preset strategy sequence. The strategy sequence records the strategy used for each update of the initial digital cell model within one strategy cycle.
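  • As an illustration only, looping over a preset strategy sequence could be sketched as follows. The strategy labels and the function name are hypothetical; the text above names the strategies only in prose.

```python
from itertools import cycle, islice

# Hypothetical strategy labels (not taken from the source text).
STRATEGY_SEQUENCE = [
    "change_concentration",
    "add_component",
    "update_equation_constants",
]


def strategies_for_updates(sequence, n_updates):
    """Return the strategy used for each of n_updates successive updates,
    looping over the preset strategy sequence."""
    return list(islice(cycle(sequence), n_updates))


plan = strategies_for_updates(STRATEGY_SEQUENCE, 5)
print(plan)
```

  • The fourth update wraps around to the first strategy in the sequence, which is what one strategy cycle followed by the start of the next would look like.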
  • the method used to update the initial digital cell model can be determined based on the reasons why the digital cell model is evaluated as a failure state during the iterative simulation process or the specific status of the digital cell model when it is evaluated as a failure state.
  • an expert system can be introduced to improve the accuracy of updating the initial digital cell model, or technical experts can intervene to provide a more appropriate update strategy.
  • other methods can also be used to determine the strategy used each time the initial digital cell model is updated, so as to enable the initial digital cell model to be updated.
  • a parameter adjustment module and a structure adjustment module can be constructed, and the current initial digital cell model can be adjusted with the help of the parameter adjustment module and the structure adjustment module to form a new initial digital cell model.
  • The biochemical component concentrations in the biochemical component information and the values of the constants of the biochemical process equations can be adjusted through the parameter adjustment module. Through the structure adjustment module, the biochemical component information in the biochemical component pool can be increased or decreased, the biochemical process equation sets included in a signaling pathway unit can be adjusted, or the signaling pathway units included in a signaling pathway can be adjusted.
  • The parameter adjustment module does not change the types and quantities of biochemical components in the digital cell model, nor does it change the types, quantities, and positions of the biochemical process equations in each signaling pathway unit. It only adjusts the concentrations of biochemical components or the parameters of the biochemical process equations.
  • This adjustment method can generally have a smaller adjustment range and better adjustment accuracy, which is conducive to fine-tuning the digital cell model.
  • the structure adjustment module will adjust the structural characteristics of the digital cell model such as the type and quantity of biochemical components in the biochemical component pool, the type, quantity and location of the biochemical process equations in the signaling pathway unit. Through this adjustment method, the architecture of the digital cell model can be changed to a large extent, which is conducive to obtaining the possible feasible architecture of the digital cell model through rough adjustment.
  • a structure adjustment module may be first used to update the initial digital cell model until a feasible architecture of a digital cell model with good prospects is obtained. Then, the initial digital cell model after structural adjustment is updated through the parameter adjustment module to obtain a digital cell model that can simulate some functions and processes of biochemical cells under this feasible architecture.
  • The structure adjustment module and the parameter adjustment module can also be used alternately to update the initial digital cell model.
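  • The division of labor between the two modules can be sketched as below. This is a minimal sketch, assuming a model represented as a dictionary with a component pool and a list of signaling pathway units; the data layout and function names are hypothetical and not taken from the disclosure.

```python
import copy


def parameter_adjust(model, component, new_concentration):
    """Fine adjustment: change one concentration only.
    The set of components and the equation sets stay untouched."""
    if component not in model["pool"]:
        raise KeyError(component)
    updated = copy.deepcopy(model)
    updated["pool"][component] = new_concentration
    return updated


def structure_adjust(model, component, concentration, unit_index, equation):
    """Coarse adjustment: add a component together with a related
    equation set in one signaling pathway unit."""
    updated = copy.deepcopy(model)
    updated["pool"][component] = concentration
    updated["units"][unit_index].append(equation)
    return updated


model = {"pool": {"A": 1.0, "B": 0.5}, "units": [["eq1"], ["eq2"]]}
coarse = structure_adjust(model, "C", 0.2, 0, "eq3")  # architecture changes
fine = parameter_adjust(coarse, "C", 0.25)            # only a value changes
```

  • Both functions return a new model rather than mutating the input, so the model before the update remains available for comparison.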
  • the current initial digital cell model is updated based on at least one of the following information:
  • Information derived from medical literature, information derived from biological literature, information derived from high-throughput cell experiments, information derived from high-throughput sequencing, and information predicted based on literature information, experimental information or sequencing information.
  • Key parameters required for constructing a digital cell model can be determined by searching and analyzing published documents. At least some of these key parameters, such as parameters that are easily verified or obtained through high-throughput cell experiments or high-throughput sequencing technology, or parameters that differ greatly between different documents, should be verified through experiments or obtained through one's own experiments. This ensures the accuracy and validity of these key parameters, thereby improving the efficiency of obtaining the digital cell model and how closely the digital cell model simulates biological cells.
  • The method for constructing a digital cell model may further include: obtaining the biochemical component pool of the initial digital cell model (i.e., the initial biochemical component pool) according to the biochemical component database; the biochemical component database includes a plurality of pieces of biochemical component setting information, and the biochemical component setting information includes the concentration range of the biochemical component.
  • Multiple pieces of biochemical component setting information can be selected from the biochemical component database, and biochemical component information is determined based on the selected biochemical component setting information; the pieces of biochemical component information together form the initial biochemical component pool.
  • The biochemical component setting information includes a concentration range of the biochemical component, the biochemical component information includes the concentration of the same biochemical component, and the concentration in the biochemical component information is within the concentration range of the biochemical component setting information.
  • In other words, the types of biochemical components involved in the initial biochemical component pool do not exceed the types of biochemical components in the biochemical component database, and the concentration of each biochemical component in the initial biochemical component pool is within the concentration range of that biochemical component in the biochemical component database.
  • the biochemical component database can be used as the basis and restriction for generating the initial biochemical component pool; of course, it can also be used as the basis and restriction for updating the initial biochemical component pool.
  • The biochemical component database includes N1 (N1 is a positive integer) pieces of biochemical component setting information; that is, it involves N1 biochemical components.
  • N2 (N2 is a positive integer not exceeding N1) pieces of biochemical component setting information can be extracted from the biochemical component database, and the corresponding N2 pieces of biochemical component information are generated based on the N2 pieces of biochemical component setting information, with the concentration of the biochemical component in each piece of biochemical component information within the concentration range of the corresponding biochemical component setting information. The N2 pieces of biochemical component information can form an initial biochemical component pool.
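  • A minimal sketch of this extraction step follows, assuming the database is a mapping from component name to a (minimum, maximum) concentration range and that concentrations are drawn uniformly at random. The component names, the ranges and the uniform draw are all assumptions made for illustration; the disclosure does not prescribe a sampling rule.

```python
import random

# Hypothetical database: component name -> (min concentration, max concentration).
COMPONENT_DB = {
    "ATP": (1.0, 5.0),
    "glucose": (0.5, 10.0),
    "kinase_X": (0.01, 0.1),
    "substrate_Y": (0.1, 2.0),
}


def build_initial_pool(database, n2, rng):
    """Pick n2 of the N1 setting entries and draw one concentration
    inside each selected entry's concentration range."""
    if not 1 <= n2 <= len(database):
        raise ValueError("n2 must be between 1 and N1")
    chosen = rng.sample(sorted(database), n2)
    return {name: rng.uniform(*database[name]) for name in chosen}


pool = build_initial_pool(COMPONENT_DB, 3, random.Random(0))
```

  • Passing a seeded `random.Random` instance keeps the draw reproducible, which makes it easier to compare a regenerated pool with the pool before an update.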
  • the biochemical component database may also be acquired first.
  • An existing biochemical component database can be obtained directly, the required biochemical component database can be obtained by modifying and supplementing an existing database, or the biochemical component database can be constructed from scratch.
  • a biochemical component database can be constructed from scratch.
  • The data used to construct the biochemical component database can be at least partially derived from existing databases and published documents (such as journals, newspapers, conference proceedings, monographs, etc.) in the biological field (such as the field of cell biology), or obtained through biological experiments and research, and in some cases obtained through targeted cell biology research (such as high-throughput cell experiments).
  • The data may include the components in biological cells and the concentrations of these components. Of course, the components and their concentrations in biological cells may differ between data from different sources.
  • When constructing the biochemical component database, at least some of the biochemical components and their related data can be found in existing databases and/or published documents in the biological field; these data, with or without correction, form biochemical component setting information that is added to the biochemical component database.
  • For the concentration of a specific biochemical component, if multiple data sources provide the concentration and the concentrations from the different data sources are relatively close (for example, the concentrations from each data source all fluctuate within the same order of magnitude), the concentration of the specific biochemical component can be considered to have a high degree of certainty. The concentration range of the biochemical component is then formed into biochemical component setting information without correction and added to the biochemical component database.
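  • The consistency check described above could be sketched as follows. This sketch assumes that "within the same order of magnitude" means the largest reported value is less than ten times the smallest; that threshold, the function name and the `needs_review` flag are assumptions for illustration.

```python
import math


def setting_from_reports(name, reported):
    """Build a setting entry from concentrations reported by several sources.
    The entry is flagged for review when the reports span a factor of ten or more."""
    if not reported or min(reported) <= 0:
        raise ValueError("need at least one positive concentration")
    low, high = min(reported), max(reported)
    consistent = math.log10(high / low) < 1.0
    return {
        "component": name,
        "range": (low, high),
        "needs_review": not consistent,
    }


close = setting_from_reports("kinase_X", [1.2, 3.0, 2.5])
spread = setting_from_reports("substrate_Y", [0.01, 5.0])
```

  • Entries flagged for review correspond to the case where sources disagree and correction, for instance by a technical expert, may be needed before the range is used.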
  • Biochemical component setting information for a new biochemical component can be added to the biochemical component database. Continuously improving the biochemical component database in this way provides a knowledge base for constructing more accurate digital cell models, allowing the constructed digital cell models to simulate the biochemical processes of biological cells more effectively.
  • a machine learning algorithm can be used to predict the concentration of some biochemical components, and biochemical component setting information is formed based on the prediction results and added to the biochemical component database. For example, when the concentration of a specific biochemical component is difficult to directly measure, a biochemical network model related to the biochemical component can be constructed based on existing data, and then a machine learning algorithm can be used to predict the concentration or concentration range of the biochemical component.
  • For example, in order to calculate the concentration of a specific biochemical component At, the biochemical component At can be used as a node to construct a biochemical network model.
  • The biochemical network model also includes node A1, node A2, node A3 and node A4; node A1, node A2, and node A3 represent biochemical processes that generate the specific biochemical component At, and node A4 represents a biochemical process in which the specific biochemical component At participates.
  • new biochemical components and new biochemical processes can be predicted based on known biochemical networks with the help of link prediction techniques.
  • The concentration ranges of these predicted biochemical components can be directly verified (such as by biological verification, especially through high-throughput cell experiments) or indirectly verified (such as through different biochemical network models) to form biochemical component setting information, which is added to the biochemical component database. This improves the biochemical component database, so that a digital cell model built on it can be closer to real cells.
  • A database parser can be constructed to parse existing data materials and generate biochemical component setting information, for example, based on an existing database or based on published literature. Further, at least part of the biochemical component setting information constructed by the database parser can be used as initial biochemical component setting information; after being modified or otherwise corrected, the initial biochemical component setting information serves as the final usable biochemical component setting information, which is used to build or update the initial biochemical component pool.
  • the database parser can analyze existing literature to obtain the concentration of a specific biochemical component in different literatures, and collect these concentrations and related information from the literature as the initial biochemical component setting information.
  • Technical experts can make corrections to the concentrations involved in the initial biochemical component setting information, such as deleting some obviously erroneous concentrations or setting a concentration range based on the collected concentrations, to obtain the final usable biochemical component setting information.
  • the current initial biochemical component pool of the initial digital cell model is updated according to the biochemical component database.
  • According to the biochemical component database, the types of biochemical component information in the initial biochemical component pool can be adjusted, the concentration of the biochemical component in the biochemical component information can be adjusted, or both the types of biochemical component information in the initial biochemical component pool and the concentration of the biochemical component in at least one piece of biochemical component information can be adjusted at the same time, to obtain a new initial biochemical component pool.
  • A new initial biochemical component pool can also be regenerated based on the biochemical component database and compared with the initial biochemical component pool before updating.
  • When the two differ, the initial biochemical component pool before updating is replaced with the new initial biochemical component pool, thereby completing the update of the initial biochemical component pool.
  • the biochemical component setting information may also include the concentration search step size of each biochemical component.
  • the biochemical component setting information at least records the concentration range and concentration search step of each biochemical component.
  • The concentration range of the biochemical component in at least one piece of biochemical component setting information can be multiple discrete concentration values; when updating the biochemical component information in the initial biochemical component pool, a new concentration value can be selected from the concentration range in the biochemical component setting information of that biochemical component.
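  • One way a concentration range and a concentration search step could yield discrete candidate values is sketched below. The evenly spaced grid and the function name are assumptions; the disclosure only states that a range and a search step are recorded.

```python
import math


def candidate_concentrations(low, high, step):
    """Enumerate low, low + step, low + 2 * step, ... without exceeding high."""
    if step <= 0 or high < low:
        raise ValueError("step must be positive and high must not be below low")
    # Small tolerance so that floating-point rounding does not drop the end point.
    count = math.floor((high - low) / step + 1e-9)
    return [low + i * step for i in range(count + 1)]


grid = candidate_concentrations(0.0, 1.0, 0.25)
point = candidate_concentrations(2.0, 2.0, 0.5)
print(grid)   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(point)  # [2.0]
```

  • When the lower and upper bounds coincide, the grid collapses to a single value, which matches a concentration range that is a point value.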
  • When the concentration of the biochemical component in at least one piece of biochemical component setting information is a specific concentration (i.e., a point value), the concentration range of the biochemical component is that specific concentration.
  • Changes in the concentration of some biochemical components have little impact on the biochemical processes of cells, or their concentration or content is relatively stable.
  • For these biochemical components, the concentration range in the biochemical component database (that is, the concentration range in the biochemical component setting information) can be set to a specific concentration, so as to reflect some of the biological laws of biological cells and reduce the difficulty of obtaining digital cell models.
  • Each piece of biochemical component setting information in the biochemical component database is also marked with a credibility parameter (such as a credibility level), which characterizes the reliability of the concentration of the biochemical component.
  • When the biochemical component setting information has a larger concentration range, the concentration of the biochemical component is less reliable; when the biochemical component setting information has a smaller concentration range, the concentration of the biochemical component is more reliable.
  • When the concentration range of the biochemical component setting information comes from highly credible data, such as biological experimental data, the credibility of the concentration of the biochemical component is higher; when the concentration range comes from less reliable data, such as non-authoritative journals or newspapers, or consists only of predicted values, the credibility of the concentration of the biochemical component is lower.
  • When generating the initial biochemical component pool, high-confidence biochemical component setting information may be given priority, or biochemical component setting information may be extracted from the biochemical component database at least partially based on its credibility.
  • When updating the initial biochemical component pool, the biochemical component information corresponding to low-confidence biochemical component setting information can be adjusted first, for example, by deleting at least part of that biochemical component information or by changing the concentration of the biochemical component in at least part of it.
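  • A sketch of credibility-driven selection follows, assuming the credibility parameter is an integer level in which a larger value means more reliable. The level scale, the component names and the function names are hypothetical.

```python
# Hypothetical setting entries with an integer credibility level (higher = more reliable).
SETTINGS = [
    {"component": "ATP", "credibility": 3},
    {"component": "kinase_X", "credibility": 1},
    {"component": "glucose", "credibility": 3},
    {"component": "substrate_Y", "credibility": 2},
]


def select_for_generation(settings, n):
    """Prefer high-credibility entries when generating the initial pool."""
    ranked = sorted(settings, key=lambda s: s["credibility"], reverse=True)
    return [s["component"] for s in ranked[:n]]


def select_for_adjustment(settings, n):
    """Adjust low-credibility entries first when updating the pool."""
    ranked = sorted(settings, key=lambda s: s["credibility"])
    return [s["component"] for s in ranked[:n]]


generated = select_for_generation(SETTINGS, 2)
to_adjust = select_for_adjustment(SETTINGS, 1)
```

  • Python's sort is stable, so entries with equal credibility keep their database order; a weighted random draw would be an equally plausible reading of "at least partially based on credibility".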
  • The method of constructing a digital cell model further includes generating or updating an initial digital cell model based on the signaling pathway database; specifically, generating or updating the signaling pathway units in the initial digital cell model based on the signaling pathway database.
  • the following method can be used to generate or update the signaling pathway units in the initial digital cell model according to the signaling pathway database:
  • Each signaling pathway unit of the initial digital cell model is obtained according to the initial signaling pathway information and the biochemical process paradigm pool.
  • The signaling pathway database includes a plurality of pieces of signaling pathway information and a biochemical process paradigm pool.
  • The biochemical process paradigm pool includes a plurality of biochemical process paradigms used to describe biochemical process rules.
  • The signaling pathway information includes a plurality of pieces of biochemical process information used to describe the signaling pathway, and the biochemical process information of each biochemical process is marked with its process sequence.
  • Each piece of biochemical process information includes the biochemical process paradigm referenced by the biochemical process, the meaning of the variables in the referenced biochemical process paradigm, and the value range of the constants in the referenced biochemical process paradigm.
  • The initial signaling pathway information includes multiple pieces of biochemical process information obtained from the signaling pathway information in the signaling pathway database, and the constants in the biochemical process paradigms referenced by the biochemical process information in the initial signaling pathway information are limited to point values.
  • each signaling pathway unit in the initial digital cell model can be obtained according to the signaling pathway database.
  • the signaling pathway units in the initial digital cell model may be updated according to the signaling pathway database.
  • Each signaling pathway can be constructed based on existing knowledge, such as knowledge about intracellular signaling pathways in biological or medical literature; that is, each piece of signaling pathway information can be constructed. Each piece of biochemical process information in the signaling pathway information can be transformed into the desired signaling pathway unit by referencing the biochemical process paradigm pool. According to the process sequence of each piece of signaling pathway information, the way the signaling pathway units are combined can be determined, and each signaling pathway unit can then be constructed.
  • the biochemical process paradigm has variables (including independent variables and dependent variables) and constants, as well as mathematical descriptions of mathematical relationships between variables and constants.
  • individual variables and constants are not assigned specific meanings, and constants are not assigned values.
  • the biochemical process paradigm is used only to represent mathematical laws.
  • The meaning of the variables in the biochemical process paradigm referenced by the biochemical process information may refer to the biochemical component information referenced by each variable in the biochemical process information, especially the concentration of the biochemical component in the referenced biochemical component information.
  • biochemical process mathematical models can be generated according to the biochemical process paradigm.
  • the biochemical process mathematical model can simulate the biochemical process defined by the biochemical process information.
  • Each variable in the biochemical process paradigm can be replaced by the concentration of the corresponding biochemical component defined by the biochemical process information, and each constant can be assigned a value based on the value range of that constant defined in the biochemical process information.
  • Setting up a biochemical process paradigm pool and signaling pathway information in the signaling pathway database, on the one hand, simplifies the construction of the mathematical model of each biochemical process and reduces the complexity of the signaling pathway database; on the other hand, the signaling pathway database constructed in this way can meet the needs of both knowledge expression and computational expression.
  • The description of the meaning of the variables in the biochemical process information and the labels or names of the biochemical process information can use abbreviations commonly used in the field or customized expression rules, making the biochemical process information easier to build and easier to update or modify.
  • a biochemical process paradigm of type "mm” includes the following two equations:
  • activators.v_max = activators.k_cat*activators.concentration;
  • This biochemical process paradigm can reflect a kinase-catalyzed multimerization process, in which activators, substrate, etc. do not represent specific biochemical components, and k_m, k_cat, etc. do not refer to specific constants.
  • Figure 4 illustrates a biochemical process information and the biochemical process mathematical model that can be generated when the biochemical process information calls the biochemical process paradigm represented by "mm".
  • the mathematical model of the biochemical process describes the reaction rate of the multimerization process catalyzed by the kinase, which is specifically reflected in the concentration change rate v_t of the product. According to the reaction rate of the multimerization process catalyzed by the kinase, combined with the time step, the concentration change of each biochemical component involved after a time step can be determined.
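  • The way a paradigm is bound to concrete components and advanced by one time step can be sketched as below. Only the v_max equation appears in the text above; the rate expression for v_t is an assumption, taken here to be the standard Michaelis-Menten form suggested by the names k_m and k_cat. The component names, the binding dictionary and the explicit single-step update are also hypothetical.

```python
def mm_paradigm(k_cat, k_m, activator_conc, substrate_conc):
    """Evaluate the 'mm' paradigm once and return (v_max, v_t)."""
    # From the text: activators.v_max = activators.k_cat * activators.concentration
    v_max = k_cat * activator_conc
    # ASSUMPTION (not in the text): standard Michaelis-Menten rate law.
    v_t = v_max * substrate_conc / (k_m + substrate_conc)
    return v_max, v_t


def step(pool, binding, constants, dt):
    """Advance the bound components by one explicit time step of length dt."""
    _, v_t = mm_paradigm(
        constants["k_cat"],
        constants["k_m"],
        pool[binding["activators"]],
        pool[binding["substrate"]],
    )
    updated = dict(pool)
    updated[binding["substrate"]] -= v_t * dt  # substrate is consumed
    updated[binding["product"]] += v_t * dt    # product accumulates
    return updated                             # the kinase concentration is unchanged


pool = {"kinase_X": 0.5, "substrate_Y": 1.0, "product_Z": 0.0}
binding = {"activators": "kinase_X", "substrate": "substrate_Y", "product": "product_Z"}
after = step(pool, binding, {"k_cat": 2.0, "k_m": 1.0}, dt=0.1)
```

  • The binding dictionary plays the role of the biochemical process information: it gives the abstract variables of the paradigm a concrete meaning, while the constants dictionary supplies point values.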
  • initial signal pathway information can be generated based on the signal pathway information.
  • The initial signaling pathway information includes a plurality of pieces of biochemical process information extracted from the signaling pathway information, and each constant of the biochemical process information in the initial signaling pathway information is a point value rather than a range value; the value of each constant satisfies the value range given in the signaling pathway information.
  • the initial signaling pathway information can be transformed into multiple specific biochemical process mathematical models by calling the biochemical process paradigm pool.
  • This initial signaling pathway information is used to construct each signaling pathway unit of the digital cell model. When the initial digital cell model needs to be updated, the signal pathway unit of the digital cell model can be updated by updating the initial signal pathway information.
  • The biochemical process information in the initial signaling pathway information can directly generate a biochemical process equation set by calling the biochemical process paradigm in the biochemical process paradigm pool, and the signaling pathway unit to which the biochemical process equation set belongs can be determined based on the process sequence in the initial signaling pathway information.
  • a biochemical process paradigm may include a set of paradigm equations composed of multiple paradigm equations.
  • Each paradigm equation is used to simulate the change of a variable after a time step, for example, the amount by which the variable increases or decreases after a time step.
  • The increase or decrease can reflect the role played by the variable. Generally, when a variable increases after a time step, the variable is a dependent variable, that is, it represents the concentration of a reaction product in the biochemical process; when a variable decreases after a time step, the variable is an independent variable, that is, it represents the concentration of a reaction substrate in the biochemical process; when a variable remains unchanged after a time step, the variable represents a component such as an enzyme.
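  • The rule of thumb above can be written as a small classifier. This is only a sketch: the role labels, the tolerance and the function name are assumptions.

```python
def classify_roles(before, after, tol=1e-12):
    """Label each variable by the sign of its change over one time step."""
    roles = {}
    for name in before:
        delta = after[name] - before[name]
        if delta > tol:
            roles[name] = "product"      # dependent variable: increased
        elif delta < -tol:
            roles[name] = "substrate"    # independent variable: decreased
        else:
            roles[name] = "enzyme_like"  # unchanged over the step
    return roles


before = {"kinase_X": 0.5, "substrate_Y": 1.0, "product_Z": 0.0}
after_step = {"kinase_X": 0.5, "substrate_Y": 0.95, "product_Z": 0.05}
roles = classify_roles(before, after_step)
```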
  • The biochemical process information also includes the value range of the constants in the biochemical process paradigm. The value range can be a fixed value (that is, the value range is a point value), or it can be a range value.
  • When the certainty of a constant in the biochemical process information is very high or very clear, the constant can be a point value; for example, when the constant in the biochemical process information represents the Michaelis-Menten constant, it can be a point value.
  • When the certainty of a constant in the biochemical process information is not very high, the constant can be configured as a range value.
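  • Turning value ranges into the point values required by the initial signaling pathway information could look like the sketch below. It assumes a constant is stored either as a single number (point value) or as a (low, high) tuple (range value), and that a range is resolved by a uniform random draw; both the storage convention and the draw are assumptions.

```python
import random


def instantiate_constants(value_ranges, rng):
    """Resolve each constant to a point value inside its declared value range."""
    point_values = {}
    for name, spec in value_ranges.items():
        if isinstance(spec, tuple):
            low, high = spec
            point_values[name] = rng.uniform(low, high)  # range value
        else:
            point_values[name] = float(spec)             # already a point value
    return point_values


# Hypothetical numbers: k_m is well established, k_cat is only known as a range.
value_ranges = {"k_m": 0.3, "k_cat": (1.0, 4.0)}
constants = instantiate_constants(value_ranges, random.Random(1))
```

  • Updating the initial signaling pathway information then amounts to drawing a new point value for one or more range-valued constants while leaving point-valued constants fixed.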
  • At least one biochemical process is determined based on at least one of medical literature, biological literature, high-throughput cell experiments, machine learning algorithms, and link prediction algorithms; biochemical process information is generated based on the biochemical process and added to the signaling pathway information in the signaling pathway database. By continuously improving the signaling pathway database in this way, the accuracy of digital cell models can be improved.
  • Constants in some biochemical processes can be determined through high-throughput biological experimental technology (such as high-throughput cell experimental technology or high-throughput sequencing technology), and these constants can be used as the basis for determining the value range of the constants in the biochemical process information.
  • a machine learning algorithm can be used to determine the constants of some biochemical processes, and the value range of the constants of the corresponding biochemical process information can be determined based on the constants of the biochemical process.
  • the specific biochemical process Bt can be used as a node to construct a biochemical network model.
  • the biochemical network model also includes node B1, node B2, node B3 and node B4; node B1, node B2, and node B3 represent the biochemical component information participating in the biochemical process Bt, and node B4 represents the biochemical component information generated by a specific biochemical process Bt.
  • Based on the concentration data of node B1, node B2, node B3 and node B4, the mathematical model of the biochemical process Bt can be determined through machine learning, and the constants of the biochemical process are then determined based on that mathematical model.
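  • As a stand-in for the machine learning step, which the text does not specify, the sketch below recovers two constants from concentration and rate data by ordinary least squares on a double-reciprocal (Lineweaver-Burk) linearization. It assumes that the process Bt follows Michaelis-Menten kinetics and that rate data are available; the data are synthetic and every name is hypothetical.

```python
def fit_mm_constants(substrate, rates):
    """Fit v = v_max * s / (k_m + s) via a least-squares line through
    (1/s, 1/v), since 1/v = (k_m / v_max) * (1/s) + 1 / v_max."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    v_max = 1.0 / intercept
    k_m = slope * v_max
    return v_max, k_m


# Synthetic, noise-free data generated with v_max = 2.0 and k_m = 0.5.
substrate = [0.1, 0.5, 1.0, 2.0, 5.0]
rates = [2.0 * s / (0.5 + s) for s in substrate]
v_max_fit, k_m_fit = fit_mm_constants(substrate, rates)
```

  • With noisy measurements the fitted values would carry uncertainty, which is one natural source for a range value rather than a point value in the biochemical process information.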
  • The types and quantities of biochemical components included in the biochemical component pool are consistent with the types and quantities of biochemical component setting information in the biochemical component database; or, each piece of biochemical process information or signaling pathway information involved in the signaling pathway database is reflected in the digital cell model.
  • This method can maximize the closeness of digital cell models to biological cells.
  • It is understandable that, due to limitations in simulation methods and in knowledge of biological cells, it may be difficult for digital cell models to completely simulate the various functions of biological cells at some specific stages. For example, multiple biochemical processes that biological cells actually perform in sequence may be simulated as one biochemical process, so as to simulate as closely as possible the final results of these processes rather than the specifics of each biochemical process.
  • The digital cell model is constructed based on the biochemical component database and the signaling pathway database, but it is not required that the digital cell model fully embody all the knowledge and information collected in the biochemical component database and the signaling pathway database; rather, the aim is to simulate biological cells with a certain accuracy and function. Of course, as more and more knowledge and information are collected in the biochemical component database and signaling pathway database, the description of the intracellular signaling network becomes more and more refined, and successive versions of digital cell models will simulate biological cells with increasing accuracy. In other words, in the embodiments of the present disclosure, the biochemical component database and signaling pathway database can be continuously improved, and a more complete digital cell model can be constructed from the more complete databases as needed.
  • the obtained target digital cell model can be saved as a biochemical component pool and each signaling pathway unit to facilitate direct application of the target digital cell model.
  • the target digital cell model can be saved as a model database, which can include a component subdatabase, a process subdatabase, and a paradigm subdatabase.
  • the component sub-database (also called the molecular database below) is used to save the information of each biochemical component in the biochemical component pool of the target digital cell model.
  • the process sub-database is used to save various parameters of each biochemical process equation set in the signal pathway unit, that is, the biochemical process equation set is saved in the form of biochemical process information.
  • the paradigm sub-database is used to save each biochemical process paradigm involved in each set of biochemical process equations.
  • the PM of the signaling pathway database can be quoted or copied.
  • the biochemical component information in the molecular database and the biochemical process information in the process sub-database can be directly modified, and a new digital cell model can then be constructed based on the modified molecular database, the modified process sub-database and the paradigm sub-database to simulate new cells.
  • the modified molecular database can be used to construct the biochemical component pool of the new digital cell model; the modified process sub-database can be used to construct each signaling pathway unit of the new digital cell model by referencing the paradigm sub-database.
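  • As an illustration only, the three sub-databases can be pictured as plain data structures. The sketch below assumes that each paradigm is a reusable rate-law template and that each process record stores only a paradigm name plus parameters; the names `PARADIGMS`, `build_pool` and `build_units`, the two rate laws and the sample values are hypothetical and are not taken from the disclosure.

```python
# Paradigm sub-database: reusable rate-law templates (assumed examples).
PARADIGMS = {
    "mass_action": lambda s, p: p["k"] * s,
    "michaelis_menten": lambda s, p: p["vmax"] * s / (p["km"] + s),
}

# Component (molecular) sub-database: one record per biochemical component.
component_db = {"S": {"concentration": 2.0}, "P": {"concentration": 0.0}}

# Process sub-database: parameters only; the equation form is referenced by name.
process_db = [
    {"paradigm": "michaelis_menten", "substrate": "S", "product": "P",
     "params": {"vmax": 1.0, "km": 2.0}},
]


def build_pool(component_db):
    """Build the biochemical component pool from the component sub-database."""
    return {name: info["concentration"] for name, info in component_db.items()}


def build_units(process_db, paradigms):
    """Build one rate function per process record by referencing its paradigm."""
    units = []
    for record in process_db:
        law = paradigms[record["paradigm"]]

        def rate(pool, record=record, law=law):
            return law(pool[record["substrate"]], record["params"])

        units.append(rate)
    return units
```

  • Editing a concentration in `component_db` or a parameter in `process_db` and calling the two builders again yields a new model without touching the paradigm definitions, which is the reuse pattern the paragraphs above describe.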
  • parameters of the individual signal pathway units may be adjusted first to obtain each target signal pathway unit. The parameters of related target signal pathway units are then adjusted as a whole; for example, multiple signal pathway units on the same signal path are adjusted together as a related signal pathway model group to obtain the target signal pathway model group. Then, an initial digital cell model is obtained based on the target signal pathway model group.
  • the method for constructing a digital cell model further includes acquiring a plurality of target signal pathway units; when acquiring an initial digital cell model, the signal pathway units in the initial digital cell model are determined based on the multiple target signal pathway units. In other words, the parameters of each signal pathway unit can be adjusted first, and then the parameters can be adjusted at the overall level of the digital cell model based on the parameter adjustment results of the signal pathway units.
  • Obtaining any target signal pathway unit includes:
  • the initial signaling pathway model includes an initial signaling pathway unit and biochemical component information involved in the initial signaling pathway unit, and the biochemical component information includes the concentration of the biochemical component;
  • the screening conditions including the concentration change trend of one or more biochemical components as markers;
  • the initial signal pathway model is updated until the screening conditions are met;
  • the initial signal pathway model is determined to be the target signal pathway model.
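  • The steps above amount to an update-until-accepted loop. The following is a minimal sketch under stated assumptions: the pathway unit is reduced to a single toy equation dc/dt = rate * c, the screening condition is a monotonic trend of one marker, and the update rule is a random re-draw of the parameter. None of these choices comes from the disclosure, and the function names are hypothetical.

```python
import random


def simulate_marker(rate, steps=20, dt=0.1, c0=1.0):
    """Toy pathway unit: dc/dt = rate * c (assumed form), explicit Euler steps."""
    trace = [c0]
    for _ in range(steps):
        trace.append(trace[-1] + dt * rate * trace[-1])
    return trace


def meets_screening(trace, trend):
    """Screening condition: the marker concentration follows the required trend."""
    pairs = list(zip(trace, trace[1:]))
    if trend == "increasing":
        return all(b > a for a, b in pairs)
    if trend == "decreasing":
        return all(b < a for a, b in pairs)
    raise ValueError(f"unknown trend: {trend}")


def tune_unit(initial_rate, trend, max_updates=1000, seed=0):
    """Update the initial model until the screening condition is met."""
    rng = random.Random(seed)
    rate = initial_rate
    for _ in range(max_updates):
        if meets_screening(simulate_marker(rate), trend):
            return rate  # accepted: this parameter defines the target unit
        rate = rng.uniform(-1.0, 1.0)  # hypothetical update rule: random re-draw
    return None  # no acceptable parameter found within the budget
```

  • A real implementation would presumably replace the random re-draw with a directed search, but the accept/reject structure would stay the same.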
  • the method for constructing a digital cell model further includes acquiring multiple target signal pathway model groups; each of the target signal pathway model groups includes multiple signal pathway units; when obtaining the initial digital cell model, the signal pathway units in the initial digital cell model are determined based on the signal pathway units in the target signal pathway model groups.
  • multiple related signal pathway units (for example, units connected in series in signal conduction) can first be adjusted as a whole, and the parameters can then be adjusted at the level of the whole digital cell model. In this way, the efficiency of parameter adjustment of the digital cell model can be improved as a whole, and the digital cell model can simulate biological cells more effectively.
  • Obtaining any target signaling pathway model group includes:
  • an initial signal pathway model set which includes a plurality of target signal pathway units and biochemical component information involved in each target signal pathway unit, and the biochemical component information includes the concentration of the biochemical component;
  • the screening conditions including the concentration change trend of one or more biochemical components as markers;
  • the initial signal pathway model group is updated until the screening conditions are met; updating the initial signal pathway model group includes updating at least one signaling pathway unit in the signal pathway model group;
  • the initial signal pathway model group is determined to be the target signal pathway model group.
  • In step S110, after obtaining the initial digital cell model, the legality of the initial digital cell model can also be judged; when the initial digital cell model meets the legality requirements, step S120 is entered to iteratively simulate the digital cell model.
  • a whitelist strategy or a blacklist strategy can be used to determine the legality of the initial digital cell model.
  • the whitelist strategy refers to setting at least one legality rule in advance; as long as the initial digital cell model satisfies any legality rule, it is judged that the initial digital cell model meets the legality requirements, otherwise it is judged that the initial digital cell model does not meet the legality requirements.
  • the blacklist strategy refers to setting at least one illegality rule in advance; as long as the initial digital cell model satisfies any illegality rule, it is judged that the initial digital cell model does not meet the legality requirements, otherwise it is judged that the initial digital cell model meets the legality requirements.
  • the digital cell model construction method provided by the present disclosure may also include, between step S110 and step S120, after obtaining the initial digital cell model, determining whether the initial digital cell model satisfies any illegality rule in the illegality rule library; when the initial digital cell model satisfies any illegality rule, it is determined that the initial digital cell model does not meet the legality requirements, and the method returns to step S110 to reacquire the initial digital cell model; when the initial digital cell model does not satisfy any illegality rule, it is determined that the initial digital cell model meets the legality requirements, and step S120 is entered.
  • the illegality rules may include but are not limited to the following rules: at least one piece of biochemical component information in the biochemical component pool is not called by any biochemical process equation group; the biochemical component information called by at least one biochemical process equation group is not included in the biochemical component pool; the concentration of at least one biochemical component in the biochemical component pool does not meet the requirements of the biochemical process equation group for the concentration of that biochemical component.
  • the method for constructing a digital cell model provided by the present disclosure may further include constructing an illegality rule library, in which one or more illegality rules are recorded.
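  • The blacklist strategy with the three example rules can be sketched as follows. The representation is an assumption made for illustration: the pool is a mapping from component name to concentration, and each equation group is a dictionary with a `components` list and an optional `required_range` mapping. These field names and the function names are hypothetical.

```python
def unused_components(pool, equations):
    """Rule 1: some component in the pool is not called by any equation group."""
    called = {name for eq in equations for name in eq["components"]}
    return any(name not in called for name in pool)


def missing_components(pool, equations):
    """Rule 2: an equation group calls a component that is absent from the pool."""
    return any(name not in pool for eq in equations for name in eq["components"])


def concentration_out_of_range(pool, equations):
    """Rule 3: a pool concentration violates an equation group's requirement."""
    for eq in equations:
        for name, (low, high) in eq.get("required_range", {}).items():
            if name in pool and not (low <= pool[name] <= high):
                return True
    return False


# Illegality rule library: one callable per rule.
ILLEGALITY_RULES = [unused_components, missing_components, concentration_out_of_range]


def is_legal(pool, equations, rules=ILLEGALITY_RULES):
    """Blacklist strategy: the model is legal only if no illegality rule fires."""
    return not any(rule(pool, equations) for rule in rules)
```

  • Keeping the rules in a list mirrors the idea of an illegality rule library: further rules can be appended without changing `is_legal`.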
  • In step S130, the iterative simulation process of the digital cell model can be evaluated to determine the status of the digital cell model during the iterative simulation process, and whether an end condition has been reached is determined according to the evaluation result.
  • the status of the digital cell model can be evaluated according to the process evaluation model library to determine whether the digital cell model has reached a steady state or a failure state, and then to determine whether the iterative simulation process of the digital cell model has reached an end condition.
  • In step S130, during the iterative simulation process, it is determined whether the digital cell model reaches a steady state or a failure state; when the digital cell model reaches a failure state, the initial digital cell model is updated; after the digital cell model reaches a steady state, a target digital cell model is determined based on the current initial digital cell model.
  • the method for constructing a digital cell model of the present disclosure may further include obtaining a process evaluation model library, for example, constructing a process evaluation model library.
  • the concentration of the biochemical component as a marker needs to be monitored or data processed. It is understood that in order to achieve different purposes, the markers in different processes can be different.
  • the biochemical components used as markers in each process can be obtained based on existing knowledge, for example, by searching biological literature or medical literature to determine which biochemical components can be used as markers reflecting cell apoptosis, and which biochemical components can be used as markers reflecting cell division.
  • a process evaluation model library includes a first process evaluation model that includes legal concentration ranges for a plurality of biochemical components.
  • In step S120, during the iterative simulation process of the digital cell model, the biochemical component pool is evaluated using the first process evaluation model.
  • If the evaluation finds that the concentration of any biochemical component in the pool, or of some biochemical components used as markers, exceeds the legal concentration range of that biochemical component, the digital cell model is determined to be in a failed state.
  • For example, the first process evaluation model defines that the concentration of a specific biochemical component is not less than 0; if the concentration of the specific biochemical component in a certain updated biochemical component pool is a negative value, the digital cell model is determined to be in a failed state.
  • the first process evaluation model can be used to evaluate each updated biochemical component pool, so that when the concentration of a biochemical component in the updated biochemical component pool falls outside the legal concentration range, it can be determined in time that the advancement of the digital cell model is terminated due to failure. The initial digital cell model is then updated in a timely manner, which reduces the computational load of obtaining the digital cell model and improves the efficiency of obtaining it.
  • sampling evaluation can also be performed on the updated biochemical component pools, for example, one evaluation is performed for every 3 to 10 updates of the biochemical component pool.
  • the legal concentration range of the biochemical component may be different from the concentration of the biochemical component in the initial biochemical component pool, and may also be different from the concentration range of the biochemical component defined in the biochemical component database.
  • the limits on the concentration of biochemical components in the initial biochemical component pool and the biochemical component database are limits on the starting conditions of the initial digital cell model, rather than limits on the concentration changes of biochemical components during the iterative simulation process of the digital cell model.
  • the legal concentration range of biochemical components limits the concentration changes of biochemical components during the iterative simulation process of the digital cell model, so that when the concentration of a biochemical component is obviously inconsistent with biological knowledge (for example, a negative value or an extremely high concentration occurs), it is judged that the current iterative simulation process of the digital cell model is obviously inconsistent with biological laws, the current iterative simulation of the digital cell model is terminated, and the current initial digital cell model is abandoned.
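  • A range check of this kind, with optional sampling, can be sketched as below. It is a sketch under assumptions: each pool is a mapping from component name to concentration, `legal_ranges` holds a (low, high) pair per monitored component, and the `every` parameter stands in for the sampling interval. The function names are hypothetical.

```python
def violates_legal_range(pool, legal_ranges):
    """True if any monitored component lies outside its legal concentration range."""
    for name, (low, high) in legal_ranges.items():
        concentration = pool.get(name)
        if concentration is not None and not (low <= concentration <= high):
            return True
    return False


def first_failure_step(history, legal_ranges, every=1):
    """Index of the first evaluated pool that fails, or None if none fails.

    `history` is the list of pools after each update; only every `every`-th
    pool is evaluated, which models the sampling evaluation described above.
    """
    for step, pool in enumerate(history):
        if step % every == 0 and violates_legal_range(pool, legal_ranges):
            return step
    return None
```

  • With a larger `every`, a violation is detected later but fewer pools are inspected, which is the trade-off behind sampling evaluation.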
  • the process evaluation model library includes a second process evaluation model, and the second process evaluation model includes an upper limit of the number of iterations of the digital cell model iterative simulation.
  • the number of iterative simulations of the digital cell model can be evaluated according to the second process evaluation model; when the number of iterative simulations of the digital cell model reaches the upper limit of the number of iterations, it is determined that the digital cell model is in a failed state.
  • the second process evaluation model limits the upper limit of the number of iterations of the digital cell model iterative simulation to 30,000 times; when the number of iterative simulations of the digital cell model reaches 30,000 times, the digital cell model is determined to be in a failed state.
  • the second process evaluation model can be used to evaluate the current number of iterations.
  • the evaluation can also be performed before each start of solving the first signal path unit, or the evaluation can be performed at other times.
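  • The iteration cap can be illustrated with a small driver loop. The sketch assumes that the cap is checked before each iteration starts and that the caller supplies hypothetical `step` and `is_steady` callables; the default of 30,000 is the example value given above.

```python
def run_with_iteration_cap(step, pool, is_steady, max_iterations=30_000):
    """Iterate `step` until a steady state is found or the cap is reached.

    Returns a (state, iterations) pair, where state is "steady" or "failed".
    """
    iterations = 0
    while True:
        # Second process evaluation model: checked before each iteration starts.
        if iterations >= max_iterations:
            return "failed", iterations
        pool = step(pool)
        iterations += 1
        if is_steady(pool):
            return "steady", iterations
```

  • For example, a `step` that halves a concentration combined with a steady-state test `X < 0.01` ends after seven iterations, while the same run with `max_iterations=5` ends in the failed state.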
  • the process evaluation model library includes a third process evaluation model, and the third process evaluation model includes a concentration change trend of at least one biochemical component as a marker during the iterative simulation process, for example, the concentration gradually increases, the concentration gradually decreases, the concentration fluctuates, or the concentration stabilizes at a plateau after rising.
  • Assessing the status of digital cell models against the process assessment model library includes:
  • the historical data of the biochemical component pool is evaluated according to the third process evaluation model; if the historical data of the biochemical component pool does not satisfy the third process evaluation model, the digital cell model reaches a failed state.
  • the third process evaluation model can also detect whether there is a mutation (sudden change) in the concentration change process of a biochemical component used as a marker. Specifically, it detects whether, in the historical data of the biochemical component pool, any of one or more selected biochemical components shows a sudden concentration change that exceeds a preset mutation threshold; such a change is regarded as a sufficient condition for not passing the screening evaluation (reaching a failure state). In this way, an initial digital cell model that passes the screening has overall continuity during the iterative simulation process.
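  • Sudden-change detection can be sketched as a check on the difference between consecutive values in the recorded history. The sketch assumes that "sudden change" means an absolute difference between two consecutive simulations larger than a per-marker threshold; this definition, the marker name `M1` used in the test and the function names are assumptions.

```python
def marker_trace(history, name):
    """Concentration of one marker across the recorded pool history."""
    return [pool[name] for pool in history]


def has_sudden_change(trace, threshold):
    """True if any consecutive pair of values differs by more than `threshold`."""
    return any(abs(b - a) > threshold for a, b in zip(trace, trace[1:]))


def passes_continuity_check(history, thresholds):
    """`thresholds` maps marker name to its maximum allowed step-to-step change."""
    return not any(
        has_sudden_change(marker_trace(history, name), limit)
        for name, limit in thresholds.items()
    )
```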
  • the process evaluation model library includes a fourth process evaluation model, and the fourth process evaluation model includes a change trend of the concentration relationship between a plurality of biochemical components as markers during the iterative simulation process; for example, the concentration of some markers gradually increases while the concentration of other markers gradually decreases.
  • Assessing the status of digital cell models against the process assessment model library includes:
  • the historical data of the biochemical component pool is evaluated according to the fourth process evaluation model; if the historical data of the biochemical component pool does not satisfy the fourth process evaluation model, the digital cell model reaches a failed state.
  • the process evaluation model library includes a fifth process evaluation model, the fifth process evaluation model includes at least one cell type model, and each of the cell type models includes a reference range of a cell phenotypic index related to the concentration of a biochemical component.
  • core phenotypes can be extracted based on published literature. Specifically, cell phenotype parameters corresponding to these core phenotypes are obtained. Illustratively, these cell phenotype parameters include but are not limited to survival phenotype parameters, proliferation phenotype parameters, apoptosis phenotype parameters, migration phenotype parameters, invasion phenotype parameters, clonogenic phenotype parameters, autophagy phenotype parameters, angiogenesis phenotype parameters, epithelial-to-mesenchymal transition phenotype parameters, etc. Based on a combination of these cell phenotype parameters, a cell phenotypic index for characterizing the cell type can be obtained.
  • Assessing the status of digital cell models against the process assessment model library includes:
  • According to the fifth process evaluation model, whether the digital cell model satisfies at least one cell type model is evaluated after each simulation; if the digital cell model satisfies the same cell type model during multiple consecutive iterative simulations, the digital cell model is in a steady state.
  • evaluating whether the digital cell model satisfies at least one cell type model after any simulation includes: determining the simulated cell phenotype index based on the simulated biochemical component pool; and determining the cell type model that the cell phenotype index satisfies based on the cell phenotype index and the cell phenotype index reference range of each cell type model.
  • the cell phenotype index may include multiple cell phenotype parameters, any one of which is related to the concentration of a biochemical component as a marker (for example, related to an increase or decrease in concentration).
  • the reference range of any one of the cell phenotype indexes includes the reference ranges of multiple cell phenotype parameters.
  • the cell phenotype parameters include a plurality of survival phenotype parameters, proliferation phenotype parameters, apoptosis phenotype parameters, migration phenotype parameters, invasion phenotype parameters, clonogenic phenotype parameters, autophagy phenotype parameters, angiogenesis phenotype parameters and epithelial-to-mesenchymal transition phenotype parameters.
  • the markers used for each cell phenotype parameter can be obtained from medical or biological literature. It can be understood that with the accumulation of biological knowledge and the deepening of the understanding of cells, new cell phenotype parameters can also be constructed to be applied to the cell phenotype index of the embodiments of the present disclosure.
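  • Steady-state detection through cell type models can be sketched as follows. The sketch assumes that the cell phenotype index has already been computed from the biochemical component pool and is given as a mapping from parameter name to value, and that a cell type model is a mapping from parameter name to a (low, high) reference range. The parameter names, model names and the window of three consecutive simulations are illustrative assumptions.

```python
def matching_cell_types(index, cell_type_models):
    """Names of the cell type models whose reference ranges contain every parameter."""
    return [
        name
        for name, ranges in cell_type_models.items()
        if all(low <= index[param] <= high for param, (low, high) in ranges.items())
    ]


def steady_cell_type(index_history, cell_type_models, consecutive=3):
    """Cell type satisfied in each of the last `consecutive` simulations, else None."""
    if len(index_history) < consecutive:
        return None
    recent = [
        set(matching_cell_types(index, cell_type_models))
        for index in index_history[-consecutive:]
    ]
    common = set.intersection(*recent)
    # If several models match throughout, pick one deterministically by name.
    return min(common) if common else None
```

  • The returned name tells which cell type the model has settled into, which is the information a later step needs in order to decide between a target cell type and any other cell type.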
  • the fifth process evaluation model may include only one cell type model having a cell phenotype index reference range of the target cell type.
  • the digital cell model can represent the target cell type.
  • When the digital cell model presents the target cell type in multiple iterations, for example, when the cell phenotypic index in multiple consecutive iterations is basically the same and falls within the cell phenotypic index reference range, the digital cell model reaches a steady state.
  • the fifth process evaluation model includes a cell type model for simulating normal cells.
  • If the cell phenotypic index of the biochemical component pool when the digital cell model reaches a steady state falls within the above cell phenotypic index reference range, the digital cell model presents as a normal cell when it reaches the steady state, specifically a wild-type normal surviving cell.
  • the fifth process evaluation model includes a plurality of different cell type models, and the different cell type models have cell phenotype index reference ranges of different cell phenotypes.
  • When the digital cell model satisfies any one cell type model in consecutive iterations, the digital cell model reaches a steady state. In other words, the evaluation process has no preset target cell type. In this way, an initial digital cell model screened by the cell type models can present or achieve one of the cell types during the iterative simulation process. In subsequent applications, the application scenario of the obtained digital cell model can be determined based on the specific cell type model that the initial digital cell model passed.
  • the fifth process evaluation model may include a plurality of different cell type models, and the different cell type models have different cell phenotype index reference ranges; one specific cell type model is marked as the target cell type model.
  • each cell type model is used to evaluate the biochemical component pool after each simulation separately.
  • If the digital cell model satisfies the target cell type model during multiple consecutive iterations of simulation, the digital cell model is in a steady state.
  • If the digital cell model satisfies the same cell type model other than the target cell type model during multiple consecutive iterations of simulation, a cell type label is added to the current initial digital cell model, which can be saved as a backup digital cell model.
  • the application scenario of the backup digital cell model can be determined based on the cell types that the backup digital cell model can achieve.
  • the digital cell model construction method of the present disclosure can not only obtain the target digital cell model, but also obtain backup digital cell models suitable for other application scenarios, or speed up the construction of digital cell models suitable for other application scenarios.
  • determining the target digital cell model based on the current initial digital cell model includes:
  • the current initial digital cell model is determined as the target digital cell model.
  • determining the target digital cell model based on the current initial digital cell model includes:
  • the current initial digital cell model is determined as the candidate digital cell model;
  • the candidate digital cell model is verified and evaluated according to the verification model library; when the candidate digital cell model does not satisfy the verification model library, the initial digital cell model is updated.
  • By verifying and evaluating the candidate digital cell model, it can be judged whether the candidate digital cell model can achieve specific functions, or whether it meets the requirements in terms of simulation precision and accuracy, so that the candidate digital cell model performs better in application.
  • the verification model library includes a mutation intervention model, and the mutation intervention model includes at least one mutation intervention sub-model; any mutation intervention sub-model includes mutation information, exogenous information and intervention result information;
  • the mutation information includes at least one component mutation information and at least one biochemical process change information;
  • the component mutation information is used to describe the changes in the biochemical component pool caused by mutations in the cell;
  • the biochemical process change information is used to describe changes in biochemical processes caused by intracellular mutations;
  • the exogenous information includes at least one exogenous component information and at least one exogenous component-related process equation set;
  • the exogenous component information includes the concentration of the exogenous component;
  • the exogenous component-related process equation set is used to simulate the concentration change of the biochemical component after one time step caused by the addition of the exogenous component;
  • the intervention result information includes a classification label and a reference range of at least one intervention evaluation index, and the intervention evaluation index is related to the concentration change of the biochemical component used as a marker;
  • Verification and evaluation of the candidate digital cell model according to the verification model library includes: verification and evaluation of the candidate digital cell model according to the mutation intervention model; verification and evaluation of the candidate digital cell model according to the mutation intervention model includes: verification and evaluation of the candidate digital cell model according to at least one of the mutation intervention sub-models; verification and evaluation of the candidate digital cell model according to any one of the mutation intervention sub-models includes:
  • iterative simulation is performed on the digital mutant cell model to obtain, as mutation steady-state data, the biochemical component pool of the digital mutant cell model that has reached a steady state, and the digital mutant cell model that has reached a steady state is used as the steady-state digital mutant cell model;
  • iterative simulation is performed on the mutation intervention digital cell model to obtain, as mutation intervention data, the biochemical component pool of the mutation intervention digital cell model that has reached a steady state;
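  • The overall verification flow can be illustrated with a deliberately small toy model. Everything specific in the sketch is an assumption: the single marker `M` with production and degradation, the drug acting as extra degradation, the relative change of `M` as the intervention evaluation index, and all names and field keys. The disclosure does not give these formulas; the sketch only shows the order of operations (apply the mutation, simulate to a steady state, add the exogenous component, simulate again, compare the index with the reference range and the classification label).

```python
def simulate_to_steady(pool, step, tol=1e-6, max_iterations=30_000):
    """Iterate until no concentration changes by more than `tol`; None on failure."""
    for _ in range(max_iterations):
        new_pool = step(pool)
        if all(abs(new_pool[k] - pool[k]) < tol for k in pool):
            return new_pool
        pool = new_pool
    return None


def make_step(params, dt=0.1):
    """Toy process equation: dM/dt = k_prod - (k_deg + k_drug * drug) * M."""
    def step(pool):
        drug = pool.get("drug", 0.0)
        d_m = params["k_prod"] - (params["k_deg"] + params["k_drug"] * drug) * pool["M"]
        new_pool = dict(pool)
        new_pool["M"] = pool["M"] + dt * d_m
        return new_pool
    return step


def verify_with_submodel(pool, params, submodel):
    """Return (passed, index) for one mutation intervention sub-model."""
    # 1. Digital mutant cell model: apply component mutations and process changes.
    mutant_params = {**params, **submodel["process_change"]}
    mutant_pool = {**pool, **submodel["component_mutation"]}
    steady_mutant = simulate_to_steady(mutant_pool, make_step(mutant_params))
    if steady_mutant is None:
        return False, None
    # 2. Mutation intervention model: add exogenous components and their process.
    intervention_params = {**mutant_params, **submodel["exogenous_process"]}
    intervened_pool = {**steady_mutant, **submodel["exogenous_components"]}
    steady_intervened = simulate_to_steady(intervened_pool, make_step(intervention_params))
    if steady_intervened is None:
        return False, None
    # 3. Assumed intervention evaluation index: relative change of the marker.
    index = (steady_intervened["M"] - steady_mutant["M"]) / steady_mutant["M"]
    low, high = submodel["reference_range"]
    predicted_effective = low <= index <= high
    return predicted_effective == (submodel["label"] == "effective"), index
```

  • In this toy setting, a mutation that lowers `k_deg` from 1.0 to 0.25 raises the steady-state marker from 1 to about 4, and a drug that restores the total degradation rate brings it back to about 1, giving an index close to -0.75.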
  • the mutation intervention model can evaluate the ability of the candidate digital cell model to construct a digital mutant cell model and its ability to return to normal cell types in response to external intervention.
  • This ability reflects the pathogenesis and treatment process of some diseases (such as tumors), that is, normal cells mutate into tumor cells as mutated cells, and drugs are used to intervene in tumor cells to achieve treatment of tumors.
  • If the candidate digital cell model can pass the evaluation of the mutation intervention model, the candidate digital cell model can be used to construct a personalized digital mutant cell model from a patient's individual tumor cells and to conduct personalized anti-tumor drug evaluation, thereby improving treatment effects and speeding up the treatment process.
  • When mutation information, exogenous information and intervention result information corresponding to another disease are used for evaluation, the evaluated candidate digital cell models also have good application prospects in the treatment of that disease.
  • a drug refers to a chemical component or combination of chemical substances that has a therapeutic effect or intervention effect on a disease.
  • drugs can have different classifications for different diseases.
  • drugs can include but are not limited to targeted anti-tumor drugs, cytotoxic drugs, cellular immune modulators, cell epigenetic modulators, etc. It can be understood that in the embodiments of the present disclosure, not all drugs are required to be administratively approved or to have sufficient efficacy. Some chemical components or compositions that have pharmaceutical potential, or that may become drugs, can be used as drugs in embodiments of the present disclosure.
  • the drug used may be brand new and not clinically proven or administratively approved.
  • for a specific disease, a potentially useful compound may be used as the drug, and specific verification is performed based on the digital cell model provided by the embodiment of the present disclosure to determine the possible efficacy of the compound in treating the specific disease, thereby promoting or accelerating the development or screening of therapeutic drugs for the disease.
  • the mutation information is constructed from actual mutated cells, for example from tumor cells.
  • mutation information can be constructed through multi-omics data of mutated cells.
  • the mutation information may be a multi-omics database of mutated cells.
  • the method for constructing a digital cell model further includes:
  • the biochemical component information in the specific type of mutant cells or the biochemical component information of the mutant cells involved in the clinical process is used to determine the component mutation information in the mutation information;
  • the drug information used in the in vitro drug susceptibility test or the drug information involved in the clinical process is used to determine the exogenous component information in the exogenous information;
  • the results of the in vitro drug susceptibility test or of the clinical process are used to determine the classification label (such as a valid label or an invalid label) of the intervention result information.
  • the specific type of mutated cells are tumor cells; the drug information is drug information of targeted anti-tumor drugs.
  • When the digital mutant cell model presents the same cell type in multiple consecutive simulations, the digital mutant cell model reaches a steady state, the current digital mutant cell model is used as the steady-state digital mutant cell model, and the current biochemical component pool of the digital mutant cell model is used as the mutation steady-state data.
  • When the mutation intervention digital cell model presents the same cell type in multiple consecutive simulations, the mutation intervention digital cell model reaches a steady state, and the current biochemical component pool of the mutation intervention digital cell model is used as the mutation intervention data.
  • constructing a mutation intervention digital cell model based on the steady-state digital mutation cell model and the exogenous information includes:
  • the exogenous information may be drug sample information.
  • drug sample information can be constructed from a digital drug library and actual sample information.
  • the digital drug library includes data sets of different drugs; the data set of each drug is built based on at least one of the drug target, approved indications, safe dosage range, metabolic kinetic parameters, activity parameters, and side effects.
  • a drug sample information can be formed based on the drugs used in the actual samples and their concentrations or dosages, as well as the data sets of drugs in the digital drug library, as exogenous information for the mutation intervention sub-model.
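  • One way to picture drug sample information is as a drug record from the library joined with the concentration used in an actual sample. The sketch below is an assumption about layout only: the class names, the fields kept in the record, the drug name `drug_x` and target name `target_1` in the test, and the `within_safe_range` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugRecord:
    """One data set in the digital drug library (only some fields shown)."""
    name: str
    targets: tuple
    safe_dose_range: tuple  # (low, high); units are an assumption of the sketch


@dataclass(frozen=True)
class DrugSample:
    """Drug sample information: a library record plus the concentration used."""
    drug: DrugRecord
    concentration: float
    within_safe_range: bool


def build_drug_sample(library, drug_name, concentration):
    """Combine the library data set with the concentration used in a sample."""
    record = library[drug_name]  # raises KeyError if the drug is not in the library
    low, high = record.safe_dose_range
    return DrugSample(record, concentration, low <= concentration <= high)
```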
  • the intervention evaluation index includes a phenotypic reversal score
  • determining the intervention evaluation indicators includes:
  • the candidate digital cell model fails the validation evaluation of the mutation intervention sub-model. In this way, the candidate digital cell model evaluated by the mutation intervention sub-model has a high degree of fit when simulating mutant cells and the response of mutant cells to drugs.
  • the embodiment of the present disclosure also provides a digital cell model construction device UA, which includes:
  • the building module UA1 is configured to construct an initial digital cell model based on biochemical information;
  • the digital cell model includes a biochemical component pool and multiple signaling pathway units;
  • the biochemical component pool includes multiple pieces of biochemical component information, and the biochemical component information includes the concentration and/or location information of the biochemical component;
  • the signal pathway unit is used to simulate the signal pathway of biological cells;
  • the signal pathway unit includes at least one biochemical reaction module, and the biochemical reaction module is used to simulate, with a set of biochemical process equations, the biochemical processes occurring in the signaling pathway unit;
  • the simulation module UA2 is configured to iteratively simulate the initial digital cell model to simulate the biochemical processes occurring in biological cells;
  • the judgment module UA3 is configured to determine a target digital cell model based on the current initial digital cell model after the digital cell model reaches a steady state.
  • the judgment module UA3 is further configured to judge whether the digital cell model reaches a steady state or a failure state; to update the initial digital cell model when the digital cell model reaches a failure state; and, after the digital cell model reaches a steady state, to determine the target digital cell model based on the current initial digital cell model.
  • the judgment module UA3 is also configured to evaluate the status of the digital cell model according to the process evaluation model library, and judge whether the digital cell model has reached a steady state or a failure state.
  • the digital cell model construction device UA further includes a verification module UA4, which is configured to: after the digital cell model reaches a steady state during the iterative simulation process, determine the current initial digital cell model as the candidate digital cell model; verify and evaluate the candidate digital cell model according to the verification model library; and, when the candidate digital cell model does not satisfy the verification model library, update the initial digital cell model.
  • embodiments of the present disclosure also provide a digital cell system.
  • the digital cell system includes the above-mentioned digital cell model construction device UA, and also includes:
  • Digital cell engine M1 used to run digital cell models
  • the data analysis engine M2 is used to construct a multi-omics database of mutant cells based on the multi-omics data of mutant cells;
  • Digital drug library M3 used to provide drug information
  • Data mapping engine M4 used to map drug information in the multi-omics database and/or the digital drug library to the digital cell model
  • the drug efficacy analysis engine M5 is used to predict drug efficacy based on the running results of the digital cell engine.
  • the digital cell model building device UA of the digital cell system can acquire a digital cell model, for example, acquire a normal digital cell model of normal cells, and the digital cell model can be run in the digital cell engine M1.
  • the data analysis engine M2 can receive multi-omics data of mutant cells and build a multi-omics database of mutant cells.
  • the data parsing engine M2 can receive multi-omics data of tumor cells of a tumor patient, and construct a personalized multi-omics database of tumor cells of the tumor patient based on the multi-omics data.
  • the data mapping engine M4 can map the multi-omics database constructed by the data analysis engine M2 to the normal digital cell model that has reached a steady state.
  • the data mapping engine M4 can map the drug information in the digital drug library M3 to the digital mutant cell model that has reached the steady state, which yields a digital drug-intervention mutant cell model.
  • after the digital drug-intervention mutant cell model is run in the digital cell engine M1 and reaches a steady state, the drug efficacy analysis engine M5 determines the therapeutic effect of the drug on the mutant cells based on the running results of the digital cell engine M1.
  • the simulation module of the digital cell model construction device UA may be the same module as the data analysis engine M2.
  • an electronic device capable of implementing the above-mentioned construction method of a digital cell model is also provided.
  • An electronic device 1000 according to this embodiment of the present disclosure is described below with reference to FIG. 10.
  • the electronic device 1000 shown in FIG. 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • electronic device 1000 is embodied in the form of a general computing device.
  • the components of the electronic device 1000 may include, but are not limited to: the above-mentioned at least one processing unit 1010, the above-mentioned at least one storage unit 1020, a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010), and the display unit 1040.
  • the storage unit stores program code, and the program code can be executed by the processing unit 1010, so that the processing unit 1010 performs the steps of the various exemplary embodiments of the present disclosure described in the "Example Method" section of this specification.
  • the storage unit 1020 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 10201 and/or a cache storage unit 10202, and may further include a read-only storage unit (ROM) 10203.
  • Storage unit 1020 may also include a program/utility 10204 having a set of (at least one) program modules 10205, including, but not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
  • Bus 1030 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • Electronic device 1000 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with electronic device 1000, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. This communication may occur through input/output (I/O) interface 1050.
  • the electronic device 1000 may also communicate with one or more networks (eg, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 1060. As shown, network adapter 1060 communicates with other modules of electronic device 1000 via bus 1030.
  • the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute a method according to an embodiment of the present disclosure.
  • a computer-readable storage medium is also provided, on which a program product capable of implementing the method described above in this specification is stored.
  • various aspects of the present disclosure can also be implemented in the form of a program product, which includes program code.
  • when the program product is run on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present disclosure described in the "Example Method" section of this specification.
  • the program product for implementing the above method according to an embodiment of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer.
  • a readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.
  • the program product may take the form of any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for performing operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., over the Internet through an Internet service provider).

Abstract

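A method and apparatus for constructing a digital cell model, a medium, a device, and a system are provided. The method for constructing a digital cell model includes: constructing an initial digital cell model based on biochemical information (S110); iteratively simulating the initial digital cell model to simulate the biochemical processes occurring in biological cells (S120); and, after the digital cell model reaches a steady state, determining a target digital cell model based on the current initial digital cell model (S130).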

Description

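Method and apparatus for constructing a digital cell model, medium, device, and system
Cross-reference
The present disclosure claims priority to Chinese patent application No. 202210613650.8, filed on May 31, 2022 and entitled "Method and apparatus for constructing a digital cell model, medium, device, and system", and to Chinese patent application No. 202210616258.9, filed on May 31, 2022 and entitled "Method and apparatus for constructing a digital cell model, medium, device, and system", the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present disclosure relate to the technical field of digital cell models and, in particular, to a method for constructing a digital cell model and a corresponding apparatus, storage medium, electronic device, and digital cell system.
Background
The onset and progression of disease often involve complex biochemical reactions, and in particular multiple signal pathways whose superposition affects the cell phenotype. At present, research on the onset, progression, and treatment of disease relies mainly on biological methods and is constrained by their long cycles, many influencing factors, and high cost.
As biological information accumulates, in particular information on signal pathways, protein networks, gene networks, biomarkers, and the kinetics of biochemical processes, it is necessary to construct digital cell models to improve the efficiency of research on the onset, progression, and treatment of disease.
It should be noted that the information disclosed in the Background section above is intended only to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
The purpose of the present disclosure is to overcome the above-mentioned shortcomings of the prior art and to provide a method for constructing a digital cell model and a corresponding apparatus, storage medium, electronic device, and digital cell system.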
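According to a first aspect of the present disclosure, a method for constructing a digital cell model is provided, including:
constructing an initial digital cell model based on biochemical information, wherein the digital cell model includes a biochemical component pool and multiple signal pathway units; the biochemical component pool includes multiple pieces of biochemical component information, the biochemical component information including the concentration and/or location information of a biochemical component; each signal pathway unit is used to simulate a signal pathway of a biological cell; and each signal pathway unit includes at least one biochemical reaction module, the biochemical reaction module being used to simulate, with a set of biochemical process equations, the biochemical processes occurring in the signal pathway unit;
iteratively simulating the initial digital cell model to simulate the biochemical processes occurring in biological cells;
after the digital cell model reaches a steady state, determining a target digital cell model based on the current initial digital cell model.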
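According to a second aspect of the present disclosure, an apparatus for constructing a digital cell model is provided, including:
a construction module, configured to construct an initial digital cell model based on biochemical information, wherein the digital cell model includes a biochemical component pool and multiple signal pathway units; the biochemical component pool includes multiple pieces of biochemical component information, the biochemical component information including the concentration and/or location information of a biochemical component; each signal pathway unit is used to simulate a signal pathway of a biological cell; and each signal pathway unit includes at least one biochemical reaction module, the biochemical reaction module being used to simulate, with a set of biochemical process equations, the biochemical processes occurring in the signal pathway unit;
a simulation module, configured to iteratively simulate the initial digital cell model to simulate the biochemical processes occurring in biological cells;
a judgment module, configured to determine, after the digital cell model reaches a steady state, a target digital cell model based on the current initial digital cell model.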
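According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above method for constructing a digital cell model.
According to a fourth aspect of the present disclosure, an electronic device is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the above method for constructing a digital cell model by executing the executable instructions.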
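According to a fifth aspect of the present disclosure, a digital cell system is provided, including the above apparatus for constructing a digital cell model, and further including:
a digital cell engine for running digital cell models;
a data analysis engine for constructing a multi-omics database of mutant cells from the multi-omics data of mutant cells;
a digital drug library for providing drug information;
a data mapping engine for mapping the multi-omics database and/or the drug information in the digital drug library to the digital cell model;
a drug efficacy analysis engine for predicting drug efficacy from the running results of the digital cell engine.
It should be understood that the general description above and the detailed description below are only exemplary and explanatory and do not limit the present disclosure.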
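Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles. The drawings described below are evidently only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 schematically shows a flowchart of a method for constructing a digital cell model according to an example embodiment of the present disclosure.
FIG. 2 schematically shows the operating logic architecture of a digital cell model according to an example embodiment of the present disclosure.
FIG. 3 schematically shows a flowchart of a method for constructing a digital cell model according to an example embodiment of the present disclosure.
FIG. 4 schematically shows a flowchart of a method for constructing a cell phenotype index according to an example embodiment of the present disclosure.
FIG. 5 schematically shows biochemical process definition information according to an example embodiment of the present disclosure.
FIG. 6 schematically shows how biochemical component information is obtained with a machine learning algorithm according to an example embodiment of the present disclosure.
FIG. 7 schematically shows how a biochemical process mathematical model is obtained with a machine learning algorithm according to an example embodiment of the present disclosure.
FIG. 8 schematically shows the structure of an apparatus for constructing a digital cell model according to an example embodiment of the present disclosure.
FIG. 9 schematically shows the structure of a digital cell system according to an example embodiment of the present disclosure.
FIG. 10 schematically shows the structure of an electronic device according to an example embodiment of the present disclosure.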
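Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are given to provide a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will recognize, however, that the technical solutions of the present disclosure may be practised without one or more of the specific details, or with other methods, components, apparatus, steps, and so on. In other instances, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.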
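Embodiments of the present disclosure provide a method for constructing a digital cell model. Referring to FIG. 1, the method may include at least the following steps:
Step S110: constructing an initial digital cell model based on biochemical information, wherein the digital cell model includes a biochemical component pool and multiple signal pathway units; the biochemical component pool includes multiple pieces of biochemical component information, the biochemical component information including the concentration and/or location information of a biochemical component; each signal pathway unit is used to simulate a signal pathway of a biological cell; and each signal pathway unit includes at least one biochemical reaction module, the biochemical reaction module being used to simulate, with a set of biochemical process equations, the biochemical processes occurring in the signal pathway unit;
Step S120: iteratively simulating the initial digital cell model to simulate the biochemical processes occurring in biological cells;
Step S130: after the digital cell model reaches a steady state, determining a target digital cell model based on the current initial digital cell model.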
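In embodiments of the present disclosure, biochemical component information refers to information about each biochemical component that takes part in a biochemical process; a biochemical component may take part as a reaction substrate, reaction product, catalyst, carrier, or in another role. From a chemical point of view, the types of biochemical component include, but are not limited to, monomeric proteins, multimeric proteins, variant proteins, glycoproteins, polypeptides, amino acids, DNA, RNA, nucleotides, polysaccharides, and other components that take part in biochemical processes. In embodiments of the present disclosure, the biochemical component information includes at least the concentration of the biochemical component. In some embodiments, it may further include location information of the biochemical component. In some embodiments, different biochemical components may be assigned different IDs or names, for example different numbers, which serve to distinguish the components from one another. The ID or name of a biochemical component may also be added to its biochemical component information.
In some embodiments of the present disclosure, the same biochemical component may play different roles in different biochemical processes. For example, in an enzymatic reaction (one example of a biochemical reaction process), a protein may act as the enzyme; in a protein synthesis process (another example of a biochemical reaction process), that same protein may be synthesized and is therefore the reaction product of that process.
In some embodiments of the present disclosure, the same substance (for example a substance with a given chemical structure) may be treated as two different biochemical components, for example given two different numbers, because of its location or because of certain differences (for example in charge, degree of dissociation, or spatial properties). In other words, in some embodiments at least two biochemical components are the same chemical substance located at different positions in the cell. For example, a protein synthesized in a protein synthesis process may be one biochemical component; in a targeted protein transport process, that protein may be transported to a specific site to perform its function. In this example, the newly synthesized protein (the protein before targeted transport) may be treated as one biochemical component and the protein delivered to the specific site as another, even though the two are identical or similar in chemical structure.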
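In embodiments of the present disclosure, the biochemical component pool records the concentration of each biochemical component. The concentration may be a mass concentration (for example g/mL or μg/mL), a molar concentration (for example μmol/L or mol/L), or expressed in another concentration unit. In one example, the unit of concentration may be μmol/L.
In embodiments of the present disclosure, a biochemical process is a process in which biochemical components change. The change may be a change in the component (chemical substance) itself (for example one chemical substance becoming another), a change in the spatial distribution of the component (for example transport of a chemical substance within the cell), or any other process that changes the type, distribution, or function of a chemical substance. In one example of the present disclosure, biochemical processes include at least biochemical reaction processes and biochemical transport processes.
A biochemical process changes the concentration of at least some of the biochemical components involved. For example, in a biochemical reaction process, the components acting as substrates are consumed and the components acting as products are generated; considering that reaction alone, the substrate concentration falls and the product concentration rises. Similarly, in a biochemical transport process, the component before transport is consumed and the component after transport is generated; considering that transport process alone, the concentration of the pre-transport component falls and that of the post-transport component rises.
In embodiments of the present disclosure, a biochemical reaction module for simulating a biochemical process may be constructed from that process, and the module may include a set of biochemical process equations. In other words, the biochemical reaction module uses a set of biochemical process equations to simulate the biochemical processes occurring in the signal pathway unit. In one example, the equation set simulates the change in concentration of the biochemical components over one time step. A single equation set may include multiple concentration change functions, each describing the concentration change of one biochemical component in the process, so that every component in the process is represented by one concentration change function. The concentration change functions themselves, and the relationships among them, thereby reflect the role of each component in the process (for example substrate, product, or enzyme), conservation of mass in the process, signal dependence in the process, and so on.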
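For example, if a biochemical process is an enzyme-catalysed dimerization, the set of biochemical process equations simulating it includes at least:
when c(cat) ≠ 0,
Δc(cat) = 0, Δc(Ant) = -2v(Product)*t, Δc(Product) = v(Product)*t;
In this example equation set, c(cat) denotes the concentration of the enzyme in the biochemical process. The equations are solved only when the enzyme concentration is nonzero, which reflects the dependence of the process on the enzyme; the change in c(cat) across the process is 0, which indicates that the enzyme acts as a catalyst and is not itself consumed. Δc(Ant) denotes the change in the substrate concentration after one time step t. Δc(Product) denotes the change in the product concentration after one time step t, v(Product) denotes the rate of change of the product concentration during the process, and t denotes one time step.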
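The dimerization equation set above can be sketched in Python as a single time-step update. This is only an illustration: the function name `dimerization_step`, the dictionary layout of the pool, and the clamp that stops the substrate from going negative are assumptions and do not appear in the disclosure.

```python
def dimerization_step(pool, v_product, t):
    """Advance an enzyme-catalysed dimerization by one time step t.

    pool maps component names to concentrations: "cat" is the enzyme,
    "Ant" the substrate, and "Product" the dimer (names as in the text).
    """
    new_pool = dict(pool)
    if pool["cat"] == 0:
        # Without enzyme the equation set is not solved: nothing changes.
        return new_pool
    produced = v_product * t
    # Assumption (not in the source): never consume more substrate than exists.
    produced = min(produced, pool["Ant"] / 2)
    new_pool["Ant"] -= 2 * produced   # delta c(Ant) = -2 v(Product) t
    new_pool["Product"] += produced   # delta c(Product) = v(Product) t
    return new_pool                   # delta c(cat) = 0, enzyme is unchanged
```

Two substrate molecules are consumed for each product molecule, so the quantity `Ant + 2 * Product` stays constant across a step, which is the mass conservation that the text says the equation set reflects.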
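Signal transduction and the other biochemical processes in a cell can be at least partly divided into different signal pathways, which are interconnected to form the intracellular signal network. In the present disclosure, the required signal pathways may be obtained by searching the medical literature or the biological literature, by link prediction, or by other methods; signal pathway units describing the signal pathways are then constructed.
In one embodiment of the present disclosure, a signal pathway includes one or more biochemical processes. The equation sets describing the biochemical processes may be used to construct the signal pathway unit describing the signal pathway. The biochemical processes in the signal pathway described by a signal pathway unit, and the biochemical components they involve, can thus be obtained from the equation sets in the unit. Optionally, according to the type of signal pathway simulated, signal pathway units may be divided into at least signal transduction modules, protein transport modules, cell cycle modules, programmed cell death modules, and gene expression modules. In other words, in embodiments of the present disclosure, a signal pathway unit is used to simulate at least one of intracellular signal transduction, protein transport, the cell cycle, and programmed cell death, or to simulate intracellular gene expression. It will be understood that the signal pathway units may also be used to simulate other signalling processes or biological processes in biological cells.
Optionally, the mathematical paradigm used by a biochemical reaction module when simulating processes such as intracellular gene expression or protein transport may differ from that used when simulating processes such as signal transduction, programmed cell death, or the cell cycle.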
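In one embodiment of the present disclosure, a visualization engine may be constructed. The visualization engine parses the biochemical reaction modules in each signal pathway unit to obtain the relationships among the biochemical processes, and the relationships between the biochemical processes, as process nodes, and the biochemical components, as component nodes. By outputting or displaying these relationships, it can output or display the intracellular signal network simulated by the digital cell model.
In one embodiment of the present disclosure, the visualization engine may present, as a graph, the relationships among process nodes and between process nodes and component nodes in the cell simulated by the digital cell model, in particular as a cell signal network graph. The various processes in the simulated cell can thus be presented intuitively, completely, and clearly. Further, during iterative simulation the visualization engine may dynamically present the changing state of at least some biochemical components. For example, a component node (that is, a biochemical component) may be displayed in different colours depending on whether its concentration increases or decreases: if the concentration of a component node increases, the node is shown in red in the cell signal network graph; if it decreases, the node is shown in green. As a further example, a component node may be displayed at different sizes depending on the magnitude of its concentration increase or decrease: the larger the change, the larger the point representing the node; the smaller the change, the smaller the point. For instance, a component node shown as a small green point indicates that its concentration fell slightly during the iteration.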
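The colour and size rule for component nodes can be illustrated with a small Python helper. The function name `node_style`, the linear size mapping, and the grey colour for an unchanged node are assumptions made for this sketch; the text only specifies red for an increase, green for a decrease, and a larger point for a larger change.

```python
def node_style(previous, current, base_size=4.0, scale=10.0):
    """Return (colour, size) for a component node in the signal network graph."""
    delta = current - previous
    if delta > 0:
        colour = "red"      # concentration increased
    elif delta < 0:
        colour = "green"    # concentration decreased
    else:
        colour = "grey"     # assumption: unchanged nodes are not covered by the text
    # Assumption: point size grows linearly with the magnitude of the change.
    size = base_size + scale * abs(delta)
    return colour, size
```

With these defaults, a node whose concentration falls slightly is drawn as a small green point, matching the example in the paragraph above.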
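Of course, the visualization engine may also present the change in concentration of one or more biochemical components during iterative simulation, for example by displaying the real-time concentration of selected components or plotting their concentration curves.
In some embodiments of the present disclosure, referring to FIG. 2, some of the signal pathways simulated by the digital cell model are parallel, that is, there is no clear ordering or temporal dependence between them. In this case, these parallel signal pathways may be treated as a signal pathway group. Correspondingly, the signal pathway units describing them are treated as parallel signal pathway units in the digital cell model and are marked with the same unit process order. In terms of logical architecture, the signal pathway units with the same unit process order may be regarded as one signal pathway unit group, and groups with different unit process orders are cascaded in sequence. Of course, the digital cell model of the embodiments need not actually contain signal pathway unit groups or cascades between them; this division and cascading serves only to describe the operating logic among the signal pathway units and does not limit the architecture of the digital cell model itself. It will be understood that a signal pathway unit group may include one signal pathway unit or several different signal pathway units. In embodiments of the present disclosure, each signal pathway unit may be marked with a process order, and the sets of biochemical process equations having the same biochemical process order may form a signal pathway unit group.
For example, referring to FIG. 2, the digital cell model may logically include multiple signal pathway unit groups cascaded in sequence: signal pathway unit group A, signal pathway unit group B, and so on up to signal pathway unit group M, where group M denotes the last group. During simulation, the groups are solved in sequence to simulate the overall ordering of the biochemical processes of a biological cell and the directionality of signal transduction. Any signal pathway unit group may contain one signal pathway unit or several signal pathway units. For example, group A in this example has multiple signal pathway units, including signal pathway units a1, a2, and a3, whereas group B has a single signal pathway unit, b1.
Optionally, the signal pathway units within one signal pathway unit group may be unordered. When solving a group with multiple signal pathway units, the order in which the units are solved may be determined at random. This simulates the parallel nature of the biochemical processes of a biological cell.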
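In embodiments of the present disclosure, one execution of every signal pathway unit group (that is, of every signal pathway unit) constitutes one simulation of the digital cell model, and the result of each simulation is the basis for the next. Within each simulation, the signal pathway units may be solved one after another in process order. Thus, although the digital cell model includes a large number of equation sets, they are not solved simultaneously but sequentially, in the process order of the signal pathway units.
In one embodiment of the present disclosure, in any one simulation of the digital cell model, the signal pathway units are executed in unit process order.
In embodiments of the present disclosure, the biochemical reaction modules in a signal pathway unit have a module process order. When any signal pathway unit is executed, its biochemical reaction modules are executed in module process order.
When any biochemical reaction module is executed, the equation set in that module may be solved; the equation set refers to the concentrations of the biochemical components in the current biochemical component pool. After any equation set has been solved, the biochemical component information in the pool, in particular the concentration of each component, is updated according to the solution.
In some embodiments of the present disclosure, if multiple signal pathway units have the same unit process order, they may be solved in parallel as a whole. Specifically, in one example, the execution order among units with the same unit process order is random in each simulation; over a large number of iterations the units can be regarded as parallel. This simulates the parallelism among the signal pathway units of one group and improves the robustness of the digital cell model.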
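In some embodiments of the present disclosure, at least one signal pathway unit includes multiple signal pathway subunits, each including one biochemical reaction module or several biochemical reaction modules with a process order. When solving a signal pathway unit with multiple subunits, the order in which the subunits are solved may be determined at random, and the subunits are then solved in that order.
In other words, at the logical level there may also be parallel subunits within a single signal pathway unit. Each subunit may simulate part of a signal pathway; that part may include one biochemical process or several cascaded biochemical processes, each simulated by a set of biochemical process equations. In further schemes, each subunit may itself have lower-level submodules.
In embodiments of the present disclosure, all biochemical processes share the biochemical component pool: the equation set for each biochemical process is solved on the basis of the pool, and the solution is written back to the pool. Specifically, a calculation with an equation set uses the concentrations of the biochemical components in the pool, and the concentrations of the components involved in the equation set are then updated in the pool according to the result. This simulates how a biochemical process consumes, adds to, and depends on the surrounding biochemical components. In some embodiments, each time an equation set has been solved, the updated pool is determined from the solution and from the pool as it was before that equation set was solved.
In this way, during iterative simulation the pool is updated according to the solution of each equation set, simulating the dynamic change of the biochemical components during cell activity. In some embodiments of the present disclosure, the pool data at a particular stage of each simulation may be recorded, for example the pool data (the concentration of each component) at the end of each simulation. During iterative simulation, these data serve as the historical data of the pool and are used to evaluate the iterative simulation process. It will be understood that the historical data also record which simulation produced each data item; in other words, the historical data may include the simulation count and the pool data corresponding to that count.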
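The execution order described above (unit groups in sequence, random order inside a group, modules in order, a shared pool updated after every equation set, and a history keyed by simulation count) might be sketched as follows. The data layout, with `units` as a list of `(unit_order, modules)` pairs and each module as a function returning concentration changes, is an assumption made for illustration.

```python
import random
from itertools import groupby


def run_one_pass(pool, units, rng=random):
    """Execute every signal pathway unit once against the shared pool.

    units: list of (unit_order, modules); each module maps the current pool
    to a dict of concentration changes {component: delta}.
    """
    ordered = sorted(units, key=lambda unit: unit[0])
    for _, group in groupby(ordered, key=lambda unit: unit[0]):
        group = list(group)
        rng.shuffle(group)  # same unit order: random order within the group
        for _, modules in group:
            for module in modules:  # modules run in their listed order
                for name, delta in module(pool).items():
                    pool[name] += delta  # write back before the next module
    return pool


def simulate(pool, units, n_passes, rng=random):
    """Run n_passes simulations and record (count, pool snapshot) after each."""
    history = []
    for count in range(1, n_passes + 1):
        run_one_pass(pool, units, rng)
        history.append((count, dict(pool)))
    return history
```

Because each module reads the pool only when it runs, a later module sees the concentrations left by the earlier ones, which is the sequential solving that the text contrasts with solving all equation sets at once.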
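Referring to FIG. 3, the digital cell model may be evaluated during iterative simulation to determine whether it has reached a steady state. For example, the state of the model may be evaluated against a process evaluation model library to determine whether the model has reached a steady state or a failure state, and hence whether the iterative simulation has met its end condition. In a further example, if the model enters a failure state during iterative simulation, the initial digital cell model may be updated and the iterative simulation and evaluation repeated, and so on until the model is evaluated as being in a steady state during iterative simulation. A steady-state evaluation indicates that the current initial digital cell model, when iteratively simulated, already reflects some of the regularities of activity in biological cells, and that the initial biochemical component pool and the signal pathway unit groups it defines can, to some extent, effectively simulate the distribution of biochemical components and the regularities of cell activity in biological cells. The target digital cell model determined from the current initial digital cell model can therefore simulate at least part of the activity of a biological cell reasonably effectively.
Thus, with the construction method of the present disclosure, the evaluation of the initial digital cell model during iterative simulation can be used to judge whether its biochemical component pool and signal pathway units are suitable. When the initial model is evaluated as being in a failure state during iterative simulation, it can be concluded that the current initial model is unsuitable in some respect, such as the types or concentrations of biochemical components or the simulation of the biochemical processes; the initial digital cell model can then be obtained anew by changing at least one of the pool and the signal pathway units, that is, the initial model is updated by updating at least one of them. This cycle repeats until the initial model is evaluated as being in a steady state during iterative simulation. The construction method can thereby compensate for the current lack of accurate cell-biology knowledge: through the cycle of updating the initial model, iterative simulation, and process evaluation, it determines a more suitable biochemical component pool and signal pathway units and obtains a digital cell model that can reach a steady state and so effectively simulate at least part of cell activity.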
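The update, simulate, and evaluate cycle can be sketched as a small driver loop. All names here (`construct_model`, `step`, `evaluate`, `update`, the status strings, and the two iteration caps) are hypothetical; the disclosure does not prescribe an interface or a limit on the number of rounds.

```python
import copy


def construct_model(initial_model, step, evaluate, update,
                    max_updates=50, max_passes=1000):
    """Repeat simulate/evaluate/update until a candidate reaches steady state.

    step(state) returns the state after one simulation pass; evaluate(history)
    returns "steady", "failed", or "running"; update(model) returns a changed
    initial model. The caps are assumptions to keep the sketch finite.
    """
    candidate = initial_model
    for _ in range(max_updates):
        state = copy.deepcopy(candidate)  # keep the initial model untouched
        history, status = [], "running"
        for _ in range(max_passes):
            state = step(state)
            history.append(state)
            status = evaluate(history)
            if status != "running":
                break
        if status == "steady":
            return candidate  # basis for the target digital cell model
        candidate = update(candidate)  # failed, or did not settle in time
    raise RuntimeError("no steady-state initial model found")
```

Note that the function returns the initial model that led to a steady state rather than the final simulated state, mirroring the text's statement that the target model is determined from the current initial digital cell model.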
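In one embodiment of the present disclosure, the biochemical information on which construction of the digital cell model is based includes at least one of signal pathway information, protein network information, gene network information, biomarker information, and information related to biochemical processes. Other biochemical information may of course also be used. Further, at least part of this biochemical information is obtained from the published literature by knowledge extraction.
In some embodiments of the present disclosure, the initial digital cell model is updated when the digital cell model reaches a failure state during iterative simulation. The update may change at least one of the initial biochemical component pool and the signal pathway units, provided that the pool and signal pathway units of the new initial model differ, as a whole, from those of the initial model before the update.
For example, in one embodiment of the present disclosure, the initial model may be updated by updating the initial biochemical component pool (the pool of the initial model before any iterative simulation): one or more new pieces of biochemical component information may be added to the pool, one or more pieces may be deleted from it, the concentration in one or more pieces may be changed, or two or all three of these strategies may be applied together. In other embodiments, a new initial pool may instead be generated from scratch and, once it has been confirmed to differ from the previous initial pool, used to replace it.
As a further example, in another embodiment of the present disclosure, the initial model may be updated by updating one or more signal pathway units. At least one of the following strategies may be used: adjusting the signal pathway unit group to which a signal pathway unit belongs; adding a new signal pathway unit to the current signal pathway units; deleting a signal pathway unit from the current signal pathway units; updating the type or number of equation sets in a current signal pathway unit; and updating the value of a constant in any equation set of a current signal pathway unit. In other embodiments, a new signal pathway unit may instead be generated from scratch and, once confirmed to differ from the unit before the update, used to replace it.
In some embodiments of the present disclosure, each update of the initial model may use the pool-update strategy alone, the signal-pathway-unit update strategy alone, or both together. The strategies used in any two updates may be the same or different.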
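For example, one update may use only the strategy of changing component concentrations, and the next may simultaneously use four strategies: adding new biochemical component information, deleting biochemical component information, adding new equation sets, and deleting equation sets.
It will be understood that in some cases the strategies are interrelated. For example, adding biochemical component information to the pool usually requires adding the equation sets related to it, which leads to the adjustment of at least one signal pathway unit. Likewise, deleting biochemical component information from the pool usually requires deleting the related equation sets, which again leads to the adjustment of at least one signal pathway unit.
In some embodiments of the present disclosure, the strategy used for each update may be determined by a preset rule, for example using the same strategy several times in a row, or cycling through a preset strategy sequence that records the strategy used for each update within one strategy cycle.
In other embodiments of the present disclosure, the strategy may be determined from the reason the model was evaluated as failed during iterative simulation, or from the specific state in which it was so evaluated; for example, an expert system may be introduced to make the updates more precise, or a technical expert may intervene to provide a more appropriate strategy. In still other embodiments, other ways of determining the strategy may be used, provided that the initial digital cell model is actually updated.
In one embodiment of the present disclosure, a parameter adjustment module and a structure adjustment module may be constructed and used to adjust the current initial model to form a new one. Specifically, the parameter adjustment module may adjust the component concentrations in the biochemical component information and the values of the constants in the equation sets; the structure adjustment module may add or remove biochemical component information in the pool, adjust the equation sets included in a signal pathway unit, or adjust the signal pathway units included in a signal pathway unit group. In other words, the parameter adjustment module changes neither the types and number of biochemical components in the model nor the types, number, and positions of the equation sets in the signal pathway units; it only adjusts component concentrations or equation parameters. Such adjustment generally has a small amplitude and good precision, which suits fine-tuning of the model. The structure adjustment module adjusts structural features of the model, such as the types and number of components in the pool and the types, number, and positions of the equation sets in the signal pathway units. Such adjustment can change the architecture of the model substantially, which helps find a potentially feasible architecture by coarse tuning.
In one embodiment of the present disclosure, the structure adjustment module may first be used to update the initial model until a promising, feasible architecture is obtained. The parameter adjustment module is then used to update the structurally adjusted initial model further, so as to obtain, within that architecture, a digital cell model able to simulate some functions and processes of biological cells. In other embodiments, the structure adjustment module and the parameter adjustment module may be used alternately.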
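Optionally, in at least one update of the initial digital cell model, the current initial model is updated based on at least one of the following:
information from the medical literature, information from the biological literature, information from high-throughput cell experiments, information from high-throughput sequencing, and information predicted from literature, experimental, or sequencing information.
For example, the literature on a given biochemical process may be searched on PubMed and the information combined to determine the parameters of that process, in particular the constants in its equation set. In other words, in embodiments of the present disclosure, the information needed to construct the model, such as the parameters of biochemical components and biochemical processes and information on signal pathways, may be obtained from the published literature or by experiment. High-throughput cell experiments or high-throughput sequencing allow the required data to be obtained under a single set of experimental conditions; this makes the data more reliable and also avoids the difficulty of cross-referencing information from publications that used different technical standards and methods.
In one embodiment of the present disclosure, the key parameters required to construct the model may be identified by searching and analysing the published literature. At least some of these key parameters, for example those easily verified or obtained by high-throughput cell experiments or high-throughput sequencing, or those that differ widely between publications, are verified or obtained independently by experiment, to ensure their accuracy and validity and thereby improve both the efficiency of obtaining the model and the closeness with which it simulates biological cells.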
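In some embodiments of the present disclosure, referring to FIG. 3, the construction method may further include: obtaining the biochemical component pool of the initial digital cell model (the initial biochemical component pool) from a biochemical component database. The database includes multiple pieces of biochemical component setting information, each including the concentration range of a biochemical component.
Optionally, multiple pieces of setting information may be selected from the database and biochemical component information determined from them; the resulting pieces of biochemical component information form the initial pool. Specifically, a piece of setting information includes the concentration range of a biochemical component, the corresponding biochemical component information includes a concentration of the same component, and that concentration lies within the range. Thus the types of component in the initial pool do not go beyond those in the database, and their concentrations lie within the ranges given in the database. The database thereby serves as both the basis for and a constraint on generating the initial pool, and likewise on updating it.
For example, the database includes N1 pieces of setting information (N1 being a positive integer), that is, it covers N1 biochemical components. When the initial pool is generated from the database, N2 pieces of setting information (N2 being a positive integer not greater than N1) may be drawn from the database and N2 corresponding pieces of biochemical component information generated from them, each with a concentration within the range given by the corresponding setting information; these N2 pieces of biochemical component information form the initial pool.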
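Drawing N2 of the N1 entries and fixing a concentration inside each range can be sketched as below. The function name `sample_initial_pool`, the dictionary layout of the database, and uniform sampling within the range are assumptions; the text requires only that each concentration lie inside its range.

```python
import random


def sample_initial_pool(database, n2, rng=random):
    """Build an initial pool from n2 entries of a component database.

    database maps a component name to its (low, high) concentration range.
    """
    if not 0 < n2 <= len(database):
        raise ValueError("n2 must be between 1 and the database size")
    chosen = rng.sample(sorted(database), n2)
    # Assumption: draw each concentration uniformly within its range.
    return {name: rng.uniform(*database[name]) for name in chosen}
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which helps when comparing successive initial models.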
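Optionally, the biochemical component database may itself be obtained before the initial pool of the initial digital cell model is obtained from it.
In embodiments of the present disclosure, an existing biochemical component database may be used directly, the required database may be obtained by modifying and supplementing an existing database, or the database may be built from scratch.
In one embodiment of the present disclosure, the database may be built from scratch. The data used may come at least partly from existing databases, from published literature in biology (for example cell biology), such as journals, newspapers, conference proceedings, and monographs, and from biological experiments and research; in some cases targeted cell-biology research (for example high-throughput cell experiments) may be carried out to obtain specific data. These data may include the components of biological cells and their concentrations. It will be understood that the components and concentrations reported by different sources may differ.
For example, at least some biochemical components and their related data may be found in existing databases and/or published literature in biology, and these data, with or without correction, are formed into setting information and added to the database.
In some cases, if multiple sources among the databases and/or published literature give the concentration of a particular component and the values are close, for example all fluctuating within one order of magnitude, the concentration may be considered fairly certain, and the concentration range is formed into setting information without correction and added to the database.
In other cases, if multiple sources give the concentration of a particular component and the values differ widely, for example spanning more than one order of magnitude, the concentration may be considered highly uncertain, and the range is corrected before being formed into setting information and added to the database. For example, clearly deviating data are deleted, or the concentrations from the various sources are analysed statistically, to determine the concentration range of the component, and the setting information is generated from that range.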
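The order-of-magnitude check on reported concentrations can be illustrated as follows. The specific correction used here (keeping values within half the allowed spread of the median, on a log scale) is only one possible reading of "deleting clearly deviating data"; it and the name `concentration_range` are assumptions.

```python
import math


def concentration_range(values, max_decades=1.0):
    """Derive a (low, high) range from positive concentrations of one component.

    Returns ((low, high), corrected), where corrected tells whether outliers
    had to be removed because the sources spanned more than max_decades.
    """
    values = sorted(values)
    spread = math.log10(values[-1] / values[0])
    if spread <= max_decades:
        return (values[0], values[-1]), False  # sources agree: use as reported
    # Assumption: keep only values close to the median on a log10 scale.
    median = values[len(values) // 2]
    kept = [v for v in values if abs(math.log10(v / median)) <= max_decades / 2]
    return (kept[0], kept[-1]), True
```

The `max_decades` parameter gives a place to apply a stricter or looser criterion per component, since a single threshold is unlikely to suit every biochemical component.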
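It will be understood that different criteria are needed for different components when judging whether the concentrations given in existing data are highly uncertain. For example, some components take part in many biochemical processes, or for other reasons do not fluctuate much in concentration (rarely by more than one order of magnitude); a stricter criterion is then needed. Other components show large concentration differences during biochemical processes or between cell states; a looser criterion is then needed.
In some cases, if the existing data (for example existing databases and/or published literature in biology) show that the concentration of a component is highly uncertain (for example the values differ greatly), or its certainty is hard to judge, biological experiments (for example high-throughput cell experiments or high-throughput sequencing) may be considered to determine the concentration and its range and so improve the accuracy of the database. Of course, not every component with a highly uncertain range needs to be tested experimentally; that is, it is not required that every concentration range in the database be highly certain.
In some cases, if a new biochemical component is discovered in biological experiments or research, its setting information may be added to the database. Continually improving the database in this way provides the knowledge base for constructing more accurate digital cell models, so that the models simulate the biochemical processes of biological cells more effectively.
In some cases, a machine learning algorithm may be used to predict the concentrations of some components, and setting information is formed from the predictions and added to the database. For example, when the concentration of a particular component is hard to measure directly, a biochemical network model related to the component may be built from existing data, and a machine learning algorithm then used to predict its concentration or concentration range.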
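Referring to the example in FIG. 6, to calculate the concentration of a particular biochemical component At, a biochemical network model may be built with At as a node; the model further includes nodes A1, A2, A3, and A4. Nodes A1, A2, and A3 represent biochemical processes that generate At, and node A4 represents a biochemical process in which At takes part. From the data for nodes A1, A2, A3, and A4, the concentration or concentration range of At can be inferred.
In some cases, link prediction may be applied to known biochemical networks to predict new biochemical components and new biochemical processes. The concentration ranges of the predicted components may be verified directly (for example biologically, in particular by high-throughput cell experiments) or indirectly (for example with different biochemical network models) and then formed into setting information and added to the database, improving the database so that digital cell models built on it come closer to real cells.
In one embodiment of the present disclosure, a database parser may be constructed. The parser parses existing data to generate setting information, for example from existing databases or from published literature. Further, at least part of the setting information built by the parser may serve as initial setting information, which, after manual correction or correction by other means, becomes the final usable setting information for building or updating the initial pool. For example, the parser may analyse the existing literature to obtain the concentration of a particular component in different publications and compile these concentrations and the related bibliographic information as initial setting information. A technical expert may then correct the concentrations in the initial setting information, for example by deleting clearly erroneous values or setting a concentration range from the compiled values, to obtain the final usable setting information.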
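In some embodiments of the present disclosure, when the initial pool is updated, the initial pool of the current initial digital cell model is updated according to the biochemical component database. Specifically, the database may be used to adjust the types of biochemical component information in the initial pool, the component concentrations in that information, or both the types and at least one concentration, to obtain a new initial pool. In other embodiments, a new initial pool may instead be regenerated on the basis of the initial pool and, if it differs from the initial pool before the update, used to replace it, completing the update.
In one embodiment of the present disclosure, the setting information in the database may further include a concentration search step for each component. In other words, the setting information records at least the concentration range and the concentration search step of each component. When the database is used to update the concentration of a particular component in the initial pool, the new concentration may be determined from the component's current concentration in the initial pool and its concentration range and search step in the database, and the component's biochemical component information is updated accordingly. In other embodiments, the concentration range in at least one piece of setting information may be a set of discrete concentration values; when that component's information in the initial pool is updated, a new value may be selected from that set.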
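A minimal sketch of choosing a new concentration from the range and search step, or from a discrete set, is shown below. The helper names and the choice to move exactly one step up or down are assumptions; the text does not say how the step is applied.

```python
import random


def neighbouring_concentrations(current, low, high, step):
    """Candidate new concentrations one search step away, kept within range."""
    candidates = (current - step, current + step)
    return [c for c in candidates if low <= c <= high]


def pick_discrete(current, allowed, rng=random):
    """Pick a different value from a discrete set of allowed concentrations."""
    others = [c for c in allowed if c != current]
    return rng.choice(others) if others else current
```

For example, a component at 1.0 μmol/L with range 0.5 to 2.0 and step 0.5 has two candidates, 0.5 and 1.5, while the same component at the upper bound 2.0 has only 1.5.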
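In some embodiments of the present disclosure, the concentration in at least one piece of setting information is a specific concentration (a point value), in which case the concentration range of the component is that specific concentration. Optionally, when the database is built, components whose concentration changes have little effect on the biochemical processes of the cell, or whose concentration or content is relatively stable, may be given a specific concentration as their range in the database. This captures some of the biological regularities of biological cells while making the digital cell model easier to obtain.
In some embodiments of the present disclosure, each piece of setting information in the database is also marked with a credibility parameter (for example a credibility level) that characterizes the credibility of the component's concentration. In one example, a wide concentration range in the setting information means lower credibility and a narrow range means higher credibility. In another example, the credibility is higher when the range comes from highly credible data, such as biological experimental data, and lower when it comes from less credible data, such as non-authoritative journals or newspapers, or is merely a predicted value.
In one example, when the initial pool is generated from the database, setting information with high credibility may be used preferentially, or setting information may be drawn from the database at least partly according to its credibility.
In one example, when the initial pool is updated according to the database, the biochemical component information corresponding to low-credibility setting information may be adjusted preferentially, for example by deleting at least some of that information or by changing the component concentrations in at least some of it.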
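In some embodiments of the present disclosure, referring to FIG. 3, the construction method further includes generating or updating the initial digital cell model according to a signal pathway database; specifically, generating or updating the signal pathway units of the initial model according to the signal pathway database.
In one embodiment of the present disclosure, the signal pathway units of the initial model may be generated or updated from the signal pathway database as follows:
generating initial signal pathway information from the signal pathway database;
obtaining the signal pathway units of the initial digital cell model from the initial signal pathway information and the biochemical process paradigm pool.
Here, the signal pathway database includes multiple pieces of signal pathway information and a biochemical process paradigm pool. The paradigm pool includes multiple biochemical process paradigms that describe the regularities of biochemical processes. Each piece of signal pathway information includes biochemical process information describing each biochemical process of the signal pathway, and is marked with a process order. Each piece of biochemical process information includes the paradigm referenced by the biochemical process, the meaning of the variables in that paradigm, the value ranges of the constants in that paradigm, and the order of the process within the signal pathway information. The initial signal pathway information includes multiple pieces of signal pathway information obtained from the signal pathway database, and the constants in the paradigms referenced by its biochemical process information are fixed to point values.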
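Optionally, when the initial digital cell model is constructed, its signal pathway units may be obtained from the signal pathway database. When the initial model is updated by updating signal pathway units, those units may be updated from the signal pathway database.
Optionally, the signal pathways may be assembled from existing knowledge, for example knowledge of intracellular signal pathways in the biological or medical literature; that is, the pieces of signal pathway information are constructed. The biochemical process information in the signal pathway information can be converted into the required signal pathway units by referencing the paradigm pool. The process order of each piece of signal pathway information determines how the signal pathway units are combined, and the signal pathway units are constructed accordingly.
In one embodiment of the present disclosure, a biochemical process paradigm has variables (independent and dependent) and constants, together with a mathematical description of the relationships among them. In the paradigm, the variables and constants are not given specific meanings and the constants are not assigned values. In other words, a paradigm expresses only a mathematical regularity.
In one embodiment of the present disclosure, the meaning of the variables in the paradigm referenced by a piece of biochemical process information is the biochemical component information to which each variable refers in that biochemical process information, in particular the component concentration in that information.
Thus, when biochemical process information invokes the paradigm it references, one or more biochemical process mathematical models can be generated from the paradigm, and these can simulate the biochemical process that the information defines. Specifically, each variable in the paradigm is replaced by the concentration of the component defined in the biochemical process information, and a constant value is chosen for each constant within the value range that the information defines. By providing a paradigm pool and signal pathway information, the signal pathway database simplifies construction of the mathematical models and reduces the complexity of the database; it also meets the needs of both knowledge representation and computational representation. In particular, in one example, the description of the variables' meaning and the label or name of the biochemical process information may use abbreviations customary in the field or user-defined notation rules, making the biochemical process information easier to build, update, and modify.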
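As an example, a biochemical process paradigm of type "mm" includes the following two equations:
activators.v_max = activators.k_cat * activators.concentration;
v_t = activators.v_max * substrate.concentration^multimer / (substrate.k_m + substrate.concentration)^multimer
This paradigm can represent a kinase-catalysed multimerization process; activators, substrate, and so on do not denote specific biochemical components, and k_m, k_cat, and so on do not denote specific constants. FIG. 4 illustrates a piece of biochemical process information and the biochemical process mathematical model that it can generate when it invokes the paradigm denoted by "mm". In the example of FIG. 4, the mathematical model describes the reaction rate of the kinase-catalysed multimerization, expressed as the rate of change of the product concentration, v_t. From this reaction rate and the time step, the change in concentration of each component involved after one time step can be determined.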
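How a piece of biochemical process information might bind the "mm" paradigm to concrete components and constants can be sketched as below. The rate formula follows one reading of the paradigm in which `multimer` is an exponent on both the numerator and the bracketed denominator; that reading, the dictionary layout of the process information, and the component names `KinaseX` and `ProteinY` are all assumptions for illustration.

```python
def mm_paradigm(k_cat, k_m, multimer):
    """Return the rate function of the "mm" paradigm for fixed constants."""
    def rate(activator_conc, substrate_conc):
        v_max = k_cat * activator_conc
        return v_max * substrate_conc ** multimer / (k_m + substrate_conc) ** multimer
    return rate


# Paradigm pool: paradigm type -> factory; only "mm" is shown here.
PARADIGM_POOL = {"mm": mm_paradigm}


def build_model(process_info, pool):
    """Turn biochemical process information into a callable rate model."""
    paradigm = PARADIGM_POOL[process_info["paradigm"]]
    rate = paradigm(**process_info["constants"])   # constants fixed to point values
    bind = process_info["variables"]               # paradigm variable -> component
    return lambda: rate(pool[bind["activators"]], pool[bind["substrate"]])


# Hypothetical process information and pool, for illustration only.
info = {
    "paradigm": "mm",
    "variables": {"activators": "KinaseX", "substrate": "ProteinY"},
    "constants": {"k_cat": 2.0, "k_m": 1.0, "multimer": 2},
}
pool = {"KinaseX": 0.5, "ProteinY": 1.0}
v_t = build_model(info, pool)()
```

Multiplying `v_t` by the time step then gives the change in product concentration for one step. Keeping the paradigm separate from the process information means the same "mm" entry can serve many different processes with different components and constants.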
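In some embodiments of the present disclosure, initial signal pathway information may be generated from the signal pathway information. The initial signal pathway information includes multiple pieces of biochemical process information drawn from the signal pathway information, and the constants of that biochemical process information are point values rather than ranges in the initial signal pathway information; each value lies within the range given in the signal pathway information. The initial signal pathway information can thus be turned into multiple concrete biochemical process mathematical models by invoking the paradigm pool, and it is used to construct the signal pathway units of the digital cell model. When the initial digital cell model needs to be updated, its signal pathway units can be updated by updating the initial signal pathway information.
In one example, the biochemical process information in the initial signal pathway information may generate equation sets directly by invoking the paradigms in the paradigm pool, and the signal pathway unit to which each equation set belongs is determined from the process order of the initial signal pathway information.
For example, a paradigm may include a paradigm equation set made up of multiple paradigm equations, each simulating the change of one variable after one time step, for example its increase or decrease. Within a paradigm equation set, whether a variable increases or decreases reflects its role: in general, a variable that increases after one time step is a dependent variable, representing the concentration of a reaction product; a variable that decreases is an independent variable, representing the concentration of a reaction substrate; and a variable that stays the same represents a component such as an enzyme.
In embodiments of the present disclosure, in the signal pathway database, the biochemical process information further includes the value ranges of the constants in the paradigm; a value range may be a fixed value (a point value) or a range. In some examples, a constant whose value is highly certain or very well established may be a point value, for example a constant representing a Michaelis constant. A constant whose value is less certain may be configured as a range.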
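Optionally, at least one biochemical process is determined by at least one of the following: the medical literature, the biological literature, high-throughput cell experiments, machine learning algorithms, and link prediction algorithms; biochemical process information is generated from that process and added to the signal pathway information of the signal pathway database. Continually improving the signal pathway database in this way improves the accuracy of the digital cell model.
In one example of the present disclosure, high-throughput biological experimental techniques (for example high-throughput cell experiments or high-throughput sequencing) may be used to determine the constants of some biochemical processes, and these serve as the basis for determining the value ranges of the constants in the biochemical process information.
In other examples of the present disclosure, machine learning algorithms may be used to determine the constants of some biochemical processes, and the value ranges of the constants in the corresponding biochemical process information are determined from them.
Referring to the example in FIG. 7, to obtain the constants of a particular biochemical process Bt, a biochemical network model may be built with Bt as a node; the model further includes nodes B1, B2, B3, and B4. Nodes B1, B2, and B3 represent the biochemical component information taking part in Bt, and node B4 represents the biochemical component information generated by Bt. By obtaining data for nodes B1, B2, B3, and B4, for example concentration data under different concentration combinations from high-throughput cell experiments, the mathematical model of Bt can be determined by machine learning, and the constants of the process determined from that model.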
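In one embodiment of the present disclosure, the types and number of biochemical components in the pool match those of the setting information in the biochemical component database; or all the biochemical process information or signal pathway information in the signal pathway database is reflected in the digital cell model. This brings the model as close as possible to the biological cell. It will be understood, however, that because of the limits of simulation methods and of knowledge about biological cells, the model may at certain stages be unable to simulate every function of a biological cell completely; for example, several biochemical processes that in fact occur in sequence may be simulated as a single process, so as to capture their final result rather than the detail of each process. Therefore, in other embodiments, the model is built on the biochemical component database and the signal pathway database but does not aim to reflect all the knowledge and information they contain completely and without omission; it aims instead to simulate biological cells to a given accuracy and functionality. As the two databases come to contain more knowledge and information and describe the intracellular signal network in finer detail, new versions of the model rebuilt from them will simulate biological cells with increasing accuracy. In other words, in embodiments of the present disclosure, the two databases may be continually improved, and more complete databases used as required to build more complete digital cell models.
In one embodiment of the present disclosure, the target digital cell model obtained may be saved as a biochemical component pool and signal pathway units, so that it can be used directly. In another embodiment, the target model may be saved as a model database, which may include a component sub-database, a process sub-database, and a paradigm sub-database. The component sub-database stores the biochemical component information in the pool of the target model. The process sub-database stores the parameters of the equation sets in the signal pathway units, that is, the equation sets are saved in the form of biochemical process information. The paradigm sub-database stores the paradigms involved in the equation sets; for example, it may reference or copy the PM of the signal pathway database. Thus, when the target model is modified to simulate a new cell, the biochemical component information in the component sub-database and the biochemical process information in the process sub-database can be modified directly, and a new digital cell model built from the modified component sub-database, the modified process sub-database, and the paradigm sub-database. Specifically, the modified component sub-database may be used to build the pool of the new model, and the modified process sub-database, by referencing the paradigm sub-database, may be used to build the signal pathway units of the new model.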
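In some embodiments of the present disclosure, before the initial digital cell model is obtained for the first time, the signaling pathway units may first be tuned individually to obtain target signaling pathway units. Related target signaling pathway units are then tuned as a whole; for example, several signaling pathway units on the same signaling path are tuned together as a correlated signaling pathway model group, yielding a target signaling pathway model group. The initial digital cell model is then obtained from the target signaling pathway model groups.
In one embodiment of the present disclosure, the method for constructing a digital cell model further includes obtaining multiple target signaling pathway units; when the initial digital cell model is obtained, its signaling pathway units are determined from these target signaling pathway units. In other words, each signaling pathway unit may be tuned first, and tuning at the level of the whole digital cell model is then performed on the basis of those results.
Obtaining any one target signaling pathway unit includes:
constructing a starting signaling pathway model, which includes a starting signaling pathway unit and the biochemical component information involved in that unit, the biochemical component information including the concentrations of the biochemical components;
determining a screening condition for the signaling pathway model, the screening condition including the concentration trend of one or more biochemical components serving as markers;
simulating the starting signaling pathway model and screening against the screening condition during the simulation; for example, determining whether the concentration of a marker component rises or falls during the simulation, so as to judge whether the marker behavior produced by the current signaling pathway model clearly contradicts what is observed in biological cells;
if the simulation result of the starting signaling pathway model does not satisfy the screening condition, updating the starting signaling pathway model until the screening condition is satisfied;
if the simulation result of the starting signaling pathway model satisfies the screening condition, determining the starting signaling pathway model to be the target signaling pathway model.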
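The tuning loop in the steps above can be sketched as follows. The sketch assumes a toy first-order conversion of a substrate `S` into a marker `M`, and it updates the model by trying candidate values of a single rate constant; the function names and the candidate list are hypothetical, since the disclosure does not fix an update strategy here.

```python
def simulate(model, steps=50):
    """Simulate a toy conversion S -> M and return the marker trace."""
    s, m = model["S"], model["M"]
    trace = [m]
    for _ in range(steps):
        delta = model["k"] * s * model["dt"]
        s, m = s - delta, m + delta
        trace.append(m)
    return trace


def satisfies(trace, expected_trend):
    """Screening condition: the marker must rise or fall overall."""
    if expected_trend == "increase":
        return trace[-1] > trace[0]
    if expected_trend == "decrease":
        return trace[-1] < trace[0]
    raise ValueError(f"unknown trend: {expected_trend}")


def tune_unit(start_model, expected_trend, candidate_ks):
    """Update the starting model until the screening condition is met."""
    for k in candidate_ks:
        candidate = dict(start_model, k=k)
        if satisfies(simulate(candidate), expected_trend):
            return candidate  # target signaling pathway model
    return None  # no candidate passed the screening


start = {"S": 1.0, "M": 0.0, "k": -0.5, "dt": 0.1}
target = tune_unit(start, "increase", [-0.5, 0.0, 0.5])
```

The same loop shape applies to a signaling pathway model group, with the update step changing at least one unit in the group.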
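In one embodiment of the present disclosure, the method further includes obtaining multiple target signaling pathway model groups, each of which includes multiple signaling pathway units; when the initial digital cell model is obtained, its signaling pathway units are determined from the signaling pathway units in the target signaling pathway model groups. In other words, several related signaling pathway units (for example, units connected in series along a signaling route) may first be tuned as a whole, and tuning at the level of the digital cell model is performed afterwards. This improves the overall efficiency of tuning the digital cell model and allows it to simulate biological cells more effectively.
Optionally, determining any one target signaling pathway model group includes:
obtaining a starting signaling pathway model group, which includes multiple target signaling pathway units and the biochemical component information involved in each of them, the biochemical component information including the concentrations of the biochemical components;
determining a screening condition for the correlated signaling pathway model group, the screening condition including the concentration trend of one or more biochemical components serving as markers;
simulating the starting signaling pathway model group and screening against the screening condition during the simulation; for example, determining whether the concentration of a marker component rises or falls during the simulation;
if the simulation result of the starting signaling pathway model group does not satisfy the screening condition, updating the starting signaling pathway model group until the screening condition is satisfied; updating the starting signaling pathway model group includes updating at least one signaling pathway unit in the correlated signaling pathway model group;
if the simulation result of the starting signaling pathway model group satisfies the screening condition, determining the starting signaling pathway model group to be the target signaling pathway model group.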
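In some embodiments of the present disclosure, in step S110, after the initial digital cell model is obtained, its legality may also be checked; only when the initial digital cell model meets the legality requirement does the method proceed to step S120 to iteratively simulate the digital cell model.
The legality of the initial digital cell model may be judged with a whitelist strategy or a blacklist strategy; the present disclosure places no particular restriction on this. Under a whitelist strategy, at least one legality rule is set in advance; the initial digital cell model is judged to meet the legality requirement if it satisfies any one legality rule, and otherwise is judged not to meet it. Under a blacklist strategy, at least one illegality rule is set in advance; the initial digital cell model is judged not to meet the legality requirement if it satisfies any one illegality rule, and otherwise is judged to meet it.
The method for constructing a digital cell model provided by the present disclosure may further include, between step S110 and step S120, after the initial digital cell model is obtained, determining whether it satisfies any illegality rule in an illegality rule library. If the initial digital cell model satisfies any illegality rule, it is judged not to meet the legality requirement and the method returns to step S110 to obtain a new initial digital cell model; if it satisfies none of the illegality rules, it is judged to meet the legality requirement and the method proceeds to step S120.
As an example, the illegality rules may include, but are not limited to, the following: at least one piece of biochemical component information in the biochemical component pool is not called by any biochemical process equation set; biochemical component information called by at least one biochemical process equation set is not contained in the biochemical component pool; the concentration of at least one biochemical component in the biochemical component pool does not meet the concentration requirement that a biochemical process equation set places on that component.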
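A blacklist check over the three example rules might look like the sketch below. The data layout is an assumption made for illustration: the pool is a plain mapping from component name to concentration, and each equation set is a dict with a `uses` set and an optional `min_conc` mapping.

```python
def violated_rules(pool, equation_sets):
    """Return the names of the illegality rules that the model satisfies."""
    used = set()
    for eq in equation_sets:
        used |= set(eq["uses"])

    violations = []
    # Rule 1: a pooled component is not called by any equation set.
    if any(name not in used for name in pool):
        violations.append("unused_component")
    # Rule 2: an equation set calls a component missing from the pool.
    if any(name not in pool for name in used):
        violations.append("missing_component")
    # Rule 3: a pooled concentration is below what an equation set requires.
    if any(pool[name] < minimum
           for eq in equation_sets
           for name, minimum in eq.get("min_conc", {}).items()
           if name in pool):
        violations.append("concentration_requirement")
    return violations


def is_legal(pool, equation_sets):
    """Blacklist strategy: legal only if no illegality rule is satisfied."""
    return not violated_rules(pool, equation_sets)


equation_sets = [{"uses": {"A", "B"}, "min_conc": {"A": 0.5}}]
bad_pool = {"A": 1.0, "B": 0.0, "C": 2.0}   # C is never used
good_pool = {"A": 1.0, "B": 0.0}
```

A whitelist strategy would invert the logic: the model passes as soon as any one legality rule holds.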
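In one embodiment of the present disclosure, the method for constructing a digital cell model may further include building an illegality rule library, in which one or more illegality rules are recorded.
In step S130, the iterative simulation of the digital cell model may be evaluated to determine the state of the digital cell model during the iterative simulation, and whether the end condition has been reached is decided from the evaluation result. In some embodiments of the present disclosure, the state of the digital cell model may be evaluated against a process evaluation model library to determine whether the model has reached a steady state or a failed state, and hence whether the iterative simulation has reached the end condition.
In one embodiment of the present disclosure, in step S130, during the iterative simulation it is determined whether the digital cell model has reached a steady state or a failed state; when the digital cell model reaches a failed state, the initial digital cell model is updated; after the digital cell model reaches a steady state, the target digital cell model is determined from the current initial digital cell model.
In one example, the method of the present disclosure may further include obtaining a process evaluation model library, for example by building one.
In the present disclosure, evaluating the process or state of the digital cell model requires monitoring or processing the concentrations of biochemical components that serve as markers. It will be understood that different markers may be used in different processes to serve different purposes. The marker components for each process may be obtained from existing knowledge, for example by searching the biological or medical literature to determine which biochemical components can serve as markers of apoptosis and which biochemical indicators can serve as markers of cell division.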
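In one embodiment of the present disclosure, the process evaluation model library includes a first process evaluation model, which includes legal concentration ranges for multiple biochemical components.
In step S120, during the iterative simulation of the digital cell model, the biochemical component pool is evaluated with the first process evaluation model. If the evaluation finds that the concentration of one or more marker components in the pool falls outside the legal concentration range for that component, the digital cell model is judged to be in a failed state. For example, if the first process evaluation model requires the concentration of a particular biochemical component to be no less than 0 and that concentration is negative in the pool after some update, the digital cell model is judged to be in a failed state.
In one example, the first process evaluation model may be applied to the biochemical component pool after every update, so that as soon as a concentration in the updated pool falls outside its legal range the digital cell model is judged to have failed and the simulation is stopped. The initial digital cell model can then be updated promptly, which reduces the computation needed to obtain a digital cell model and improves efficiency.
Of course, in other examples of the present disclosure, the updated biochemical component pools may be evaluated by sampling, for example once every 3 to 10 updates of the pool.
It will be understood that the legal concentration range of a biochemical component in the first process evaluation model may differ from its concentration in the initial biochemical component pool and from the concentration range specified in the biochemical component database. The concentrations specified in the initial pool and in the database constrain the starting conditions of the initial digital cell model, not the concentration changes that occur during iterative simulation. The legal concentration ranges in the first process evaluation model, by contrast, constrain the concentration changes during iterative simulation: when a concentration clearly contradicts biological knowledge (for example a negative value or an extremely high concentration), the current iterative simulation is judged to be clearly inconsistent with biology, the simulation is terminated and the current initial digital cell model is abandoned.
In one embodiment of the present disclosure, the process evaluation model library includes a second process evaluation model, which includes an upper limit on the number of iterations of the iterative simulation.
In step S120, the number of simulation iterations may be evaluated against the second process evaluation model; when the number of iterations reaches the upper limit, the digital cell model is judged to be in a failed state. For example, if the second process evaluation model sets the upper limit at 30000 iterations, the digital cell model is judged to be in a failed state when the iteration count reaches 30000. Thus, if it still cannot be determined whether an initial digital cell model is in a steady state or a failed state when the upper limit is reached, the iterative simulation is terminated promptly so that a new initial digital cell model can be tested, which improves the efficiency of obtaining the target digital cell model.
In one example, the second process evaluation model may be applied to the current iteration count each time the digital cell model completes one simulation pass, that is, after the last signaling pathway unit has been solved. Of course, in other examples of the present disclosure the evaluation may be performed before the first signaling pathway unit is solved in each pass, or at other times.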
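The first and second process evaluation models described above can be combined into a single per-iteration check, sketched below. The function name, the return values and the representation of legal ranges as `(low, high)` tuples are assumptions; the limit of 30000 is the example value given in the text.

```python
def evaluate_process(pool, iteration, legal_ranges, max_iterations):
    """Return "failed" or "running" for the current simulation state."""
    # First model: every monitored concentration must stay in its legal range.
    for name, (low, high) in legal_ranges.items():
        if not low <= pool[name] <= high:
            return "failed"
    # Second model: the iteration count must stay below the upper limit.
    if iteration >= max_iterations:
        return "failed"
    return "running"


legal_ranges = {"X": (0.0, 100.0)}
state_negative = evaluate_process({"X": -0.1}, 1, legal_ranges, 30000)
state_timeout = evaluate_process({"X": 1.0}, 30000, legal_ranges, 30000)
state_ok = evaluate_process({"X": 1.0}, 10, legal_ranges, 30000)
```

Calling the check on every update gives the earliest possible abort; calling it every few updates trades detection latency for less overhead.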
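In one embodiment of the present disclosure, the process evaluation model library includes a third process evaluation model, which includes the concentration trend during iterative simulation of at least one biochemical component serving as a marker, for example one or more of: concentration gradually rising, concentration gradually falling, concentration fluctuating, and concentration rising and then stabilizing at a plateau.
Evaluating the state of the digital cell model against the process evaluation model library includes:
during the iterative simulation of the digital cell model, evaluating the historical data of the biochemical component pool against the third process evaluation model; if the historical data of the biochemical component pool does not satisfy the third process evaluation model, the digital cell model has reached a failed state.
Further, the third process evaluation model may also check whether the concentration of a marker component changes abruptly. Specifically, it checks whether one or more selected biochemical components in the historical data of the biochemical component pool show an abrupt concentration change exceeding an abrupt-change threshold, and treats such a change as a sufficient condition for failing the screening evaluation (reaching a failed state). An initial digital cell model that passes this screening therefore behaves continuously overall during the iterative simulation.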
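A trend check with abrupt-change detection could be sketched as follows. Only the rising and falling trends are covered, the trend is tested as monotonic change between consecutive records, and the threshold is applied to the absolute step-to-step difference; all of these are simplifying assumptions rather than details from the disclosure.

```python
def has_abrupt_change(history, threshold):
    """True if any step-to-step change exceeds the abrupt-change threshold."""
    return any(abs(b - a) > threshold for a, b in zip(history, history[1:]))


def follows_trend(history, trend):
    """Check a marker's concentration history against an expected trend."""
    diffs = [b - a for a, b in zip(history, history[1:])]
    if trend == "rising":
        return all(d >= 0 for d in diffs) and history[-1] > history[0]
    if trend == "falling":
        return all(d <= 0 for d in diffs) and history[-1] < history[0]
    raise ValueError(f"unsupported trend: {trend}")


def third_model_passes(history, trend, threshold):
    """Fail on a wrong trend or on any abrupt change."""
    return follows_trend(history, trend) and not has_abrupt_change(history, threshold)


smooth = [1.0, 1.2, 1.5, 1.6]
jumpy = [1.0, 1.2, 3.0, 3.1]
wrong_way = [1.0, 0.8, 0.9]
```

Fluctuating and plateau trends would need their own predicates, for example a bounded oscillation test or a check that late differences stay near zero.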
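In one embodiment of the present disclosure, the process evaluation model library includes a fourth process evaluation model, which includes the trend during iterative simulation of the concentration relationship among multiple biochemical components serving as markers; for example, the concentrations of some markers gradually rise while those of others gradually fall.
Evaluating the state of the digital cell model against the process evaluation model library includes:
during the iterative simulation of the digital cell model, evaluating the historical data of the biochemical component pool against the fourth process evaluation model; if the historical data of the biochemical component pool does not satisfy the fourth process evaluation model, the digital cell model has reached a failed state.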
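In one embodiment of the present disclosure, the process evaluation model library includes a fifth process evaluation model, which includes at least one cell type model, each cell type model including a reference range for a cell phenotype index related to the concentrations of biochemical components.
In some embodiments of the present disclosure, referring to FIG. 5, core phenotypes may be extracted from published literature; specifically, the cell phenotype parameters corresponding to these core phenotypes are obtained. For example, these cell phenotype parameters include, but are not limited to, survival, proliferation, apoptosis, migration, invasion, colony formation, autophagy, angiogenesis and epithelial-mesenchymal transition phenotype parameters. Combinations of these parameters then yield the cell phenotype parameters used to characterize a cell type.
For example, to characterize different subtypes of tumor cells, the core phenotypes of tumors may be extracted from published tumor literature, and the range of each cell phenotype parameter for each tumor cell subtype is then determined.
Evaluating the state of the digital cell model against the process evaluation model library includes:
during the iterative simulation of the digital cell model, evaluating against the fifth process evaluation model whether the digital cell model satisfies at least one cell type model after each simulation pass; if the digital cell model satisfies the same cell type model over multiple consecutive iterations, the digital cell model is in a steady state.
Further, evaluating whether the digital cell model satisfies at least one cell type model after any one simulation pass includes: determining the cell phenotype index after the pass from the biochemical component pool after the pass; and determining, from that cell phenotype index and the cell phenotype index reference range of each cell type model, which cell type model the index satisfies.
Optionally, the cell phenotype index may include multiple cell phenotype parameters, each of which is related to the concentration of a marker component (for example, to an increase or decrease in concentration). Correspondingly, the reference range of any cell phenotype index includes reference ranges for multiple cell phenotype parameters. When the cell phenotype parameters determined from the historical or current data of the biochemical component pool (taken together as the cell phenotype index) fall within the parameter reference ranges of a particular cell phenotype index, the digital cell model exhibits the corresponding cell type.
Optionally, the cell phenotype parameters include several of: survival, proliferation, apoptosis, migration, invasion, colony formation, autophagy, angiogenesis and epithelial-mesenchymal transition phenotype parameters. The markers used for each cell phenotype parameter may be obtained from the medical or biological literature. It will be understood that, as biological knowledge accumulates and understanding of cells deepens, new cell phenotype parameters may be constructed and applied to the cell phenotype index of embodiments of the present disclosure.
In one embodiment of the present disclosure, the fifth process evaluation model may include only one cell type model, which has the cell phenotype index reference range of a target cell type. When the cell phenotype index determined from the biochemical component pool of the digital cell model falls within this reference range, the digital cell model exhibits the target cell type. When the digital cell model exhibits the target cell type over multiple iterations, for example when the cell phenotype index over multiple consecutive iterations is essentially unchanged and within the reference range, the digital cell model has reached a steady state.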
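The steady-state test based on cell type models can be sketched as below. The sketch assumes the phenotype index has already been computed from the pool as a mapping of phenotype parameter to value, and that steady state means the same cell type is matched for `required_run` consecutive passes; the parameter names, ranges and run length are illustrative only.

```python
def classify(phenotype_index, cell_type_models):
    """Return the first cell type whose reference ranges all contain the index."""
    for cell_type, ranges in cell_type_models.items():
        if all(low <= phenotype_index[p] <= high
               for p, (low, high) in ranges.items()):
            return cell_type
    return None


def steady_state_type(index_history, cell_type_models, required_run):
    """Return the cell type held for required_run consecutive passes, if any."""
    run_type, run_length = None, 0
    for index in index_history:
        cell_type = classify(index, cell_type_models)
        if cell_type is not None and cell_type == run_type:
            run_length += 1
        else:
            run_type = cell_type
            run_length = 1 if cell_type is not None else 0
        if run_length >= required_run:
            return run_type
    return None


models = {
    "normal": {"proliferation": (0.0, 0.5), "apoptosis": (0.2, 0.8)},
    "tumor_like": {"proliferation": (0.6, 1.0), "apoptosis": (0.0, 0.2)},
}
history = [
    {"proliferation": 0.9, "apoptosis": 0.5},  # matches no model
    {"proliferation": 0.4, "apoptosis": 0.5},
    {"proliferation": 0.4, "apoptosis": 0.5},
    {"proliferation": 0.4, "apoptosis": 0.5},
]
reached = steady_state_type(history, models, required_run=3)
```

With a single entry in `models` this corresponds to the targeted evaluation; with several entries it corresponds to an evaluation that accepts whichever cell type the model settles into.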
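In one example, the fifth process evaluation model includes a cell type model for simulating normal cells. Thus, if the cell phenotype index of the biochemical component pool when the digital cell model reaches a steady state falls within the above reference range, the digital cell model at steady state represents a normal cell, specifically a wild-type normal living cell.
In another embodiment of the present disclosure, the fifth process evaluation model includes multiple different cell type models, each with the cell phenotype index reference range of a different cell phenotype. When the digital cell model satisfies the same cell type model, whichever it is, over multiple consecutive iterations, the digital cell model has reached a steady state. In other words, this evaluation has no preset target. An initial digital cell model that passes the screening is thus able to exhibit or reach one of the cell types during iterative simulation. In later use, the application scenario of the resulting digital cell model can be decided according to the specific cell type model that the initial digital cell model passed.
In yet another embodiment of the present disclosure, the fifth process evaluation model may include multiple different cell type models, each with the cell phenotype index reference range of a different cell phenotype, one of which is marked as the target cell type model. When the iterative simulation is evaluated with the fifth process evaluation model, each cell type model is applied to the biochemical component pool after every simulation pass. When the digital cell model satisfies the target cell type model over multiple consecutive iterations, the digital cell model is in a steady state. When it satisfies the same cell type model other than the target over multiple consecutive iterations, a cell type label is added to the current initial digital cell model, which may be saved as a backup digital cell model. In later use, the application scenario of a backup digital cell model can be decided according to the cell type it reaches. In this way, the method of the present disclosure can obtain backup digital cell models suited to other application scenarios while obtaining the target digital cell model, or can speed up the construction of digital cell models for other application scenarios.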
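In some embodiments of the present disclosure, after the digital cell model is evaluated as being in a steady state during iterative simulation, determining the target digital cell model from the current initial digital cell model includes:
after the digital cell model reaches a steady state during iterative simulation, determining the current initial digital cell model to be the target digital cell model.
In other embodiments of the present disclosure, after the digital cell model reaches a steady state during iterative simulation, determining the target digital cell model from the current initial digital cell model includes:
after the digital cell model reaches a steady state during iterative simulation, determining the current initial digital cell model to be a candidate digital cell model;
verifying the candidate digital cell model against a verification model library; when the candidate digital cell model does not satisfy the verification model library, updating the initial digital cell model. Re-verifying the candidate digital cell model in this way makes it possible to judge whether it can perform specific functions, or whether its simulation precision and accuracy meet requirements, so that the candidate digital cell model performs better in use.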
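Referring to FIG. 3, in some embodiments of the present disclosure the verification model library includes a mutation intervention model, which includes at least one mutation intervention sub-model. Any one mutation intervention sub-model includes mutation information, exogenous information and intervention result information. The mutation information includes at least one piece of component mutation information and at least one piece of biochemical process change information; the component mutation information describes the changes in the biochemical component pool caused by a mutation in the cell, and the biochemical process change information describes the changes in biochemical processes caused by the mutation. The exogenous information includes at least one piece of exogenous component information and at least one exogenous-component-related process equation set; the exogenous component information includes the concentration of the exogenous component, and the exogenous-component-related process equation set models the change in biochemical component concentrations over one time step caused by adding the exogenous component. The intervention result information includes a classification label and a reference range for at least one intervention evaluation index, the intervention evaluation index being related to the concentration changes of biochemical components serving as markers.
Verifying the candidate digital cell model against the verification model library includes verifying it against the mutation intervention model, which in turn includes verifying it against at least one mutation intervention sub-model. Verifying the candidate digital cell model against any one mutation intervention sub-model includes:
mapping the mutation information onto the candidate digital cell model to construct a digital mutant cell model;
iteratively simulating the digital mutant cell model, taking the biochemical component pool of the digital mutant cell model that has reached a steady state as mutation steady-state data, and taking the digital mutant cell model that has reached a steady state as the steady-state digital mutant cell model;
constructing a mutation intervention digital cell model from the steady-state digital mutant cell model and the exogenous information;
iteratively simulating the mutation intervention digital cell model, and taking the biochemical component pool of the mutation intervention digital cell model that has reached a steady state as mutation intervention data;
taking the biochemical component pool of the candidate digital cell model after it reaches a steady state in iterative simulation as baseline steady-state data;
determining the intervention evaluation index from the baseline steady-state data, the mutation steady-state data and the mutation intervention data;
classifying according to the intervention evaluation index and its reference range; when the classification result agrees with the classification label, the candidate digital cell model passes the verification of the mutation intervention sub-model.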
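In this way, the mutation intervention model evaluates the ability of the candidate digital cell model to form a digital mutant cell model and to return to a normal cell type in response to external intervention. This ability mirrors the onset and treatment of some diseases (for example tumors): normal cells mutate into tumor cells, and drugs intervene on the tumor cells to treat the tumor. If the candidate digital cell model passes the evaluation of the mutation intervention model, it can be used to build an individualized digital mutant cell model from a patient's own tumor cells and to evaluate antitumor drugs for that individual, thereby improving treatment outcomes and speeding up treatment. Naturally, if the evaluation uses the mutation information, exogenous information and intervention result information of another disease, a candidate digital cell model that passes also has good prospects for application in treating that disease.
In embodiments of the present disclosure, a drug is a chemical component or combination of chemical substances that has a therapeutic or interventional effect on a disease. Drugs may be classified differently for different diseases. In oncology, drugs may include, but are not limited to, targeted antitumor drugs, cytotoxic drugs, cellular immunomodulators and cellular epigenetic modulators. It will be understood that embodiments of the present disclosure do not require every drug to have received regulatory approval or to have proven efficacy; chemical components or compositions that have the potential to become drugs may also serve as drugs in these embodiments. For example, if the digital cell model provided by embodiments of the present disclosure is used as the basis for developing a drug against a specific disease, the drug used may be entirely new and need not have been clinically validated or approved; a compound with potential use that has not yet been validated may be tested on the digital cell model to judge its likely efficacy against that disease, thereby advancing or accelerating the development or screening of drugs for the disease.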
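Optionally, the mutation information is constructed from actual mutant cells, for example from tumor cells.
In one embodiment of the present disclosure, the mutation information may be constructed from multi-omics data of mutant cells. In one example, the mutation information may be a multi-omics database of the mutant cells.
In one embodiment of the present disclosure, the method for constructing a digital cell model further includes:
building a mutation intervention sub-model from actual samples, the actual samples including data obtained from in vitro drug sensitivity experiments on a specific type of mutant cell or data obtained from clinical practice;
wherein the biochemical component information of the specific type of mutant cell, or of the mutant cells involved in the clinical process, is used to determine the component mutation information in the mutation information; the drug information used in the in vitro drug sensitivity experiment or involved in the clinical process is used to determine the exogenous component information in the exogenous information; and the result of the in vitro drug sensitivity experiment or of the clinical process is used to determine the classification label of the intervention result information (for example an effective label or an ineffective label).
In one example, the specific type of mutant cell is a tumor cell, and the drug information is that of a targeted antitumor drug.
Optionally, during the iterative simulation of the digital mutant cell model, when the digital mutant cell model exhibits the same cell type over multiple consecutive simulation passes, it has reached a steady state; the current digital mutant cell model is taken as the steady-state digital mutant cell model, and its biochemical component pool is taken as the mutation steady-state data.
Optionally, during the iterative simulation of the mutation intervention digital cell model, when the mutation intervention digital cell model exhibits the same cell type over multiple consecutive simulation passes, it has reached a steady state, and its current biochemical component pool is taken as the mutation intervention data.
Optionally, constructing the mutation intervention digital cell model from the steady-state digital mutant cell model and the exogenous information includes:
adding the exogenous component information in the exogenous information to the biochemical component pool of the steady-state digital mutant cell model to form the biochemical component pool of the mutation intervention digital cell model; and adding the exogenous-component-related process equation sets in the exogenous information to the signaling pathway units of the steady-state digital mutant cell model to form the signaling pathway units of the mutation intervention digital cell model.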
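Constructing the mutation intervention model from a steady-state mutant model can be sketched as a copy-and-extend operation. The dictionary layout, the component names `Target` and `Drug`, the rate constant and the choice to append the exogenous equations as a separate pathway unit are all assumptions for illustration; the text does not say whether the equations join an existing unit or form a new one.

```python
import copy


def drug_effect(pool, dt):
    """Hypothetical exogenous equation: the drug depletes its target."""
    return {"Target": -0.5 * pool["Drug"] * pool["Target"] * dt}


def build_intervention_model(steady_mutant_model, exogenous_info):
    """Add exogenous components and equation sets to a copy of the model."""
    model = copy.deepcopy(steady_mutant_model)
    # Exogenous components (with concentrations) join the component pool.
    model["pool"].update(exogenous_info["components"])
    # Exogenous equation sets are appended here as one extra pathway unit.
    model["pathway_units"].append(
        {"name": "exogenous", "equations": list(exogenous_info["equations"])}
    )
    return model


mutant = {
    "pool": {"Target": 2.0},
    "pathway_units": [{"name": "growth", "equations": []}],
}
exogenous = {"components": {"Drug": 1.0}, "equations": [drug_effect]}
treated = build_intervention_model(mutant, exogenous)
```

Working on a deep copy keeps the steady-state mutant model intact, so the same mutant model can be reused to test several drugs.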
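In one embodiment of the present disclosure, the exogenous information may be one piece of drug sample information. In other words, drug sample information may be constructed from a digital drug library and actual sample information. In one example, the digital drug library includes data sets for different drugs; the data set of each drug is built from at least one of: drug targets, approved indications, safe dose range, pharmacokinetic parameters, activity parameters and side effects. One piece of drug sample information, serving as the exogenous information for a mutation intervention sub-model, may be formed from the drug used in the actual sample and its concentration or dose, together with the data set of that drug in the digital drug library.
In one embodiment of the present disclosure, the intervention evaluation index includes a phenotype reversal score;
determining the intervention evaluation index from the baseline steady-state data, the mutation steady-state data and the mutation intervention data includes:
determining a normal cell phenotype index from the concentrations of the marker components in the baseline steady-state data;
determining a mutant cell phenotype index from the concentrations of the marker components in the mutation steady-state data;
determining a post-intervention cell phenotype index from the concentrations of the marker components in the mutation intervention data;
determining a phenotype abnormality index from the difference between the mutant cell phenotype index and the normal cell phenotype index;
determining a phenotype reversal index from the difference between the post-intervention cell phenotype index and the mutant cell phenotype index;
determining the phenotype reversal score from the phenotype reversal index and the phenotype abnormality index;
when the phenotype reversal score is outside the reference range for the phenotype reversal score, the candidate digital cell model fails the verification of the mutation intervention sub-model. A candidate digital cell model that passes the evaluation of the mutation intervention sub-model therefore fits well when simulating mutant cells and their response to drugs.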
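The text above does not give a formula for the phenotype reversal score, so the sketch below adopts one plausible definition as an assumption: the abnormality and reversal indices are per-parameter differences, and the score is the fraction of the abnormal shift that the intervention undoes (the negated projection of the reversal index onto the abnormality index). A score of 1 then means a full return to the normal phenotype and 0 means no change.

```python
def phenotype_reversal_score(normal, mutant, treated):
    """Fraction of the mutant-vs-normal shift reversed by the intervention.

    Assumed definition, not taken from the disclosure.
    """
    abnormal = {p: mutant[p] - normal[p] for p in normal}   # abnormality index
    reversal = {p: treated[p] - mutant[p] for p in normal}  # reversal index
    norm_sq = sum(v * v for v in abnormal.values())
    if norm_sq == 0:
        raise ValueError("mutant phenotype does not differ from normal")
    return -sum(reversal[p] * abnormal[p] for p in normal) / norm_sq


def passes(score, reference_range):
    """Compare the score with the sub-model's reference range."""
    low, high = reference_range
    return low <= score <= high


normal = {"proliferation": 0.3, "apoptosis": 0.5}
mutant = {"proliferation": 0.9, "apoptosis": 0.1}
full_reversal = phenotype_reversal_score(normal, mutant, treated=normal)
no_reversal = phenotype_reversal_score(normal, mutant, treated=mutant)
```

The reference range passed to `passes` would come from the intervention result information of the sub-model; the range used in any test of this sketch is likewise illustrative.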
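Referring to FIG. 8, embodiments of the present disclosure further provide an apparatus UA for constructing a digital cell model, including:
a construction module UA1, configured to construct an initial digital cell model based on biochemical information; the digital cell model includes a biochemical component pool and multiple signaling pathway units; the biochemical component pool includes multiple pieces of biochemical component information, the biochemical component information including the concentration and/or location information of a biochemical component; the signaling pathway unit is used to simulate a signaling pathway of a biological cell; the signaling pathway unit includes at least one biochemical reaction module, which uses a biochemical process equation set to simulate a biochemical process occurring in the signaling pathway unit;
a simulation module UA2, configured to iteratively simulate the initial digital cell model so as to simulate the biochemical processes occurring in a biological cell;
a determination module UA3, configured to determine the target digital cell model from the current initial digital cell model after the digital cell model reaches a steady state.
In one embodiment of the present disclosure, the determination module UA3 is further configured to determine whether the digital cell model has reached a steady state or a failed state; to update the initial digital cell model when the digital cell model reaches a failed state; and to determine the target digital cell model from the current initial digital cell model after the digital cell model reaches a steady state.
Further, the determination module UA3 is also configured to evaluate the state of the digital cell model against the process evaluation model library and to determine whether the digital cell model has reached a steady state or a failed state.
In one embodiment of the present disclosure, the apparatus UA further includes a verification module UA4, configured to determine the current initial digital cell model to be a candidate digital cell model after the digital cell model reaches a steady state during iterative simulation; to verify the candidate digital cell model against the verification model library; and to update the initial digital cell model when the candidate digital cell model does not satisfy the verification model library.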
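Referring to FIG. 9, embodiments of the present disclosure further provide a digital cell system, which includes the above apparatus UA for constructing a digital cell model and further includes:
a digital cell engine M1, for running the digital cell model;
a data parsing engine M2, for building a multi-omics database of mutant cells from the multi-omics data of the mutant cells;
a digital drug library M3, for providing drug information;
a data mapping engine M4, for mapping the multi-omics database and/or the drug information in the digital drug library onto the digital cell model;
a drug efficacy analysis engine M5, for predicting drug efficacy from the results produced by the digital cell engine.
In one example, the apparatus UA of the digital cell system may obtain a digital cell model, for example a normal digital cell model of a normal cell, and this digital cell model may be run in the digital cell engine M1. The data parsing engine M2 may receive multi-omics data of mutant cells and build a multi-omics database of the mutant cells. For example, the data parsing engine M2 may receive multi-omics data of the tumor cells of a tumor patient and build from it a personalized multi-omics database of that patient's tumor cells. After the digital cell engine M1 has run the normal digital cell model to a steady state, the data mapping engine M4 may map the multi-omics database built by the data parsing engine M2 onto the steady-state normal digital cell model to obtain a digital mutant cell model. After the digital mutant cell model has been run to a steady state in the digital cell engine M1, the data mapping engine M4 may map drug information from the digital drug library M3 onto the steady-state digital mutant cell model to obtain a drug-intervened digital mutant cell model. After the drug-intervened digital mutant cell model has been run to a steady state in the digital cell engine M1, the drug efficacy analysis engine M5 judges the therapeutic effect of the drug on the mutant cells from the results produced by the digital cell engine M1.
In one example, the part of the apparatus UA that performs iterative simulation, namely the simulation module of the apparatus UA, may be the same module as the data parsing engine M2.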
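In exemplary embodiments of the present disclosure, an electronic device capable of implementing the above method for constructing a digital cell model is also provided.
Those skilled in the art will appreciate that aspects of the present disclosure may be implemented as a system, method or program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode and the like), or an embodiment combining hardware and software aspects, which may collectively be referred to herein as a "circuit", "module" or "system".
An electronic device 1000 according to this embodiment of the present disclosure is described below with reference to FIG. 10. The electronic device 1000 shown in FIG. 10 is only an example and should not limit the functions or scope of use of embodiments of the present disclosure in any way.
As shown in FIG. 10, the electronic device 1000 takes the form of a general-purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting the different system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.
The storage unit stores program code that can be executed by the processing unit 1010, so that the processing unit 1010 performs the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
The storage unit 1020 may include readable media in the form of volatile storage units, for example a random access memory (RAM) unit 10201 and/or a cache storage unit 10202, and may further include a read-only memory (ROM) unit 10203.
The storage unit 1020 may also include a program/utility 10204 having a set of (at least one) program modules 10205. Such program modules 10205 include, but are not limited to: an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 1030 may represent one or more of several types of bus structure, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 1000 may also communicate with one or more external devices 1100 (for example a keyboard, a pointing device or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (for example a router or a modem) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 1050. The electronic device 1000 may also communicate with one or more networks (for example a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 through the bus 1030. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.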
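From the description of the above embodiments, those skilled in the art will readily understand that the exemplary embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to embodiments of the present disclosure may therefore be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a terminal device or a network device) to perform the method according to embodiments of the present disclosure.
In exemplary embodiments of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification. In some possible embodiments, aspects of the present disclosure may also be implemented as a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
A program product for implementing the above method according to embodiments of the present disclosure may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited to this; in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of these. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF and the like, or any suitable combination of these.
Program code for carrying out the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example through the Internet using an Internet service provider).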
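In addition, the above drawings are only schematic illustrations of the processing included in the methods according to exemplary embodiments of the present disclosure and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be performed synchronously or asynchronously, for example in multiple modules.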
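Other embodiments of the present disclosure will readily occur to those skilled in the art upon considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the appended claims.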

Claims (18)

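  1. A method for constructing a digital cell model, comprising:
    constructing an initial digital cell model based on biochemical information; the digital cell model comprises a biochemical component pool and a plurality of signaling pathway units; the biochemical component pool comprises a plurality of pieces of biochemical component information, the biochemical component information comprising the concentration and/or location information of a biochemical component; the signaling pathway unit is used to simulate a signaling pathway of a biological cell; the signaling pathway unit comprises at least one biochemical reaction module, the biochemical reaction module being used to simulate, by means of a biochemical process equation set, a biochemical process occurring in the signaling pathway unit;
    iteratively simulating the initial digital cell model to simulate the biochemical processes occurring in a biological cell;
    after the digital cell model reaches a steady state, determining a target digital cell model from the current initial digital cell model.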
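  2. The method for constructing a digital cell model according to claim 1, wherein the biochemical information comprises at least one of signaling pathway information, protein network information, gene network information, biomarker information, and information related to biochemical processes.
  3. The method for constructing a digital cell model according to claim 1, wherein the signaling pathway units have a unit process order, the biochemical reaction modules in a signaling pathway unit have a module process order, and iteratively simulating the digital cell model comprises:
    in any one simulation pass of the digital cell model, executing the signaling pathway units in the unit process order; and when executing any one signaling pathway unit, executing the biochemical reaction modules in that signaling pathway unit in the module process order.
  4. The method for constructing a digital cell model according to claim 1, wherein the signaling pathway unit comprises signaling pathway subunits, and any one signaling pathway subunit comprises at least one biochemical reaction module.
  5. The method for constructing a digital cell model according to claim 1, wherein the signaling pathway unit is used to simulate at least one of intracellular signal transduction, protein transport, the cell cycle, and programmed cell death, or is used to simulate intracellular gene expression.
  6. The method for constructing a digital cell model according to claim 1, wherein the method further comprises:
    during the iterative simulation, updating the initial digital cell model when the digital cell model reaches a failed state.
  7. The method for constructing a digital cell model according to claim 6, wherein updating the initial digital cell model comprises:
    updating at least one of the initial biochemical component pool and the signaling pathway units of the initial digital cell model to obtain a new initial digital cell model.
  8. The method for constructing a digital cell model according to claim 6, wherein updating the initial digital cell model comprises:
    adjusting the biochemical component concentrations in the biochemical component information and the parameters of the biochemical process equation sets; and/or
    adding or removing biochemical component information in the biochemical component pool, adjusting the biochemical reaction modules included in a signaling pathway unit, or adjusting the unit process order of the signaling pathway units.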
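  9. The method for constructing a digital cell model according to claim 1, wherein constructing the initial digital cell model based on biological information comprises:
    obtaining the biochemical component pool of the initial digital cell model from a biochemical component database; the biochemical component database comprises a plurality of pieces of biochemical component setting information, the biochemical component setting information comprising the concentration range and/or location information of a biochemical component.
  10. The method for constructing a digital cell model according to claim 9, wherein the method further comprises:
    determining at least one piece of biochemical component setting information based on medical literature, based on biological literature, using high-throughput cell experiments, using high-throughput sequencing, or using a machine learning algorithm, and adding the determined biochemical component setting information to the biochemical component database.
  11. The method for constructing a digital cell model according to claim 9, wherein the biochemical component setting information comprises the concentration range of a biochemical component and an adjustment step size, or comprises a plurality of discrete concentration values;
    the method further comprises:
    during the iterative simulation, updating the initial digital cell model when the digital cell model reaches a failed state; wherein updating the initial digital cell model comprises updating, according to at least one piece of biochemical component setting information, the concentration in the biochemical component information of the biochemical component pool of the initial digital cell model.
  12. The method for constructing a digital cell model according to claim 1, wherein constructing the initial digital cell model based on biological information comprises:
    generating initial signaling pathway information from a signaling pathway database; the signaling pathway database comprises a plurality of pieces of signaling pathway information and a biochemical process paradigm pool; the biochemical process paradigm pool comprises a plurality of biochemical process paradigms for describing the regularities of biochemical processes;
    generating the signaling pathway units of the initial digital cell model from the initial signaling pathway information and the biochemical process paradigm pool.
  13. The method for constructing a digital cell model according to claim 12, wherein the biochemical process paradigm pool is used to describe the rate of concentration change of a biochemical process or the amount of concentration change within one time step.
  14. The method for constructing a digital cell model according to claim 12, wherein the method further comprises:
    determining at least one biochemical process by at least one of medical literature, biological literature, high-throughput cell experiments, machine learning algorithms, and link prediction algorithms, generating biochemical process information from the biochemical process, and adding it to the signaling pathway information of the signaling pathway database.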
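  15. An apparatus for constructing a digital cell model, comprising:
    a construction module, configured to construct an initial digital cell model based on biochemical information; the digital cell model comprises a biochemical component pool and a plurality of signaling pathway units; the biochemical component pool comprises a plurality of pieces of biochemical component information, the biochemical component information comprising the concentration and/or location information of a biochemical component; the signaling pathway unit is used to simulate a signaling pathway of a biological cell; the signaling pathway unit comprises at least one biochemical reaction module, the biochemical reaction module being used to simulate, by means of a biochemical process equation set, a biochemical process occurring in the signaling pathway unit;
    a simulation module, configured to iteratively simulate the initial digital cell model to simulate the biochemical processes occurring in a biological cell;
    a determination module, configured to determine a target digital cell model from the current initial digital cell model after the digital cell model reaches a steady state.
  16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for constructing a digital cell model according to any one of claims 1 to 14.
  17. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to perform, by executing the executable instructions, the method for constructing a digital cell model according to any one of claims 1 to 14.
  18. A digital cell system, comprising the apparatus for constructing a digital cell model according to claim 15, and further comprising:
    a digital cell engine, for running the digital cell model;
    a data parsing engine, for building a multi-omics database of mutant cells from the multi-omics data of the mutant cells;
    a digital drug library, for providing drug information;
    a data mapping engine, for mapping the multi-omics database and/or the drug information in the digital drug library onto the digital cell model;
    a drug efficacy analysis engine, for predicting drug efficacy from the results produced by the digital cell engine.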
PCT/CN2022/115811 2022-05-31 2022-08-30 Method and apparatus for constructing digital cell model, medium, device, and system WO2023231202A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210616258.9 2022-05-31
CN202210613650.8 2022-05-31
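CN202210616258.9A CN117198381A (zh) 2022-05-31 2022-05-31 Method and apparatus for constructing digital cell model, medium, device, and system
CN202210613650.8A CN117198380A (zh) 2022-05-31 2022-05-31 Method and apparatus for constructing digital cell model, medium, device, and system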

Publications (2)

Publication Number Publication Date
WO2023231202A1 WO2023231202A1 (zh) 2023-12-07
WO2023231202A9 true WO2023231202A9 (zh) 2024-02-08

Family

ID=89026750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115811 WO2023231202A1 (zh) 2022-05-31 2022-08-30 Method and apparatus for constructing digital cell model, medium, device, and system

Country Status (1)

Country Link
WO (1) WO2023231202A1 (zh)

Also Published As

Publication number Publication date
WO2023231202A1 (zh) 2023-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944513

Country of ref document: EP

Kind code of ref document: A1