WO2023081000A1 - Metabolic modeling for disease prognosis and treatment - Google Patents

Metabolic modeling for disease prognosis and treatment Download PDF

Info

Publication number
WO2023081000A1
WO2023081000A1 PCT/US2022/046520 US2022046520W WO2023081000A1 WO 2023081000 A1 WO2023081000 A1 WO 2023081000A1 US 2022046520 W US2022046520 W US 2022046520W WO 2023081000 A1 WO2023081000 A1 WO 2023081000A1
Authority
WO
WIPO (PCT)
Prior art keywords
computer
implemented method
metabolic
flux
reaction
Prior art date
Application number
PCT/US2022/046520
Other languages
French (fr)
Inventor
Daniel John COOK
Katherine Isabelle RITCHIE
John Andrew COLE
Original Assignee
Simbiosys, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Simbiosys, Inc. filed Critical Simbiosys, Inc.
Publication of WO2023081000A1 publication Critical patent/WO2023081000A1/en

Links

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • Memory 104 may store program instructions and/or data on which program: instructions may operate.
  • memory 104 may store these program instructions on a lion-transitory, computer-readable medium, such that the instructions are executable by processor 102 to carry out any of the methods, processes, or operations disclosed in this specification, or the accompanying drawings.
  • a metabolic network can be characterized as a graph in which, reactions are edges and. the metabolites are nodes.
  • HMR2 Human Metabolic Reconstruction 2.0
  • Boolean expressions can be used to represent the activation and inhibition of reactions by combinations of protein complexes, isozymes, and parallel enzymes.
  • step A metabolic network 400 is provided.
  • This metabolic network represents the metabolic processing of a nutrient N through a number of reactions and intermediate metabolite compounds.
  • Metabolic network 400 includes branches, merges, and cycles.
  • Step C of Figure 4 depicts solution procedure 404.
  • this involves constraining to 0, constraining the fluxes to be within some values and v max , and then optimizing for an objective function in order to mimic biological objective of a cell.
  • Objective functions are known to include, but are not limited to, maximizing processes such cellular growth, nicotinamide adenine dinucleotide (NADH) production, ATP production, reaction flux correlation with gene expression (or gene expression scaled by an additional factor, for example enzymatic activity, translation rate, and/or protein degradation rate), or one or more cell byproducts.
  • Biological objectives an also be minimizing in nature, for example parsimonious total flux or total flux through the import/ export reactions.
  • module 500 generates one or more personalized GEMs 510.
  • the average saturation factor of the proteins is between 0.0 and 1.0. In some embodiments, the average saturation factor of the proteins is between 0.4 arid 0.6.
  • the enzymatic turnover number is based, on a maximum value across physiological, conditions.
  • the stoichiometric matrix remains numerically stable, and internal metabolic constraints are accounted for by the flux vector.

Abstract

An implementation involves obtaining a stoichiometric matrix representing a metabolic network of a tissue; obtaining omics data for an individual; constructing a rate of change vector representing steady state concentrations of compounds in the tissue due to reactions within the metabolic network; solving, by way of linear programming and in accordance with an objective function, a flux vector for the reactions, wherein the rate of change vector is a product of the stoichiometric matrix and the flux vector, wherein elements of the flux vector arc constrained to be within respective minimum and maximum values, and. wherein for a specific element of the flux vector, at least one of its minimum value or its maximum value is based on the omics data from the individual and an enzymatic turnover number for a reaction relating to the specific element; and generating, based on the flux vector, personalized metabolic model for the individual.

Description

Metabolic Modeling for Disease Prognosis and Treatment
CROSS-REFERENCE TO RELATED APPLICATION
[001] This application claims priority to U.S. provisional patent application no. 63/275,085, filed November 3, 2021 , which is hereby incorporated by reference in. its entirety.
BACKGROUND
[002] Certain diseases, the various types of cancer being examples, have proven challenging- to effectively treat in many individuals. This is due, at least in part, to the diseases often having a strong metabolic component. Cancer is characterized, in-part, by enhanced metabolic pathway activity allowing for rapid generation of ATP, thereby imparting a selective advantage for growth, survival, proliferation, and long-term survival of the cancerous cell. A genome scale metabolic model provides in silico techniques for studying cancer. The model allows for simulation and hypothesis testing of cancer by combining generic human metabolism models with high throughput expression data. As metabolic systems can be exceptionally complex and vary between individuals, they are frequently not very well understood and difficult to model.. Further, existing metabolic models are limited and, in many cases, too mathematically unstable to provide useful results. As such, more stable and. accurate models of tumor and tissue-specific metabolic models are needed.
SUMMARY
[003] Diagnosis, prognosis, and treatment of diseases can be improved by way of one or more of the embodiments herein. In particular, metabol ic models may be modified based on simplifying and/or making practical assumptions in order to improve their stability. These models can also be tailored to the physiology of specific individuals based on omics data collected from those individuals. The resulting individualized metabolic models can be used as the basis of simulations that attempt to characterize individualized disease behavior.
[004] Such simulations facilitate the identification of metabolic features that vary between individuals . Subsets of these features that are predictive of (or at least correlated, with) better or worse prognoses (e.g., rates of survival or response to therapy) are further identified.. These subsets of features can then be the basis for determining improved methods of diagnosis and treatment of diseases on an individualized basis. [005] Accordingly; a first example embodiment may involve obtaining a stoichiometric matrix representing a metabolic network of a tissue. The first example embodiment may also involve obtaining omics data for an individual. The first example embodiment may also involve constructing a rate of change vector representing steady state concentrations of compounds in the tissue due to reactions within the metabolic network. The first example embodiment may also involve solving, by way of linear programming and in accordance with an objective function, a flux vector for the reactions, wherein the rate of change vector is a product of the stoichiometric matrix and the flux vector, wherein elements of the flux vector are constrained to be within respective minimum and maximum values, and wherein for a specific element of the flux vector, at least one of its minimum value or its maximum value is based on the omics data from the individual and an enzymatic turnover number (or other kinetic parameter) for a reaction relating to the specific element. The first example embodiment may also involve generating, based on the flux vector, a personalized metabolic model for the individual,
[006] A second example embodiment may involve obtaining a plurality of personalized, metabolic models for a metabolic network of a tissue, wherein each of the personalized metabolic models has been tailored by omics data from respective individuals. The second example embodiment may also involve determining, through simulation of the personalized metabolic models, metabolic characteristics of the respective individuals. The second example embodiment may also involve identifying, by way of a statistical technique or an expression signature, concentrations of compounds at various stages of the personalized metabolic models that characterize variability observed between the metabolic characteristics of the respective individuals. The second example embodiment may also involve identifying, by way of a survivability analysis technique, chemical reactions that are most likely to be prognostic of survival or chemotherapeutic efficacy for one or more of the respective individuals.
[007] In a third example embodiment, an article of manufacture may include a non- transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform operations in accordance with the first and/or second example embodiment.
[008] In a fourth example embodiment, a computing system may include at least one processor, as well as memory and program instructions. The program instructions may be stored in the memory, and upon execution by the at least one processor, cause the computing system to perform operations in accordance with the first and/or second example embodiment.
[009] In a fifth example embodiment, a system may include various means for carrying out each of the operalions of the first and/or second example embodiment.
[010] These, as well as other embodiments, aspects, advantages, and alternatives, will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings. Further, this summary and other descriptions and figures provided herein are intended to illustrate embodiments by way of example only and, as such, that numerous variations are possible. For instance, structural elements and process steps can be rearranged, combined, distributed, eliminated, or otherwise changed, while remaining within the scope of the embodiments as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[011] Figure 1 illustrates a schematic drawing of a computing device, in accordance with example embodiments.
[012] Figure 2 illustrates a schematic drawing, of a server device cluster, in accordance with example embodiments.
[013] Figure 3 depicts a simplified metabolic network, in accordance with example embodiments.
[014] Figure 4 depicts a process tor solving a set of equations modeling a metabolic network with respect to an objective function, in accordance with example embodiments.
[015] Figure 5 depicts generating personalized genome-scale metabolic models, in accordance with example embodiments.
[016] Figure 6 is a flow chart, in accordance with example embodiments.
[017] Figure 7 depicts identifying prognostic biomarkers, in accordance with example embodiments.
[018] Figure 8A is an example of metabolic state identifiers of tumors that are prognostic of survival in bladder cancer, in accordance with example embodiments.
[019] Figure 8.B is an example of metabolic state identifiers of tumors that are prognostic of survival in metastatic melanoma, in accordance with example embodiments.
[020] Figure 9 is a flow chart, in accordance with example embodiments. DETAILED DESCRIPTION
[021] Example methods, devices, and systems are described herein. It should be understood that the words “example” and. “exemplary” are used, herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features unless stated as such. Thus, other embodiments can be utilized and other changes can be made without departing from the scope of the subject matter presented herein.
[022] Accordingly;, the example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations. For example, the separation of features into “client” and “server” components may occur in a number of ways.
[023] Further, unless context suggests otherwise, the features Illustrated in each of the figures may be used, in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment
[024] Additionally, any enumeration of elements, blocks, or steps in this speci fication or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are earned out in a particular order.
[025] The examples described, below are generally related to human metabolic systems and their influence on the progression of various types of cancers. Nonetheless, the approach described could be used for non-human subjects and/or for diseases or conditions other than cancer.
[026] Further, the terms “individual”, “patient”, and “subject” may be used interchangeably herein unless context, suggests otherwise.
I. Example Computing Devices and Cloud-Based Computing Environments
[027] Figure 1 is a simplified block diagram exemplifying a computing device 100, illustrating some of the components that could, be included in a computing device arranged to operate in accordance with the embodiments herein. Computing device 100 could be a client device (c,g„, a device actively operated by a user), a server device (e,g., a device that provides computational services to client devices), or some other type of computational platform. Some server devices may operate as client devices from time to time in order to perform particular operations, and some client devices may incorporate server features.
[028] In this example, computing device 100 includes processor .102, memory 104, network interface 106, and. input / output. unit 108, all of which may be coupled by sy stem bus 110 or a similar mechanism. In some embodiments, computing device 100 may include other components and/or peripheral devices (e.g., detachable storage, printers, and so on).
[029] Processor 102 may be one or more of any type of computer processing clement, such as a central processing unit (CPU), a co-processor (e.g., a mathematics, graphics, or encryption co-processor), a digital signal processor (DSP), a network processor, and/or a form of integrated circuit or controller that performs processor operations. In some cases, processor 102 may be one or more single-core processors. In other cases, processor 102 may be one or more multi-core processors with multiple independent, processing units. Processor 102 may also include register memory for temporarily storing instructions being executed and related data, as well as cache memory for temporarily storing recently-used instructions and data.
[030] .Memory 104 may be any form of computer-usable memory, including but not limited to random access memory (RAM), read-only memory (ROM), and non-volatile memory (e.g., flash memory, hard disk drives, solid state drives, compact discs (CDs), digital video discs (DVDs), and/or tape storage). Thus, memory 104 represents both main memory units, as well as long-term storage. Other types of memory may incl ude biological memory.
[031] Memory 104 may store program instructions and/or data on which program: instructions may operate. By way of example, memory 104 may store these program instructions on a lion-transitory, computer-readable medium, such that the instructions are executable by processor 102 to carry out any of the methods, processes, or operations disclosed in this specification, or the accompanying drawings.
[032] As shown in Figure 1, memory 104 may include firmware 104A, kernel 104B, and/or applications 104C. Firmware 104A may' be program code used to boot, or otherwise initiate some or all of computing device 100, Kernel 104B may be an operating system, includi ng modules for memory management, scheduling, and management of processes, i nput / output, and communication. Kernel 104B may also include device drivers that allow the operating system to communicate with the hardware modules (e.g., memory units, networking interfaces, ports, and. buses) of computing device 100. Applications 104C may be one or more user-space software programs, such as web browsers or email clients, as well as any software libraries used by these programs. Memory 104 may also store data used by these and other programs and applications.
[033] Network interface 106 may take the form of one or more wireline interfaces, such as Ethernet, (e.g., Fast Ethernet, Gigabit Ethernet, and so on). Network interface 106 may also support communication over one or more non-Ethernet media, such as coaxial cables or power lines, or over wide-area media, such as Synchronous Optical Networking (SONET) or digital subscriber line (DSL) technologies. Network interface 106 may additionally take the form of one or more wireless interfaces, such as IEEE 802.11 (Wifi), BLUETOOTHS, global positioning system (GPS), or a wide-area wireless interface. However, other forms of physical layer interfaces and other types of standard or proprietary communication protocols may be used over network interface .106. Furthermore, network interface 106 may comprise multiple physical interfaces. For instance, some embodiments of computing device 100 may include Ethernet, BLUETOOTHS, and Wifi interfaces,
[034] Input / output unit 108 may facilitate user and peripheral device interaction with computing device 100. Input / output unit 108 may include one or more types of input devices, such as a keyboard, a mouse, a touch screen, and so on. Similarly, Input / output unit 108 may include one or more types of output devices, such as a screen, monitor, printer, and/or one or more light emitting diodes (LEDs). Additionally, or alternatively, computing device 100 may communicate with other devices using a universal serial, bus (USB) or high-definition multimedia interface (HDMI) port interface, for example.
[035] In some embodiments, one or more computing devices like computing device 100 may be deployed to facilitate computation and use of metabolic models. The exact physical location, connectivity, and configuration of these computing devices may be unknown, and/or unimportant to client devices. Accordingly, the computing devices may be referred to as “cloud-based” devices that may be housed at various remote data center locations.
[036] Figure 2. depicts a cloud-based server cluster 200 in accordance with example embodiments. In Figure 2, operations of a computing device (e.g., computing device 100) may be distributed between server devices 202, data storage 204, and routers 206, all of which may be connected by local cluster network 208. The number of server devices 202, data storages 204, and routers 206 in server cluster 200 may depend on the computing task(s) and/or applications assigned to server cluster 200. [037] For example, server devices 202 can be configured to perform various computing tasks of computing device 100. Thus, computing tasks can be distributed among one or more of server devices 202. To the extent that these computing tasks can be performed in parallel such a distribution of tasks may reduce the total time to complete these tasks and return a result. For purposes of simplicity, both server cluster 200 and individual server devices 202 may be referred to as a “server device.” This nomenclature should be understood to imply that one or more distinct server devices, data storage devices, and cluster routers may be involved in server device operations.
[038] Data storage 204 may be data storage arrays that include drive array controllers configured, to manage read and. write access to groups of hard disk drives and/or solid state drives. The drive array controllers, alone or in conjunction with server devices 202, may also be configured to manage backup or redundant copies of the data stored in data storage 204 to protect against drive failures or other types of failures that prevent one or more of server devices 202 from accessing units of data storage 204. Other types of memory aside from drives may be used.
[039] Routers 206 may include networking equipment configured to provide internal and external communications for server cluster 200. For example, routers 206 may include one or more packet-switching and/or routing devices (including switches and/or gateways) configured to provide (i) network communications between server devices 202 and data storage 204 via local cluster network 208, and/or (ii) network communications between server cluster 200 and. other devices via communication link 210 to network 212.
[040] Additionally, the configuration of routers 206 can be based at least in pari on the data communication requirements of server devices 202 and data storage 204, the latency and throughput of the local cluster network 208, the latency, throughput, and cost of communication link 210,. and/or other factors that may contribute to the cost, speed, fault- tolerance. resiliency, efficiency, and/or other design goals of the system architecture.
[041] As a possible example, data storage 204 may include any form of database, such as a structured query language (SQL) database. Various types of data structures may store the information in such a database, including but not limited to tables, arrays, lists, trees, and tuples. Furthermore, any databases in data storage 204 may be monolithic or distributed across multiple physical devices. [042] Server devices 202 may be configured to transmit data to and receive data from data storage 204. This transmission and retrieval may take the form of SQL queries or other types of database queries, and the output of such queries, respectively. Additional text, images, video, and/or audio may be included as well. Furthermore, server devices 202 may organize the received data into web page or web application representations. Such a representation may take the form of a markup language, such as the hypertext markup language (HTML), the extensible markup language (XML), or some other standardized or proprietary format. Moreover, server devices 202 may have the capability of executing various types of computerized scripting languages, such as but not limited, to Perl, Python, PHP Hypertext Preprocessor (PHP), Active Server Pages (ASP), JAVASCRIPT®, and so on. Computer program code written in these languages may facilitate the providing of web pages to client devices, as well as client, device interaction with the web pages. Alternatively, or additionally, JAVA® may be used to facilitate generation of web pages and/or to provide web application functionality.
II. Improved Models of Metabolic Systems
[043] Metabolic systems in complex organisms are composed of a large number of metabolic networks. Each metabolic network describes how a cell or a tissue (e.g., a tumor) con verts some form of food into cellular building blocks (e.g., proteins, lipids, nucleic acids, and. carbohydrates), as well as heat and waste products that are eventually discharged. The metabolic systems comprise chemical reactions as well as associated regulatory steps.
[044] The chemical reactions that drive these conversions allow organisms to grow, maintain their structures, reproduce, and. respond, to their environments. In these reactions, compounds are tran.sfor.rned into other compounds in a series of steps, with each step catalyzed by a specific enzyme. Since enzymes are proteins, each enzyme is produced based on nucleotide sequences (genes) in the organism’s genetic code. Thus, metabolism has a genetic basis.
A, Metabolic Networks
[045] As an example, Figure 3 depicts a simplified metabolic network for a single cell. Glucose is introduced into cell 300 as food (i.e., a substrate). Glucose is converted by enzymes of the metabolic network into metabolites, small molecules that are intermediate or end products of a metabolism. For instance, the glucose is converted by enzyme El into metabolite MI. Likewise, metabolite Ml is converted into metabolite M2 by enzyme E2 and. into metabolite M3 by enzyme E3. This process of conversion through the metabolic network continues until metabolites M5 and M6 are discharged from Cell 300 as metabolic waste. While not shown in Figure 3, some metabolic networks may include cycles or could be modeled as closed systems. Also not shown in Figure 3 are certain metabolites being consumed by cell 300 or otherwise used for energy, growth, or maintenance of structure.
[046] Constructing accurate representations of metabolic systems from metabolic networks can be challenging due to the sheer volume of constituent enzymatic reactions and metabolites. As an example, the metabolic systems of certain types of yeast may include over 3900 reactions, over 2600 metabolites, and enzymes that are controlled by over 1100 genes. Nonetheless, such efforts have been made for the human metabolism and those of other organisms as well. They are referred to as genome-scale metabolic models (OEMs). GEMs serve as a mathematical representation of metabolism, providing gene-reaction-metabolite relationships.
[047] From a representational perspective, a metabolic network can be characterized as a graph in which, reactions are edges and. the metabolites are nodes. There are a number of publicly available graph-like representations of general metabolic systems, including those in the Human Metabolic Reconstruction 2.0 (HMR2) database. Alternatively, a different metabolic database could be used.
[048] GEMs for cancerous or non-cancerous tissues can be constructed or contextualized according to existing techniques, such as INi.T or GIMME, or modifications thereof. The lNIT algorithm (or its modification, “tlNIT”), creates a metabolic model consistent with gene expression data, that also carries out a basic set of “known” tasks specified by a user (e.g., production of certain amino acids). GIMME creates metabolic networks by minimizing the sum of .fiux.es through, reactions that that have gene expression below a specific threshold that approximates biological significance.
[049] GEMs can be individualized by incorporation of patient-specific information. For instance, samples of tissues are obtained from individual patients. For each tissue type, a representative omies profile is calculated, for example using maximum transcripts per kilobase million (TPM) across all samples in an analysis. A GEM is created as a reference model (e.g., with INIT or GIMME over H.MR2, and optionally with gene / reaction rules and the biomass equation defined based, on reference database or customized for the tissue). [050] The stoichiometric coefficients for each metabolite are used to determine macromolecular composition and maintenance costs. Examples of sources for these coefficients are as follows. Biomass is calculated using macromolecular weight fractions, omic datasets and genome sequetice(s). DNA coefficients are calculated using the genome sequence(s) which are then used to determine the relative abundance of each nucleotide. Model formula weight is extracted from each nucleotide and a stoichiometric coefficient calculated for each metabolite. Coefficients of RNA monomers are obtained by weighting metabolite frequency in a transcript by relative abundance to arrive at the relative abundance of ribonucleotide and amino acid metabolites. The sequence of RNA and relative abundance are then used to calculate the stoichiometric coefficients. Other sources, steps or sequences of steps are possible.
[051] To provide numerical stability of the biomass function, sensitivity analyses of each of the components of the biomass function is performed by leaving out individual or combinations of components and assessing how the resulting biomass “flux” changes in response to model input perturbations. For example, does decreasing the glucose influx bound result in a decreasing biomass “flux” or a stochastically changing biomass “flux"? Additionally, the growth associated ATP demand in the biomass function is scaled to ensure that the output biomass “flux” is within biologically reasonable ranges. One example of such a range is to scale the biomass “flux” to an average of 0.9%/day for breast cancer (which has a reported average specific growth rate or SGR. in the range of 0.3 - 3%/day).
[052] A gene expression minimum is defaulted to a level of 1 TPM, but is also configurable. This results in a “permissive” GEM for each tissue type, which contains an estimate of every reaction likely to occur within at least one sample. The metabol ic reactions possible in each tissue type are stored in one or more stoichiometric matrixes describing the stoichiometry of the reactions involved. Notably, this results in a personalized GEM for each individual and/or tissue. Note that these omics profiles can come from clinical omics data collected either prospectively or retrospectively.
[053] To integrate omics data, enzymatically governed, reactions in a reaction network can be annotated. with gene/protein rules that describe how proteins catalyze reactions. Thus, reactions with insufficient protein levels can be removed from the network for a given tumor. Two examples of this process follow. [054] In the first example, gene expression data is downloaded from TCGA as FPKM, normalized to TPM using standard techniques (dividing sample FPKM values by the sum of PFKM values per sample and multiplying by 10 6). The maximum TPM for each gene is calculated across all samples and compared to a threshold to determine whether an active protein is likely to be translated.
[055] In the second example, proteomic data can be downloaded front an. online repository such as (but not limited to) the Human Protein Atlas. Protein mass fractions are calculated by multiplying each protein by its molecular weight, normalizing protein mass (dividing each sample protein mass by the sum of the sample protein mass) to obtain a protein mass fraction for each protein. The protein mass fractions are multiplied by the total amount of protein in a cell (on average 0.67 g/gDW). Protein levels below a threshold (for example 1% of total protein) are considered likely inactive and removed from the model.
[056] In alternative examples, the gene expression data, proteomic data (or other omics data) can be obtained from clinical trials or individuals.
[057] In some cases, multiple enzymes form complexes to catalyze a reaction. This can be approximated through Boolean algebra by “AND”ing the gene reaction rules together. In other words, a reaction driven by an enzymatic complex is only considered “likely to occur” if there is evidence in the individual’s omics profile for the presence of all of the constituent enzymes.
[058] In some cases, multiple enzymes can act separately to catalyze a reaction; this often occurs with isozymes. This can be approximated through Boolean algebra by “OR”ing the gene reaction rules together. In other words, a reaction driven by multiple enzymes is considered as “likely to occur” as if there were multiple reactions occurring in parallel catalyzed by different enzymes.
[059] In some cases, multiple Boolean expressions can be used to represent the activation and inhibition of reactions by combinations of protein complexes, isozymes, and parallel enzymes.
[060] Thus, a representation of a metabolic network can be constructed based on a number of sources (e.g., databases), such. as annotated genomes, surveys of bibliomic data, chemical properties (e.g., conservation of dements), and possibly other sources. These representations can then be tailored to be consistent with the biological data of a specific organism or group of organisms. B. Matrix Formulation
[061] A stoichiometric matrix can be used as a compact mathematical formulation, of a set of connected metabolic transformations. Considering Figure 3 once more, the metabolic network of cell 300 inchides a number of compounds (glucose and the metabolites) and reactions between these compounds.
[062] To simplify the transformation of the metabolic network into a stoichiometric matrix, only the chain of reactions from glucose (G) to metabolite M5 is considered in the example below. This chain of reactions include the individual reactions:
Figure imgf000014_0001
[063] Where R1 is the reaction triggered by enzyme E1 , R2 is the reaction triggered by enzyme E2, R4 is the reaction triggered by enzyme E4, and R6 is the reaction triggered by enzyme E6.
[064] The stoichiometric matrix represents the reactions as columns and compounds as rows. For this chain of reactions, the stoichiometric matrix S is as follows:
Figure imgf000014_0002
[065] Looking to the first (leftmost) column, the value of -1 indicates that the glucose is transformed in Rl into metabolite M1. The rest of the rows in this first column take on a value of 0, because the other compounds are not involved in R1. Similarly, in the second column, the value of -1 indicates that metabolite Ml is transformed in R2 into metabolite M2. This process continues for the third and fourth columns.
[066] More complex reactions, such as the branching and merging shown in Figure 3, can be represented in a larger matrix by placing -1 and 1 in appropriate locations. In some eases, certain types of reactions may involve use of values other than -1 and 1. In general, a stoichiometric matrix, is usually a sparse m x n matrix, (where there m compounds and n reactions), in which most values are 0. In practice n is usually greater than m.
[067] Thus, there is a mapping from genes to proteins to reactions that can be represented in the stoichiometric matrix. Some genes encode one reaction, other genes encode multiple reactions, and multiple genes can also encode a single reaction. The stoichiometric matrix has an. explicit genetic basis.
[068] In addition to stoichiometric matrix S. a flux vector v can be created that represen ts the concentrations of compounds (e.g., in moles / volume / time) that flow through the metabolic network. As an example, the flux vector for the chain of .reactions represented in S could be:
Figure imgf000015_0001
[069] Further, the sum of the fluxes going in and out of a c ompound can be represented as a differential equation. The set of these differential equations for the metabolic network is a vector, . Putting this together, the following relationship holds:
Figure imgf000015_0002
[070] Thus, S is a mapping from an n -dimensional vector representing concentrations to an m-dimensional vector representing changes in concentrations over time due to reactions.
[071] If S and v are known, can be found through matrix algebra. For the
Figure imgf000015_0004
simplified example herein, this is:
Figure imgf000015_0003
[072] This set of differential equations can be used to simulate the dynamics of the metabolicnetwork.
C. Improved and Individualized Formulations
[073] But in practice, v is often not known and. instead must be solved for, at least in an approximate fashion. Given that there are many zeroes in , representing steady state
Figure imgf000015_0005
behavior, the equation can be constrained to steady state by setting . Further, the fluxes
Figure imgf000016_0001
in v can be constrained to have certain minimum and maximum values, vmin and vmax, respectively. Then solving for this modified version of v (i.e., v') given some objective function (e.g., to maximize growth), becomes an optimization problem characterized by S - v' = 0. Such a solution can be provided using linear programming, for instance via the COB'RApy package for Python with an underlying linear programming solver, such as the GNU Linear Programming Kit (GLPK), Gurobi, or the COIN-OR Linear Program Solver.
[074] The modeling process described so far is depicted in Figure 4 with a more involved example. In step A, metabolic network 400 is provided. This metabolic network represents the metabolic processing of a nutrient N through a number of reactions and intermediate metabolite compounds. Metabolic network 400 includes branches, merges, and cycles.
[075] Step B of Figure 4 depicts, as equation 402, a stoichiometric matrix S, a flux vector v, and a flux rate of change vector ™ for metabolic network 400. S can be constructed
Figure imgf000016_0002
as described above.
[076] Step C of Figure 4 depicts solution procedure 404. As noted, this involves constraining to 0, constraining the fluxes to be within some values and vmax , and then
Figure imgf000016_0003
optimizing for an objective function in order to mimic biological objective of a cell. Objective functions are known to include, but are not limited to, maximizing processes such cellular growth, nicotinamide adenine dinucleotide (NADH) production, ATP production, reaction flux correlation with gene expression (or gene expression scaled by an additional factor, for example enzymatic activity, translation rate, and/or protein degradation rate), or one or more cell byproducts. Biological objectives an also be minimizing in nature, for example parsimonious total flux or total flux through the import/ export reactions.
[077] For example, cancer is generally thought to grow and. divide as quickly as possible, so a reasonable objecti ve might be maximizing cellular growth, NA.DH, or ATP. For cells that do not grow rapidly, but maintain their state over a long period, of time, an objective fonction such as maximizing reaction flux correlation with gene expression or minimizing total flux might be a more reasonable objective than maximizing, growth.
[078] The permissive models noted above can be further constrained using a modified version of the GECKO formalism. Traditionally, GEMs have not incorporated data regarding enzymatic efficiency or protein size. The main limiting constraint traditionally placed on GEM solutions is that there is no accumulation of internal metabolites (which is one reason for the zeroes in with the exception of measured - or assumed - external consumption and production of metabolites). Therefore, internal reactions remain largely unconstrained by experimental knowledge.
[079] GECKO models address the limitation of lack of internal bounds by incorporating omics and kinetics data into metabolic models as additional metabolic reactions, which can then be constrained using proteomic or transcriptomic expression levels. For instance, GECKO uses proteomics data, and by extension, transcriptomic data, but the omics as used herein can be genomic, transcriptomic, proteomic, and/or metabolomic, In one aspect, kinetics data may include turnover number. The turnover number and additional kinetics data can obtained experimentally or from databases, for example the BRENDA dataset. Briefly, the GECKO algorithm works by modifying S to include explicit enzyme occupancy for metabolic reactions that are scaled by their enzymatic efficiency. More formally, consider an original metabolic reaction characterized as:
Figure imgf000017_0001
[080] The G ECKO version of this reaction is:
Figure imgf000017_0002
[081] Alternate version s of the reaction include ;
Figure imgf000017_0003
[082] Or:
Figure imgf000017_0004
[083] Where n is an arbitrary scalar value or represents the number of enzymes involved in a complex, and Km is the Michaelis constant for a reaction that has Michaelis- Menten kinetics (i.e., the substrate concentration at which the reaction rate is half of its maximum possible rate).
[084] E2 is the concentration, of a specific enzyme catalyzing the reaction R2 and kcatE2 is the enzyme turnover number for that enzyme (the inverse efficiency). One additional reaction is required for the GECKO formalism to be solved, the total enzyme mass available to a cell:
Figure imgf000018_0001
[085] While more realistic in a variety of cases, the GECKO formalism unfortunately can be numerically unstable. This instability arises because of the high range of values contained in the S matrix, predominantly coming from small values, which can range
Figure imgf000018_0002
from orders of 1 to 10-5 or even lower.
[086] The embodiments herein provide a numerically stable way of accounting for enzymatic constraints on cellular metabolism. This is a clear improvement over GECKO and other techniques. These embodiments operate by constraining the values in v rather than modifying the S matrix directly. Thus, the S matrix remains numerically stable, and the internal metabolic constraints are handled by v.
[087] Specifically, the upper internal flux bound or the upper and lower internal flux bounds are constrained based on the following equation when catalyzed by a single enzyme:
Figure imgf000018_0004
[088] In other words, for a reversible reaction:
Figure imgf000018_0003
[089] For a non-reversible reaction, = 0 and vmax takes on the same value as for reversible reactions. In tertns of the linear program, it can be configured to model maximum cell growth (or other objective functions) by changing values in v within the bounds of vmax and
[090] Here, kcat.,ij is the turnover number of the enzyme i in reaction j (e.g., the maximum number of chemical conversions of compound molecules per second that occur for a particular enzymatic concentration). The mass of protein i is estimated from the RNA- sequenced expression, proteonncally-identified, genomically-rnferred, or metabolomically inferred level of enzyme i and. its molecular weight Alternatively, an estimate of die protein molar fraction can be used in place of protein mass. The value of ptot is the dry weight fraction of proteins in a tumor cell (currently estimated to be 0.67 g/gDW, but values between 0.20 and 0.80 could be used). The value of a is the average saturation factor of proteins in a tumor cell (currently estimated to be 0.5, but values between 0.1 and 0.9 could be used). Notably; ptot is configurable per model, tissue, or cancer, and δ is configurable per reaction / enzyme pair. Also, the protein masses can be estimated from proteomic data. The equation can also include scaling to arrive at values that fall outside the range disclosed herein. More specifically, can be held constant and the biomass equation, reaction bounds, and/or S matrix coefficients scaled. For example, ptot can be set. to 0.067 and the biomass equation, reaction bounds, and/or S matrix coefficients scaled.
[091] This enzyme-bounding algorithm differs from the original GECKO formalism in several ways. in addition to those described above. Particularly, the original GECKO algorithm used maximum kcat estimates from the BRENDA database (available at www.brenda^enzymes.oig) for model constraining to prevent over-constraining the metabolic model. In contrast, the improved algorithm uses the median kcat values from BRENDA across physiological conditions with outliers removed to provide a maximum likelihood-type estimate of metabolism . These differences produce a more accurate representation of tumor metabolism than traditional methods. They automatically capture the Warburg Effect in minors (where tumors grow faster than other types of tissues through increased glucose uptake and fermentation of glucose to lactate), and allow accurate simulation of non-cancerous tissues. Previously non-cancerous tissues and cancerous tissues behaved basically the same in OEMs.
[092] In addition to constraining metabolic model internal fluxes based on RNA/protein expression and enzymatic efficiency, the external flux, bounds (metabolite uptake rates) can also be constrained based, on blood, chemistry concentrations from individual patients, including but riot limited to glucose, glutamine, glutamate, cysteine, amino acids, oxygen, carbon dioxide, lactate, lipids, triglycerides, cholesterol, albumin. Non-limiting examples of blood chemistry- concentrations that, can be incorporated in the current model include fasting blood glucose levels of -70-100 mg/dE in healthy individuals, Blood chemistry concentrations are collected from individual patient labs (in the clinic) or public repositories and scaled based on average blood flow through a body, average (or patient-specific) weight, and/or average water-weight of human cells to calculate a maximum possible nutrient, uptake rate for all nutrients included in the metabolic model (currently HMR2). These concentrations are integrated, as flux bounds for the given metabolites. One example of ho w these concentrations can be integrated is to use a scaling factor representing average blood flow through a unit mass of human tissue (this scaling factor can be approximated as 4.28 L/kg/hr, but other values are possible). This scaling factor can then be further adjusted by accounting for water concentration within the same tissue unit mass (for example, using 60% water or equivalently 40% dry mass). This specific scaling factor would then be blood, concentration multiplied by 4.7, and divided by 0.4.
[093] Using these concentrations- constrains model behavior into a physiologically reasonable space, which is then further constrained by personalized, (or archetypal) expression patterns, which can be from online databases or clinical samples. For an archetypal expression pattern, an estimated value (such as the median) for expression level of each gene or protein can be used instead of a patient-specific measurement.
[094] While not strictly required, the embodiments herein can involve additional modifications to the “standard” GEM-solving algori thm. For both tissue and tumor models, the components contained in the biomass equation in HMR2 and their respective weights can be modified, to develop a unique biomass equation that represents tumor biology with a high degree of accuracy In these tumor models, growth-associated ATP demand can be sealed until growth rate of the tumor models constrained with median RNA-sequenced gene expression data from The Cancer Genome Atlas (TCGA) data sets return a growth rate in. a biologically reasonable range. In tissue models, all terms related to DNA generation and growth associated ATP demand can be removed, from the biomass equation. In these models, a demand reaction for maintenance ATP can be added, which can be constrained to a flux rate of at least 0.0302 g/gDW/hr and up to 0.0906 g/gDW/hr (based on a 2000 Cal/day diet for a 70kg person) and relatively uniform nutrient use across body tissues, and further constrained outside these bounds for further personalization or tissue specificity.
[095] The embodiments described, herein overcome limitations associated with estimating metabolic function in humans. Experimentally measuring reaction fluxes is extremely challenging, for both internal and external reaction flux rates. Estimating metabolic flux across global diseases/condition (e.g., breast cancer) presents a basal challenge further complicated by unique challenges attributable to patieni-to-patient variability (i.e., intra- population metabolic flux variations in breast cancer patients). The methods described herein address this challenge by constraining flux bounds, which are not easily measured with readily accessible biological features such as omics data and blood chemistry parameters. Further, omics data in itself is insufficient to constrain metabolic behavior fully. These embodiments incorporate enzymatic kinetic data to more realistically simulate biological processes occurring during enzyme-catalyzed metabolic reactions. The methods described herein perform these constraining procedures in such a way as to allow for numerically stable simulation of metabolism, meaning that changes to the experimentally measured inputs (e.g. omics and blood, chemistry) result in non-random changes to the simulated output flux profile. Further refinements to the model, including tailoring the biomass equation or model, objective function and including maintenance ATP demand, increase the accuracy of simulations to capture metabolism in single tumors, single tissues, or archetypal tumors and tissues.
D. Personalized GEM Generation
[096] Putting this together. Figure 5 depicts the possible inputs that go into generation of personalized GEMs. Module 500 represents software that performs the operations for generating personalized GEMs, as described above. Possible inputs to module 500 include omics data 502 (e,g., genomics, transcriptornics, proteomics, and/or metabolomics), blood chemistry 504 (which could, be the blood chemistry of an individual patient or the average blood chemistry values for a population of Individuals), reaction kinetic parameters 506, and/or metabolic network model 508 (c.g., based on a stoichiometric matrix, possibly modified as described). An exemplary reaction kinetic parameter includes, but is not limited to, enzyme turnover number.
[097] Not all possible inputs need to be present to generate personalized GEMs. For example, some embodiments may involve only metabolic network model 508 and omics data 502, or just metabolic network model 508 and blood chemistry 504. Regardless, from the inputs provided, module 500 generates one or more personalized GEMs 510.
E. Example Operations
[098] Figure 6 is a flow chart illustrating an example embodiment. The process illustrated, by Figure 6 may be carried out by a computing device, such as computing device 100, and/or a cluster of computing de vices, such as server cluster 200. However, the process can be carried out by other types of devices or device subsystems.
[099] The embodiments of Figure 6 may be simplified by the removal of any one or more of the features shown therein. Further, these embodiments may be combined with features, aspects, and/or implementations of any of the previous figures or otherwise described herein. [100] Block 600 may involve obtaining a stoichiometric matrix representing a metabolic network of a tissue.
[101] Block 602 may involve obtaining omics data tor an individual.
[102] Block 604 may involve constructing a rate of change vector representing steady state concentrations of compounds in the tissue due to reactions within the metabolic network.
[103] Block 606 may involve solving, by way of linear programming and in accordance with an objective function, a flux vector for the reactions, wherein the rate of change vector is a product of the stoichiometric matrix and the flux vector, wherein elements of the flux vector are constrained to be within respective minimum and maximum values, and wherein for a specific element of the flux vector, at least one of its minimum value or its maximum value is based on the omics data from the individual and an enzymatic turnover number for a reaction relating to the specific element.
[104] Block 608 may involve generating, based on the flux vector, a personalized metabolic- model for the individual.
[105] In some embodiments, the omics data is predicted using clinical biomarkers of the individual. In some e mbodiments, the clinical biomarkers are one or more of age, sex, race, and/or menopausal status of the Individual.
[106] In some embodiments, the tissue is a cell.
[107] In some embodiments, the personalized metabolic model is simulation-ready.
[108] In some embodiments, the specific element of the flux vector represents a reversible reaction, wherein both the minimum value and. the maximum value are based on the enzymatic turnover number.
[109] In some embodiments, the specific element of the flux, vector represents a non- reversible reaction, wherein the minimum value is zero, and. wherein the maximum value is based on the enzymatic turnover number.
[110] In some embodiments, the enzymatic turnover number for the reaction is multiplied by a ratio of: a mass of an enzyme causing, the reaction over a sum of masses for all enzymes represented in the flux vector, or a molar fraction of the enzyme causing the reaction over the sum of masses for all enzymes represented in the flux vector.
[111] In some embodiments, the enzymatic turnover number is an estimated average enzymatic turnover number for the reaction. [112] In some embodiments, the enzymatic turnover number includes the minimum enzyme activity for a complex..
[113] In some embodiments, the enzymatic turnover number for a reaction includes multiple enzymes and their associated turnover numbers.
[114] In some embodiments, the enzymatic turnover is calculated based on expression of a single enzyme.
[115] In some embodiments, the enzymatic turnover number for the reaction is further multiplied by a dry weight fraction of proteins in cells of the tissue, and by an average saturation factor of the proteins.
[116] In some embodiments, the dry weight fraction of the proteins is between 0.55 and 0.80 g/gDW.
[117] In some embodiments, the average saturation factor of the proteins is between 0.0 and 1.0. In some embodiments, the average saturation factor of the proteins is between 0.4 arid 0.6.
[118] In some embodiments, the enzymatic turnover number is based, on a median value across physiological conditions, with outliers removed.
[119] In some embodiments, a t least one of the minimum value or the maximum value is also based on enzymatic molecular weight value for the reaction, relating to the specific element,
[120] In some embodiments, the enzymatic turnover number is based, on a maximum value across physiological, conditions.
[121] In some embodiments, external flux bounds of the personalized metabolic model are based on blood, chemistry concentrations from the individual.
[122] In some embodiments, the objective function, is to maximum cell growth within the tissue, or maximize cell growth while minimizing total flux.
[123] In some embodiments, the omics data is one or more of genomics, transcriptomics, proteomics, or metabolomics data.
[124] In some embodiments, the stoichiometric matrix includes enzyme occupancy for reactions that are scaled, by enzymatic efficiency.
[125] In some embodiments, the omics data is processed and converted into linear program constraints to account, for isozymes, complexes, promiscuous enzymes, and isocomplexes. [126] In some embodiments, the stoichiometric matrix includes enzyme occupancy for reactions that are scaled by enzymatic efficiency.
[127] In some embodiments, the stoichiometric matrix remains numerically stable, and internal metabolic constraints are accounted for by the flux vector.
[128] The saturation factor for each enzyme refers to the substrate occupation of the active site of an enzyme. When all active sites on an enzyme are occupied, the enzyme is saturated and the addition of more substrate has no effect on the enzymatic reaction. The saturation factor used herein could be a range or an exact value.
[129] The methods disclosed may utilize publicly available omics, tumor, and biomarker data in publicly available repositories or patient specific data collected from cells. Non-cancerous cell types from healthy subjects may be used as controls, as needed. In an exemplary embodiment the cells are from a human who has a cancer. The cancer may include, but is not limited to breast, lung, prostate, colorectal, kidney, bladder, non-Hodgkin’s lymphoma, thyroid, endometrial, head and neck, throat, liver, pancreatic, and splenic cancers. In an exemplary embodiment the cancer is lung, prostate, or breast cancer. In an exemplary model, the cells source disclosed herein, is humans. However, the model could be species- specific and utilize a wide range of mammalian cells.
Ill. Improved Identification of Prognostic Biomarkers
[130] Personalized GEMs, along with data regarding disease progression (e.g., survival data, therapy responsiveness data, time to recurrence data, and/or time to metastasis data), and. possibly other information, can be used to identify biomarkers that are prognostic for patient survival, among other factors. This prognostic biomarker prediction focuses on model-based reaction fluxes and their combinations for prognosing survival.
A. Identification of Metabolic Model-Related Prognostic Biomarkers
[131] Survival data (e.g., from TCGA or another public or private database) is obtained for the same tumors used as input data for module 500, Then, simulations from OEMs generated by module 500 are analyzed to find metabolic reactions (or reaction subsystems) whose fluxes contribute significantly to variability in. patient-to-patient metabolism. For instance, the flux contribution could be determined by way of a principal component analysis (PCA).
[132] PCA. is a dimensionality-reducing technique that attempts to simplify a data, set down to its factors that provide the most “information” (in an information-theory sense) in the .form of variability. For instance, this may involve finding new variables that are linear functions of those in the original data set, such that these new variables maximize or at least have a relatively high variance and that are uncorrelated or only mildly correlated with each other.
[133] PCA can be used to identify fluxes or subsystem fluxes contributing as one of the top j loading components in the principle components required to capture k% of variability in tumor metabolism across patients. In some embodiments, j can be 10, 25, or 50, while k can be 80, 85, 80, or 95. However, other values of j and k are possible, including testing all fluxes against survival for prognosis.
[134] Then, the association of these reactions and subsystem reactions to patient survival can he tested using univariate survival analyses (e.g., Kaplan Meier curves) and multivariate survival analyses (e.g., Cox Proportional Hazard Regression.) on the survival data. In these analyses, the fluxes may be grouped, based on quantiles. Note that other events (e.g., therapy responsiveness, remission, recurrence, or metastasis) could be used in place of survival in these techniques.
[135] Kaplan Meier curves allow survival analyses in. the presence of incomplete observations. For these observations, it. is known that some subjects died at respective points in time and it is known that some subjects were alive at respective periods in time. However, it is not known whether the some of the latter subjects ultimately survived (e.g., they were lost to follow-up attempts or otherwise dropped out of the study). The outcome of a Kaplan Meier analysis is a set of probabilities that a subject dies at various points in the future (e.g., 1 month out, 2 months out, 6 months out, etc.). In some cases, Kaplan Meier curves can be fit to an exponential or logistic function. But survivability data is often too complex for this univariate approach to capture the entire variability associated with survival fully. Cox Proportional Hazard Regression allows the survival rate to change over time (both increase and. deerease), while assuming that the hazard rates -for each feature remain proportional over time. In some eases, Cox Proportional Hazard Regression can be characterized as an autoregressive log-linear model.
[136] Any fluxes with a p-value < 0.05, are identified as potential prognostic factors. In some cases, combinations of fluxes can be used as potential prognostic factors. For example, “glycolysis flux” is a combination of fluxes through all reactions involved in the glycolysis pathway, or the set of reactions that a cell uses to consume glucose and produce energy prior to oxidative phosphorylation. Another example is the “metabolic efficiency flux” calculated as a sum of model subsystem fluxes divided by biomass, which essentially shows how much of a tumor's nutrient use goes directly into biomass production and how much is relegated to secondary metabolism. The p- values can be calculated using standard. methods including Cox Proportional Hazard (which is used to test for association between patient survival time and multiple patient characteristics), a Wald test (which is a non-parametric form of the t-test to test if samples were drawn from different distributions), a likelihood test (which tests the predictively of the model is sufficiently different from a null model), and/or a logrank test (which is a non-paramctric test comparing survival times between multiple populations).
[137] Also, metabolic phenotypes (fluxes through all reactions or subsystems) are clustered to identify and test a composite “metabolic phenotype” versus survival This clustering occurs following PCA or alternate dimensionality reduction techniques including - but not limited to - t-distributed stochastic neighbor embedding (tSNE), uniform manifold approximation and projection (UMAP), multidimensional scaling (MDS), or linear discriminant analysis (LDA).
B, identification of Prognostic Gene Expression Biomarkers
[138] Gene expression defines the process by which a gene causes the creation of proteins or other molecules, including whether and how much of these molecules are made. Differential gene expression. (DEG) analysis is performed using statistical tests to find genes whose expression changes across conditions (e.g., between cancerous and non-cancerous tissues), including with differences in metabolic fluxes or groups associated with survival. From R.N.A -sequencing data, DEG analysis can be performed, for example, using the DESeq2 package in the R programming language on raw RNA. sequencing counts, although, other DEG algorithms are frequently used including LASSO.
[139] For explanatory variables, patient features potentially associated with survival are used, including - but not limited to - (1) age, sex, tumor stage, and other factors known to influence expression in a given cancer type, (2) metabolic phenotype identified from above, (3) prognostic factors identified above, and/or (4) composite fluxes, including but not limited to glycolysis, TCA cycle, oxidative phosphorylation, pentose phosphate pathway, cysteine metabolism, retinol metabolism, or flux projections such as PCA, tSNE, or UMAP of fluxes or combinations thereof. Resulting p-values can be corrected for multiple tests using false discovery rate (FDR) for adjustment. An FDR level of less than 0.05 may be used as a significance level to determine DEGs. Differential expression of genes related to metabolic prognostic markers are identified as prognostic genes within a tumor.
[140] Also, expression of transcripts and metabolic prognostic markers can be correlated. Briefly, log (TPM+1) is correlated with each metabolic prognostic marker across all samples for each transcript using Pearson correlation. Transcripts are identified as strongly correlated if they have a correlation coefficient above a threshold (which can be selected at 0.5) and an FDR below a threshold (which can. be selected at 0.05). Strongly correlated transcripts are identified as additional prognostic genes.
[141] Notably, a modified version of this algorithm can be used for protein expression. For protein expression, the correlation analysis (above) is the same. For protein differential expression analysis, the theory is the same, but the techniques used differ because of the differences in the statistical distributions underlying protein and gene expression and measurement. For proteins, instead of DESeq2, statistical tests such as analysis of variance or ANOVA (which tests for population differences), Kruskal-Wallis (the non-parametric ANOVA), or LASSO can be used (as well as other statistical techniques).
[142] The prognostic genes (differentially expressed or directly correlated with metabolic prognostic fluxes), are tested to determine whether they are likely to be secreted to circulating proteins (or transeripts). Local annotations can. also be obtained from databases such as Uhiprot, NCBI, HumanGEM, HMR, and/or Human Metabolic Atlas. Transcripts are identified as coding for a secreted protein .if its Uniprot subcelluiar location annotation (or annotation from a similar source) is “extracellular” or “secreted” or similar.
C. Overall Prognostic Marker Identification
[143] Putting this together. Figure 7 depicts the possible inputs that go into identification. of prognostic markers. Module 700 represents software that performs the operations for identifying of prognostic markers, as described above. Possible inputs to module 700 include one or more OEMs generated by module 500 (i.e., GEMs 510), disease progression data 704 for one or more tumor types (which may include survival data, therapy responsiveness data, time to recurrence data, and/or time to metastasis data), and/or protein annotations 706.
[144] Not all of these possible inputs need to be present. .For example, protein annotations 706 could be omitted. Regardless, from the inputs provided, module 700 generales one or more prognostic biomarker outputs. These could include prognostic metabolic biomarkers 708, prognostic tumor-based RNA or protein biomafkers 710, and/or prognostic circulating RNA or protein metabolite biomarkers 712. This process has resulted in identifying metabolic flux profiles, gene expression signatures, and protein expression signatures indicative of the metabolic state of tumors that are prognostic of survival in multiple cancer types.
[145] Two examples of such state identification are provided for bladder cancer (Figure 8 A). and metastatic melanoma (Figure 8B). Figures 8A and 8B show Kaplan Meier plots of patient survival based on metabolic subtype. Each line represents the survival probabilities of a distinct metabolic group of patients over time, where the metabolic groups were identified using the methods described above. Metabolic groups whose survival probability drops quickly represent groups with poor prognosis.
[146] Prognostic markers of chemotherapy effectiveness and survi val can be used to predict survival or clinical efficacy of a therapeutic agent in a wide range of cancers, including but not limited to colorectal, kidney, bladder, non-Hodgkin’s lymphoma, thyroid cancer, and endometrial cancer. In exemplary embodiments of the current disclosure, the cancer is lung, breast, or prostate cancer, but these embodiments can be improve the prediction of survival and/or chemotherapeutic efficacy any of these other cancers (or additional cancers not explicitly discussed herein).
D. Example Operations
[147] Figure 9 is a flow chart illustrating an example embodiment. The process illustrated, by Figure 9 may be carried out by a computing device, such as computing device 100, and/or a cluster of computing devices, such as server duster 200. However, the process can be carried out. by other types of devices or device subsystems.
[148] The embodiments of Figure 9 may be simplified by the removal of any one or more of the features shown therein. Further, these embodiments may be combined with features, aspects, and/or implementations of any of the previous figures or otherwise described herein. Notably, the operations of Figure 9 can be combined (e.g:., serially) with the operations of Figure 6.
[149] Block 900 may involve obtaining a plurality of personalized metabolic models for a metabolic network of a tissue, wherein each of the personalized metabolic models has been, tailored by omics data from respective individuals.
[150] Block 902 may involve determining, through simulation, of the personalized metabolic models, metabolic characteristics of the respective individuals. [151] Block 904 may involve identifying, by way of a statistical technique or an expression signature, concentrations of compounds at various stages of the personalized metabolic models that characterize variability observed between the metabolic characteristics of the respecti ve individuals.
[152] Block 906 may involve identifying, by way of a survivability analysis technique, chemical reactions that are most likely to be prognostic of survival or chemotherapeutic efficacy for one or more of the respective individuals.
[153] In some embodiments, the statistical technique comprises dimensionality reduction.
[154] In some embodiments, the omics data is one or more of genomics, trauscriptomics, proteomics, or metabolomics data.
[155] In some embodiments, the variability includes tumor-specific growth rate.
[156] In some embodiments, the expression signature includes transcriptional signatures, protein signatures, or circulating biomarkers.
[157] In some embodiments, the circulating biomarkers are protein, transcript, metabolite, tumor DN A, or a circulating tumor cell number (e.g., a measure of how many cells a tumor is sloughing off).
[158] In some embodiments, the metabolic characteristics include tumor-specific growth rate, metabolic efficiency, subsystem specific metabolic flux, individual metabolic reaction flux, or combinations of chemical reactions and subsystems.
[159] In some embodiments, the chemotherapeutic efficacy is tor a specific form of cancer or cancer of a specific organ, such as prostate cancer, lung cancer, or breast cancer. The chemotherapeutic efficacy may be detennined. in a similar fashion for other types of solid tumors.
[160] In some embodiments,, the prognosis of survival is for prostate cancer, lung cancer, or breast, cancer. The prognosis of Survi val may be determined in a similar fashion for other types of solid tumors.
IV. Closing
[161] The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those described herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within, the scope of the appended claims.
[162] The above detailed description describes various features and operations of the disclosed systems, devices, and methods with reference to the accompanying figures. The example embodiments described herein and in the figures are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separa ted, and designed in a wide variety of different configurations.
[163] With respect to any or all of the message flow diagrams, scenarios, and .flow charts in the figures and as discussed herein, each step, block, and/or communication can represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, operations described as steps, blocks, transmissions, communications, requests, responses, and/or messages can be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or operations can be used with any of the message flow diagrams, scenarios, and flow charts discussed herein, and these message flow diagrams, scenarios, and flow charts can be combined with one another, in part or in whole.
[164] A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a step or block that represents a processing of information can correspond to a module , a segment, or a portion of program code (including related data). The program code can include one or more .instructions executable by a processor for implementing specific logical, operations or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including RAM, a disk drive, a solid-state drive, or another storage medium.
[165] The computer readable medium can also include non-transitory computer readable media such as non-transitory computer readable media that, store data for short periods of time like register memory and processor cache. The non-transitory computer readable media can further include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the non-transitory computer readable media may include secondary or persistent long-term storage, like ROM, optical or magnetic disks, solid-state drives, or compact disc read only memory (CD-ROM), for example. The non-transitory computer readable media can also be any other volatile or non-volatile storage systems. A non- transitory computer readable medium can be considered, a computer readable storage medium, for example, or a tangible storage device.
[166] Moreover, a step or b lock that represents one or more information transmissions can correspond to information transmissions between software and/or hardware modules in the same physical, device. However, other information transmissions can be between software modules and/or hardware modules in different physical devices.
[167] The particular arrangements shown in the figures should not be viewed as limiting, ft. should be understood that other embodiments could include more or less of each element shown in a gi ven figure. Further, some of the illustrated elements can be combined or omitted. Yet further, an example embodiment can include elements that are not illustrated in the figures.
[168] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purpose of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.

Claims

CLAIMS What is claimed is:
1. A computer-implemented method comprising: obtaining a stoichiometric matrix representing a metabolic network of a tissue; obtaining omics data for an individual; constructing a rate of change vector representing steady state concentrations of compounds in the tissue due to reactions within the metabolic network; solving, by way of linear programming and in accordance with an objective function, a flux vector for the reactions, wherein the rate of change vector is a product of the stoichiometric matrix and the flux vector, wherein elements of the flux vector are constrained to be within respective minimum and maximum values, and wherein for a specific element of the flux vector, at least one of its minimum value or its maximum value is based on the omics data from the individual and a kinetic parameter for a reaction relating to the specific element; and generating, based on the flux vector, a personalized metabolic model for the individual
The computer-implemented method of claim 1, wherein the omics data is predicted using clinical biomarkers of the individual.
3. The computer-implemented method of claim 2, wherein the clinic al biomarkers are one or more of age, sex, race, or menopausal status of the individual.
4. The computer-implemented method of claim 1 , wherein the tissue is a cell or collection of cells.
5. The computer-implemented method of claim 1 , wherein the personalized metabolic model is simulation-ready.
6. The computer-implemented method of claim 1, wherein the kinetic parameter is an enzymatic turnover number.
7. The computer-implemented method of claim 6, wherein the enzymatic turnover number is based on a median value across physiological conditions, with outliers removed.
8. The computer-implemented method of claim 6, wherein, the enzymatic turnover number is based on a maximum value across physiological conditions, with outliers removed.
9. The computer-implemented method of claim 6, wherein the specific element of the flux vector represents a. reversible reaction, and wherein both the minimum value and the maximum value are based on the enzymatic turnover number.
10. The computer-implemented method of claim 6, wherein the specific element of the flux vector represents a non-revcrsible reaction, wherein the minimum value is zero, and wherein the maximum value is based on the enzymatic turnover number.
11. The computer-implemented method of claim 6, wherein kinetic parameter for a reaction is based on multiple enzymes and their associated enzymatic turnover numbers.
12. The computer-implemented method of claim 6, wherein the enzymatic turnover number for the reaction is multiplied by a ratio of: a mass of an enzyme causing the reaction over a sum of masses for all enzymes represented in the flux vector, or a molar fraction of the enzymic causing the reaction over the sum of masses for all enzymes represented in the flux vector.
13. The computer-implemented method of claim 6, wherein the enzymatic turnover number is an estimated average enzymatic turnover number for the reaction.
14. The computer-implemented method of claim 6, wherein the enzymatic turnover number for the reaction is further multiplied by a dry weight fraction of proteins in cells of the tissue, and by an average saturation factor of the proteins.
15. The computer-implemented method of claim 14, wherein the dry- weight fraction of the proteins is between 0.20 and 0.80 g/gDW.
16. The computer-implemented method of claim 14, wherein the average saturation factor of the proteins is between 0.0 and 1.0.
17. The computer- implemented method of claim 16, wherein the average saturation factor of the proteins is between 0.4 and 0.6.
18 , The computer-implemented method of claim 6, wherein the enzymatic turnover number is based on a maximum enzymatic turnover value across physiological conditions.
19. The computer-implemented method of claim 1 , wherein at least one of the minimum value or the maximum value is also based on enzymatic molecular weight value for the reaction relating to the specific element.
20. The computer-implemented method of claim 1 , wherein external flux bounds of the personalized metabolic model are based on blood chemistry concentrations from the individual.
21. The computer-implemented method of claim 1, wherein the objective function is to maximize cell growth within the tissue, or maximize cell growth while minimizing total flux.
22. The computer-implemented method of claim 1, wherein the omics data is one or more of genomics, transcriptomics, proteomics, or metabolomics data.
23. The computer-implemented method of claim 1, wherein the omics data is processed and converted into linear program constraints to account for isozymes, complexes, promiscuous enzymes, and isocomplexes.
24. The computer-implemented method of claim I, wherein the stoichiometric matrix includes enzyme occupancy for reactions that are scaled by enzymatic efficiency.
25. The computer-implemented method of claim 1 wherein the stoichiometric matrix remains numerically stable, and wherein internal metabolic constraints are accounted for by the flux vector.
26. The computer-implemented method of claim 1. wherein, obtaining the stoichiometric matrix comprises performing sensitivity analyses of one or more components of a biomass function.
27. The computer-implemented method of claim 26, wherein performing the sensitivity analyses comprises omitting a subset of the components and assessing how biomass flux changes in response.
28. The computer-implemented method of claim 1, wherein obtaining the stoichiometric matrix comprises scaling growth associated ATP demand in a biomass function such that an output biomass flux is within predetermined. biologically ranges.
29. An article of manufacture including a non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing system, cause the computing, system to perform the operations of claims 1-28.
30. A computing system comprising: one or more processors; memory; and program instructions, stored in the memory, when executed by the one or more processors cause the computing system to perform the operations of claims 1-28.
31. A computer-implemented method comprising: obtaining a plurality of personalized metabolic models for a metabolic network of a tissue, wherein each of the personalized, metabolic models has been tailored by omics data from respective individuals; determining, through simulation of the personalized metabolic models, metabolic characteristics of tile respective individuals; identifying, by way of a statistical technique or an expression signature, concentrations of compounds at various stages of the personalized, metabolic models that characterize variability observed. between the metabolic characteristics of the respective individuals; and identifying, by way of a survivability analysis technique, chemical reactions that are most likely to be prognostic of survival or chemotherapeutic efficacy for one or more of the respective individuals.
32, The computer-implemented method of claim 31 , wherein the statistical technique comprises dimensionality' reduction.
33. The computer-implemented method of claim 31, wherein the omics data is one or more of genomics, transcriptomics, proteomics, or metabolomics data.
34. The computer-implemented method of claim 31 , wherein the variability includes tumor-specific growth rate.
35. The computer-implemented method of claim 31, wherein the expression signature includes transcriptional signatures, protein signatures, or circulating biomarkers.
36. The computer-implemented method of claim 35, wherein the circulating biomarkers are protein, transcript, metabolite, tumor DMA, or a circulating tumor cell number.
37. The computer-implemented method, of claim 31 , wherein the metabolic characteristics include tumor -specific growth rate, metabolic efficiency, subsystem specific metabolic flux, individual metabolic reaction flux, or combinations of chemical reactions and subsystems.
38. The computer-implemented method of claim 31 , wherein the chemotherapeutic efficacy is for a specific form of cancer or cancer of a specific organ.
39. The computer-implemented method of claim 38, wherein the chemotherapeutic efficacy is for prostate cancer, lung cancer, or breast cancer.
40. The computer-implemented method of claim 31 , wherein the prognosis of survival is for prostate cancer, lung cancer, or breast cancer.
41. An article of manufacture including a non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform the operations of any one of claims 31-40.
42. A computing system comprising: one or more processors; memory, and program instructions, stored in the memory, when executed by the one or more processors cause the computing system to perform the operations of any one of claims 31 -40.
PCT/US2022/046520 2021-11-03 2022-10-13 Metabolic modeling for disease prognosis and treatment WO2023081000A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163275085P 2021-11-03 2021-11-03
US63/275,085 2021-11-03

Publications (1)

Publication Number Publication Date
WO2023081000A1 true WO2023081000A1 (en) 2023-05-11

Family

ID=86241992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/046520 WO2023081000A1 (en) 2021-11-03 2022-10-13 Metabolic modeling for disease prognosis and treatment

Country Status (1)

Country Link
WO (1) WO2023081000A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001079468A2 (en) * 2000-04-13 2001-10-25 Incyte Genomics, Inc. Drug metabolizing enzymes
US20110010150A1 (en) * 2008-02-19 2011-01-13 The Regents Of The University Of California Methods and systems for genome-scale kinetic modeling
US20130198214A1 (en) * 2012-01-30 2013-08-01 The Government of the United State of America, as represented by the Secretary, Dept of Health and Human Services Personalized dynamic feedback control of body weight
US20140128273A1 (en) * 2008-11-18 2014-05-08 Ridge Diagnostics, Inc. Metabolic Syndrome and HPA Axis Biomarkers for Major Depressive Disorder
US20150039274A1 (en) * 2013-02-05 2015-02-05 Ramot At Tel-Aviv University Ltd. System and method for personalized metabolic modeling
WO2016010961A1 (en) * 2014-07-15 2016-01-21 Abbvie Inc. Enzyme occupancy assay
US20210241846A1 (en) * 2002-03-29 2021-08-05 Genomatica, Inc. Multicellular metabolic models and methods

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001079468A2 (en) * 2000-04-13 2001-10-25 Incyte Genomics, Inc. Drug metabolizing enzymes
US20210241846A1 (en) * 2002-03-29 2021-08-05 Genomatica, Inc. Multicellular metabolic models and methods
US20110010150A1 (en) * 2008-02-19 2011-01-13 The Regents Of The University Of California Methods and systems for genome-scale kinetic modeling
US20140128273A1 (en) * 2008-11-18 2014-05-08 Ridge Diagnostics, Inc. Metabolic Syndrome and HPA Axis Biomarkers for Major Depressive Disorder
US20130198214A1 (en) * 2012-01-30 2013-08-01 The Government of the United State of America, as represented by the Secretary, Dept of Health and Human Services Personalized dynamic feedback control of body weight
US20150039274A1 (en) * 2013-02-05 2015-02-05 Ramot At Tel-Aviv University Ltd. System and method for personalized metabolic modeling
WO2016010961A1 (en) * 2014-07-15 2016-01-21 Abbvie Inc. Enzyme occupancy assay

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DIELMANN-GESSNER JESSICA, GROSSMAN MORAN, CONTI NIBALI VALERIA, BORN BENJAMIN, SOLOMONOV INNA, FIELDS GREGG B., HAVENITH MARTINA, : "Enzymatic turnover of macromolecules generates long-lasting protein–water-coupled motions beyond reaction steady state", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, NATIONAL ACADEMY OF SCIENCES, vol. 111, no. 50, 16 December 2014 (2014-12-16), pages 17857 - 17862, XP093065721, ISSN: 0027-8424, DOI: 10.1073/pnas.1410144111 *
SUPREETA VIJAYAKUMAR; MAX CONWAY; PIETRO LI\'O; CLAUDIO ANGIONE: "Seeing the wood for the trees: a forest of methods for optimisation and omic-network integration in metabolic modelling", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 21 September 2018 (2018-09-21), 201 Olin Library Cornell University Ithaca, NY 14853 , XP080920981, DOI: 10.1093/bib/bbx053 *

Similar Documents

Publication Publication Date Title
Heinken et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine
Guo et al. Identification of cancer subtypes by integrating multiple types of transcriptomics data with deep learning in breast cancer
Colijn et al. Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production
Kim et al. Methods for integration of transcriptomic data in genome-scale metabolic models
JP2010225171A (en) Method and system to identify operational reaction pathway
US20140330583A1 (en) Systems medicine platform for personalized oncology
Aurich et al. Computational modeling of human metabolism and its application to systems biomedicine
Benyamini et al. Flux balance analysis accounting for metabolite dilution
Bernabe et al. Systems biology of the human microbiome
Peng et al. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways
Quo et al. Reverse engineering biomolecular systems using− omic data: challenges, progress and opportunities
Shterev et al. permGPU: Using graphics processing units in RNA microarray association studies
Achreja et al. Metabolic collateral lethal target identification reveals MTHFD2 paralogue dependency in ovarian cancer
Tsouka et al. Constraint-based metabolic control analysis for rational strain engineering
US20210090686A1 (en) Single cell rna-seq data processing
Lucas et al. A Bayesian analysis strategy for cross-study translation of gene expression biomarkers
US7788041B2 (en) Compositions and methods for modeling human metabolism
WO2023081000A1 (en) Metabolic modeling for disease prognosis and treatment
Gonzalez et al. ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data
Lucas et al. Cross-study projections of genomic biomarkers: an evaluation in cancer genomics
Berg et al. Comparing gene annotation enrichment tools for functional modeling of agricultural microarray data
Bidaut et al. Determination of strongly overlapping signaling activity from microarray data
Jaffe et al. Gene set bagging for estimating the probability a statistically significant result will replicate
Liu et al. MethylSeqDesign: A framework for Methyl-Seq genome-wide power calculation and study design issues
Rahman et al. Predictive modeling of anti-cancer drug sensitivity from genetic characterizations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22890599

Country of ref document: EP

Kind code of ref document: A1