CN113470732A - Microbial metabolism network model multi-optimization target determination method and application thereof - Google Patents

Microbial metabolism network model multi-optimization target determination method and application thereof

Info

Publication number
CN113470732A
Authority
CN
China
Prior art keywords
population
individuals
network model
target
pareto
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110641532.3A
Other languages
Chinese (zh)
Other versions
CN113470732B (en)
Inventor
颜学峰
范星存
夏建业
周静茹
庄英萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology
Priority to CN202110641532.3A
Publication of CN113470732A
Application granted
Publication of CN113470732B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Physiology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for determining multiple optimization targets for a microbial metabolic network model, and an application thereof. The method is based on a genome-scale metabolic network model of a microorganism: several optimization targets are defined according to general rules of microbial growth, the constraints are determined by flux balance analysis, and the resulting optimization problem is solved with a multi-objective differential evolution method. First, the objective functions are defined; then an initial population satisfying the constraints is generated with a conventional single-objective linear programming method, in line with the basic rules of biological growth. The population is then iterated according to the steps of the adapted differential evolution algorithm, and when the iteration ends a Pareto optimal solution set satisfying the optimization targets and the constraints is obtained. Finally, the Pareto optimal solution set is analyzed to complete the solution of the genome-scale metabolic network model. The method can be applied to the prediction of key fluxes in central metabolism.

Description

Microbial metabolism network model multi-optimization target determination method and application thereof
Technical Field
The invention lies at the intersection of biotechnology and information technology, and relates to determining optimization targets for a genome-scale metabolic network of a microorganism and to applying multi-objective optimization to the solution of genome-scale metabolic networks.
Background
Metabolism refers to the series of chemical reactions that occur within an organism. Cellular metabolism generally involves the following processes: 1) taking up a nutrient substrate from outside the cell; 2) breaking the substrate down into energy units and metabolite molecules; 3) synthesizing proteins, saccharides, lipids and other macromolecules, together with energy. The metabolic system therefore plays a critical role in the cellular activity of microorganisms. As a bottom-up systems biology tool, the genome-scale metabolic network model links the genes, proteins and reactions of a microorganism, so that cellular metabolism and phenotype can be predicted under specified constraints. The core of the genome-scale metabolic network model is a stoichiometric matrix that encodes the relationships between metabolites and reactions: the rows of the matrix represent the metabolites involved in the metabolic processes of the microorganism, and the columns represent the metabolic reactions. A positive coefficient indicates that the reaction produces the metabolite, and a negative coefficient indicates that the reaction consumes it.
For phenotypic and metabolic predictions with a microbial genome-scale metabolic network model, the commonly used approach is flux balance analysis (FBA). FBA is a mathematical method for analyzing the reaction fluxes through a cellular metabolic network. It defines a single-objective linear programming problem whose optimization target is to maximize the specific growth rate of the microorganism. To solve the model, the method assumes that the metabolic process of the microorganism has reached a steady state, i.e. that metabolite concentrations do not change over time, which adds equality constraints to the linear programming problem. Together with the given upper and lower limits of each reaction flux, these steady-state conditions form the constraints under which the linear programming problem is solved.
The result of the linear program obtained by flux balance analysis is a flux distribution. Analyzing this distribution yields the predicted maximum specific growth rate and the fluxes of key reactions of interest, such as the production of cell metabolites, energy generation during metabolism, and the fluxes of central metabolism. The calculation results of the optimization model therefore play an important role in theoretical research on microorganisms and serve as an important tool for analyzing and designing microbial metabolic processes. Once the solution of the metabolic network model reaches sufficient accuracy, it can be used to guide metabolic engineering or genetic engineering research.
With advances in science and production technology, more and more traditional research techniques are becoming digitized and information-driven. In this process, optimization problems have come to appear in nearly every field of scientific research and engineering practice and have become an indispensable theoretical basis and research method of modern science and technology, which has driven the development of a wide range of optimization methods and techniques. For microbial genome-scale metabolic network models, the most commonly used optimization methods are linear programming (LP) and quadratic programming (QP).
With the introduction of the whole-cell digital model concept, metabolic networks have become more important, and the demands on the results of metabolic network models keep rising. Supplementing the model with multiple optimization objectives is one way to make the solution more reasonable. Microbial growth follows a natural law: while achieving a given specific growth rate, the cell produces as little intracellular energy as possible. A single-objective linear programming method analyzes the metabolic network model in purely mathematical terms and therefore has difficulty reflecting this characteristic of the biological system. Defining several optimization targets compensates for this shortcoming to some extent: more supplementary information enters the solution of the microbial metabolic model, so that the computed result is closer to experimental results. For solving problems with multiple optimization targets, the differential evolution algorithm, which is heuristic and stochastic, is considered one of the most effective methods.
The differential evolution algorithm is a simple and effective heuristic parallel search algorithm. It is a population-based global search algorithm with fast convergence, few control parameters, simple configuration and stable optimization results. It uses scaled differences between individuals as the perturbation applied to an individual, which makes the search direction adaptive. Early in the evolution, individuals in the population differ widely, so the perturbation is large and the algorithm searches over a wide range; late in the evolution, the algorithm tends to converge, the differences between individuals are small, and the algorithm searches near the current individuals with strong local search capability. Because it learns from the individuals of the population, the differential evolution algorithm often performs favorably compared with other evolutionary algorithms.
A method for solving the microbial metabolic network model based on the differential evolution algorithm is therefore provided. The model is solved under several optimization targets, and basic rules of microbial growth are incorporated mathematically, so that the result fits the actual growth process more closely. Starting from the result of a conventional single-objective linear program, the method exploits the ability of the differential evolution algorithm to learn from the population, searches over the multiple optimization targets to obtain a Pareto optimal solution set satisfying the constraints and the optimization targets, and selects from that set an optimal solution close to the actual cell growth process.
Disclosure of Invention
The invention aims to solve the microbial metabolic network model at steady-state balance with a differential-evolution multi-objective optimization algorithm, and to determine the specific content of the multiple optimization objectives from the general rules of microbial growth. This increases the biological information taken into account when the model is solved and thereby improves the reliability and plausibility of the model results.
In order to achieve the purpose of the invention, the specific technical scheme of the invention is as follows:
A method for determining multiple optimization targets of a microbial metabolic network model determines several optimization targets from general information on microbial growth and incorporates the basic rules of microbial growth mathematically, so that more biological information is taken into account when the model is solved and the result fits the actual growth process more closely. Starting from the result of a conventional single-objective linear program, the method exploits the ability of the differential evolution algorithm to learn from the population, searches over the multiple optimization targets to obtain a Pareto optimal solution set satisfying the constraints and the optimization targets, and finds an optimal solution close to experimental data from the actual cell growth process. The method comprises the following steps:
step 1, setting the optimization targets, and determining the objective functions to be optimized according to the optimization targets;
step 2, establishing the equality constraint Sv = 0 according to the microbial metabolic network model at steady-state balance, and establishing and solving a single-objective linear programming problem with the maximum specific growth rate as the single objective function; wherein S is the stoichiometric matrix of the microbial metabolic network, and v is the vector formed by all reaction fluxes in the metabolic network;
step 3, forming the initial population of the multi-objective differential evolution algorithm from the solutions of the single-objective linear program, setting the control parameters of the differential evolution algorithm, starting the iterative computation, and outputting the Pareto optimal set once the specified number of iterations is reached;
step 4, selecting suitable individuals from the Pareto optimal set, and obtaining their central metabolic flux data and central metabolic energy-substance production data as the result of the microbial metabolic network model.
Specifically, for the objective functions described in step 1, the optimization targets of the problem are first determined as follows: the specific growth rate is kept at its maximum under the given conditions, while the production of intracellular energy carriers such as ATP, NADH and NADPH during growth is minimized. Each objective function to be optimized is then expressed as:
$$f = c^{T} v$$
wherein v is the vector formed by all reaction fluxes in the metabolic network, and c is a weight vector with entries in {0,1} and the same dimension as v. All objective functions take this form, so each objective function is the sum of the fluxes of all reactions relevant to the corresponding optimization target.
For the weight vector c: when the objective is to maximize the specific growth rate, the weight of the reaction representing the specific growth rate is set to 1 and all other weights are set to 0; similarly, when the objective is to minimize the amount of ATP synthesized, the weights of all ATP-producing reactions are set to 1 and the weights of the remaining reactions are set to 0. Each objective thus has a 0/1 weight vector with one entry per reaction.
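The construction of the 0/1 weight vectors can be illustrated with a minimal sketch. The five-reaction network, the reaction indices and the flux values below are hypothetical and chosen only to show how f = c^T v sums the selected fluxes; they are not taken from any real model.

```python
import numpy as np

# Hypothetical toy network with 5 reactions; all indices are illustrative only.
n_reactions = 5
growth_idx = [4]            # assumed index of the biomass (specific growth rate) reaction
atp_producing_idx = [0, 2]  # assumed indices of ATP-producing reactions

def weight_vector(indices, n):
    """Return a 0/1 vector c with ones at the selected reaction indices."""
    c = np.zeros(n)
    c[list(indices)] = 1.0
    return c

c_growth = weight_vector(growth_idx, n_reactions)
c_atp = weight_vector(atp_producing_idx, n_reactions)

v = np.array([2.0, 1.0, 3.0, 0.5, 0.1])  # an arbitrary flux vector
f_growth = c_growth @ v  # flux of the growth reaction only
f_atp = c_atp @ v        # sum of the fluxes of the ATP-producing reactions
```

Because every objective has the same linear form, adding a further objective (for example NADH production) only requires another index list.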
Specifically, the steady-state metabolic network model in step 2 is solved for a single objective. The solving method is based on flux balance analysis (FBA), which rests on the following points:
(1) At steady state of the metabolic system, the production and consumption of each compound are balanced, i.e. the concentration of each metabolite does not change over time:
$$\frac{dx}{dt} = 0$$
wherein x is the concentration of the metabolite.
Based on this assumption, the equality constraint of the microbial metabolic network model at steady-state balance can be established:
$$Sv = 0$$
(2) Each reaction in the metabolic network model has upper and lower bounds, which define the solution space of the model in the form of inequality constraints.
(3) The specific growth rate μ is taken as the objective function to be maximized.
Further, combining the above points yields a single-objective linear programming problem:
$$\max\ \mu$$
$$\text{s.t.}\quad Sv = 0,\qquad l \le v \le u$$
wherein μ is the specific growth rate; S is the stoichiometric matrix of the microbial metabolic network; v is the vector formed by all reaction fluxes in the metabolic network; u and l are the vectors formed by the upper and lower limits of the reaction fluxes.
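The single-objective problem above can be sketched with an off-the-shelf LP solver. The example below uses `scipy.optimize.linprog` on a hypothetical four-reaction toy network (uptake, conversion, growth, by-product secretion); the matrix, the bounds and the choice of solver are assumptions for illustration and do not represent the model or software used in the invention.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: rows are metabolites A and B, columns are reactions
# R1: -> A (uptake), R2: A -> B, R3: B -> (growth), R4: A -> (by-product).
S = np.array([[1.0, -1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0, 0.0]])
lower = np.zeros(4)                               # l: all reactions irreversible
upper = np.array([10.0, 1000.0, 1000.0, 1000.0])  # u: uptake limited to 10
c = np.array([0.0, 0.0, 1.0, 0.0])                # weight vector selecting the growth reaction

# linprog minimizes, so maximizing mu = c^T v means minimizing -c^T v.
res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
              bounds=list(zip(lower, upper)), method="highs")
mu_max = -res.fun   # maximum growth flux; limited by the uptake bound in this toy case
v_opt = res.x       # one optimal flux distribution
```

The optimal flux distribution of an LP is generally not unique, which is one reason a single FBA solution may not reflect the biologically preferred flux pattern.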
Specifically, the multi-objective differential evolution algorithm described in step 3 is a population-based heuristic search algorithm. Its control parameters are simple and mainly comprise the maximum number of iterations MaxGen, the population size NP and the scaling factor F. The solutions of the single-objective linear program form the initial population, iterative computation is carried out with the differential evolution algorithm, and the Pareto optimal solution set of the multi-objective problem is obtained from the Pareto dominance relation and non-dominated sorting.
Further, the Pareto optimal solution set and the related definitions are described as follows. Consider a multi-objective optimization problem:
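$$\min\ \left[f_1(x_1, x_2, \ldots, x_D),\ f_2(x_1, x_2, \ldots, x_D),\ \ldots,\ f_n(x_1, x_2, \ldots, x_D)\right]$$
$$\text{s.t.}\quad x_j^{L} \le x_j \le x_j^{U},\qquad j = 1, 2, \ldots, D$$
wherein $f_1, f_2, \ldots, f_n$ are the n objective functions to be optimized, and $x_j^{L}$ and $x_j^{U}$ are respectively the lower and upper bounds of the variable $x_j$.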
Suppose the individuals $x^{(1)}$ and $x^{(2)}$ are two feasible solutions of the problem, with objective function values $f_k(x^{(1)})$ and $f_k(x^{(2)})$, $k = 1, \ldots, n$, respectively. If the following two conditions are both satisfied:
$$f_k(x^{(1)}) \le f_k(x^{(2)}) \quad \text{for all } k \in \{1, \ldots, n\}$$
$$f_k(x^{(1)}) < f_k(x^{(2)}) \quad \text{for at least one } k \in \{1, \ldots, n\}$$
then the individual $x^{(1)}$ is said to dominate the individual $x^{(2)}$. The surface formed by all non-dominated individuals in the solution set is called the Pareto front. The set of individuals on the Pareto front is called the Pareto optimal solution set.
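The dominance relation translates directly into a short check. The sketch below assumes all objectives are minimized (a maximized objective such as the specific growth rate can be negated first); the function names and the sample objective values are illustrative.

```python
import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (all objectives minimized)."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def non_dominated(F):
    """Return the row indices of F that are not dominated by any other row."""
    F = np.asarray(F)
    return [i for i in range(len(F))
            if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]

# Four individuals with two objective values each (hypothetical numbers).
F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
front = non_dominated(F)  # the individual [3, 3] is dominated by [2, 2]
```

The pairwise comparison costs O(N^2) dominance checks per front, which is acceptable for the population sizes typical of differential evolution.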
Further, to obtain a complete Pareto front, the individuals in the optimal solution set must be well spread out. A crowding degree operator is therefore introduced to describe how individuals are distributed in the population: the smaller the crowding degree of an individual, the more densely its neighbors are packed around it. The crowding degree thus measures the distance between, and the density of, individuals in a solution set. It is calculated as follows:
$$I[i]_m = \frac{f[i+1]_m - f[i-1]_m}{f_m^{\max} - f_m^{\min}}$$
wherein $I[i]_m$ is the crowding degree of the i-th individual on the m-th objective function, $f[i+1]_m$ and $f[i-1]_m$ are the m-th objective function values of the (i+1)-th and (i-1)-th individuals after sorting by that objective, and $f_m^{\max}$ and $f_m^{\min}$ are respectively the maximum and minimum values of the m-th objective function.
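A minimal sketch of the crowding degree calculation follows. It sums the per-objective contributions into one distance per individual and assigns an infinite distance to the two boundary individuals of each objective; both conventions come from the widely used NSGA-II procedure and are assumptions here, since the text does not state how boundary individuals are treated.

```python
import numpy as np

def crowding_distance(F):
    """Crowding distance of each row of F (objective values of one Pareto layer)."""
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])                     # sort individuals by objective k
        f_min, f_max = F[order[0], k], F[order[-1], k]
        dist[order[0]] = dist[order[-1]] = np.inf       # assumed: boundary points are always kept
        if f_max == f_min:
            continue                                    # objective k does not discriminate
        for pos in range(1, n - 1):
            i = order[pos]
            # normalized gap between the two neighbors of individual i on objective k
            dist[i] += (F[order[pos + 1], k] - F[order[pos - 1], k]) / (f_max - f_min)
    return dist

F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]])  # hypothetical objective values
d = crowding_distance(F)  # the middle individual gets 1.0 from each objective
```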
Further, assume that each individual i in the current population has two attributes:
(1) a non-dominated rank $i_{rank}$;
(2) a crowding distance $i_{distance}$.
The crowded-comparison operator $\prec_n$ is then defined as follows:
$$i \prec_n j \quad \text{if}\quad (i_{rank} < j_{rank}) \ \text{or}\ \left((i_{rank} = j_{rank}) \ \text{and}\ (i_{distance} > j_{distance})\right)$$
That is, between two solutions with different non-dominated ranks, the solution with the lower rank, i.e. the better-performing one, is preferred. If two solutions have the same non-dominated rank, the solution with the larger crowding distance is preferred.
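The comparison rule can be written as a small predicate, or equivalently as a sort key. The tuples below (label, rank, crowding distance) are made-up values used only to show the resulting order.

```python
def crowded_less(rank_i, dist_i, rank_j, dist_j):
    """True if individual i is preferred over j: lower rank first, then larger crowding distance."""
    return rank_i < rank_j or (rank_i == rank_j and dist_i > dist_j)

# Hypothetical individuals as (label, non-dominated rank, crowding distance).
population = [("a", 2, 0.5), ("b", 1, 0.2), ("c", 1, 0.9)]

# Sorting by (rank ascending, distance descending) yields the same preference order.
ordered = sorted(population, key=lambda p: (p[1], -p[2]))
labels = [p[0] for p in ordered]  # "c" and "b" share rank 1, "c" is less crowded
```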
Further, the overall process of step 3 is:
(1) Initialization: the solutions obtained in step 2 are used to construct the initial population.
(2) Setting the control parameters of the differential evolution algorithm: the maximum number of iterations MaxGen, the population size NP and the scaling factor F.
(3) Mutation: following the standard mutation strategy of differential evolution, a new individual is generated for each individual in the parent population:
$$v_i = x_{r1} + F \times (x_{r2} - x_{r3})$$
wherein r1, r2 and r3 are mutually distinct random indices drawn from [1, NP], and F is the control parameter that scales the difference vector, generally a random number in [0,1]. The individuals generated in this way form the trial population.
(4) Pareto dominance ranking: the parent population and the trial population are merged into a combined population, the objective function values of the combined population are calculated, and each individual is assigned a rank according to the Pareto dominance principle. The lower the Pareto rank, the better the individual.
(5) Crowding degree calculation: the crowding distance of each individual is calculated to describe the distribution of individuals in the population; the smaller the crowding distance, the more densely the individuals are distributed. Solutions that are packed too closely hinder obtaining a complete Pareto front.
(6) Generating the offspring population: individuals of the combined population are selected into the offspring population according to the two evaluation operators, Pareto rank and crowding degree. The individuals are first arranged in layers from low to high Pareto rank, and whole layers are placed into the offspring population starting from the lowest rank, until a layer is reached that cannot be placed in its entirety without exceeding the population size. The individuals of that layer are sorted by crowding degree from large to small, and those with the largest crowding degree are placed into the offspring population until it is full.
(7) The offspring population obtained in step (6) is passed back to step (3), and steps (3) to (6) are repeated until the maximum number of iterations MaxGen is reached, at which point the final result is output.
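Steps (4) to (6) can be sketched as one selection routine that operates on the objective values of the combined population. This is an illustrative outline under stated assumptions: all objectives are minimized, boundary individuals receive an infinite crowding distance (NSGA-II convention), and the helper names and the sample data are hypothetical.

```python
import numpy as np

def dominates(a, b):
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_ranks(F):
    """Assign Pareto ranks 1, 2, ... by repeatedly peeling off the non-dominated layer."""
    ranks = np.zeros(len(F), dtype=int)
    remaining, level = set(range(len(F))), 1
    while remaining:
        layer = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        ranks[layer] = level
        remaining -= set(layer)
        level += 1
    return ranks

def crowding(F):
    """Crowding distance within one layer; boundary individuals get infinity (assumed)."""
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        span = F[order[-1], k] - F[order[0], k]
        dist[order[0]] = dist[order[-1]] = np.inf
        if span == 0:
            continue
        for pos in range(1, n - 1):
            dist[order[pos]] += (F[order[pos + 1], k] - F[order[pos - 1], k]) / span
    return dist

def select_offspring(F, NP):
    """Return the indices of the NP individuals kept from the combined population."""
    F = np.asarray(F, dtype=float)
    ranks = pareto_ranks(F)
    chosen = []
    for level in range(1, int(ranks.max()) + 1):
        layer = np.flatnonzero(ranks == level)
        if len(chosen) + len(layer) <= NP:
            chosen.extend(layer.tolist())          # the whole layer fits
        else:
            d = crowding(F[layer])                 # partial layer: least crowded first
            best = layer[np.argsort(-d, kind="stable")]
            chosen.extend(best[: NP - len(chosen)].tolist())
            break
    return chosen

# Combined population of 6 individuals with 2 objectives (hypothetical values).
F = np.array([[1, 5], [2, 3], [4, 2], [5, 1], [4, 4], [6, 6]], dtype=float)
kept = select_offspring(F, NP=3)
```

In this example the first layer holds four individuals, so only the three least crowded ones survive; with NP=5 the whole first layer and the single member of the second layer would be kept.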
Specifically, for the results in step 4, the central metabolic flux data of the reference strains are calculated under different growth conditions. Central carbon metabolism (CCM) is a key part of microbial metabolism and describes the integration of the transport and oxidation pathways of the major carbon sources within the cell. In most microorganisms, the main pathways of central metabolism are the phosphotransferase system, glycolysis (EMP), the pentose phosphate (PP) pathway, and the tricarboxylic acid (TCA) cycle with the glyoxylate shunt. Central metabolism is the major source of the energy required by the organism and provides precursors for the rest of its metabolism.
Another object of the invention is to provide an application of the method for determining multiple optimization targets of a microbial metabolic network model to the prediction of key fluxes in central metabolism.
Compared with the prior art, the invention has the following advantages and beneficial effects: (1) several objective functions are determined for solving the model, so that biological information about the cell growth process is additionally taken into account; (2) a multi-objective optimization method is used, providing a new way of solving the microbial metabolic network model; (3) the ability of the multi-objective optimization to learn from the population is fully exploited, improving the plausibility of the predictions for some key fluxes in central metabolism; (4) the reliability of the solution can be improved, which supports the further development of metabolic engineering and related fields.
Drawings
FIG. 1 is an overall flow chart of the multi-objective differential evolution algorithm of the invention;
FIG. 2 is a schematic diagram of the selection operation of the multi-objective algorithm of the invention;
FIG. 3 shows the Pareto front obtained iteratively under growth condition one;
FIG. 4 shows the Pareto front obtained iteratively under growth condition two;
FIG. 5 shows the central metabolic pathway data obtained iteratively under growth condition one;
FIG. 6 shows the central metabolic pathway data obtained iteratively under growth condition two.
Detailed Description
The invention is further illustrated by the following examples:
In this embodiment of the method for determining and solving multiple optimization targets of a microbial metabolic network model, the genome-scale metabolic network model iJB1325 of Aspergillus niger is used. It is one of the most complete A. niger models currently available and comprises 2320 reactions, 1818 metabolites and 1325 genes. This example uses iJB1325 to simulate the growth of two A. niger strains, DS03043 and CBS 513.88. FIG. 1 is the overall flow chart of the multi-objective differential evolution algorithm of the invention.
1. Converting the model to irreversible form
The standard iJB1325 model contains 2320 reactions, 1818 metabolites and 1325 genes, and some of its reactions are reversible. A reversible reaction has a negative lower bound, so negative flux values would arise during the iterations. The iJB1325 model was therefore modified by splitting every reversible reaction into a forward and a reverse reaction, which converts the model into irreversible form. This ensures that the lower bound of every reaction in the model is zero and that all reaction fluxes are non-negative, in preparation for the subsequent iterations. The converted irreversible iJB1325 model has 3030 reactions, so the decision vector of the optimization problem has 3030 dimensions.
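The splitting of reversible reactions can be sketched on a stoichiometric matrix with bound vectors. The function below is a generic illustration, not the conversion code used for iJB1325: each reaction with a negative lower bound gets an extra column with negated stoichiometry, and the original flux corresponds to the forward flux minus the reverse flux. If every reversible reaction is split into exactly two, the growth from 2320 to 3030 reactions would correspond to 710 reversible reactions.

```python
import numpy as np

def to_irreversible(S, lower, upper):
    """Split every reaction with a negative lower bound into forward and reverse parts."""
    S = np.asarray(S, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rev = np.flatnonzero(lower < 0)                 # reversible reactions
    S_irr = np.hstack([S, -S[:, rev]])              # reverse columns have negated stoichiometry
    lower_irr = np.concatenate([np.maximum(lower, 0.0), np.maximum(-upper[rev], 0.0)])
    upper_irr = np.concatenate([np.maximum(upper, 0.0), -lower[rev]])
    return S_irr, lower_irr, upper_irr

# Hypothetical example: one metabolite, one irreversible and one reversible reaction.
S = np.array([[1.0, -1.0]])
lower = np.array([0.0, -5.0])
upper = np.array([10.0, 5.0])
S_irr, lower_irr, upper_irr = to_irreversible(S, lower, upper)
```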
2. Determining a plurality of optimized objective functions
According to the general rules of microbial growth, cell growth requires energy, and the energy carrier molecules produced in the cell include ATP, NADPH and NADH. Cells have several pathways for producing ATP and the other carriers; these molecules are the fuel for cell growth, and the energy consumed during growth is drawn from them. Microbial growth also follows a general rule of living organisms: an organism tends to achieve a higher growth rate with less energy. In other words, when conditions such as substrate concentration are the same, the microorganism preferentially uses the less energy-consuming pathways to complete its growth. The reactions that produce energy carriers in microbial systems are very complex, and other substances in metabolism can also supply some energy for growth, but ATP, NADPH and NADH are the most common energy carriers involved in metabolism. The optimization goal is therefore to minimize the production flux of these three metabolites in the cell.
Thus, the optimization objective function may be defined as
$$f = c^{T} v$$
wherein v is the vector of reaction fluxes and c is a weight vector with entries in {0,1}. Searching the irreversible iJB1325 model for the reactions that meet these conditions gives 24 reactions producing ATP, 41 reactions producing NADPH and 76 reactions producing NADH in the cell. The optimization problem can then be expressed as follows:
[Equation images BDA0003108030110000081 and BDA0003108030110000082 in the original publication: statement of the multi-objective optimization problem]
3. Obtaining Aspergillus niger growth condition data and performing conventional FBA analysis
The growth conditions of Aspergillus niger are shown in Table 1, where the growth conditions of DS03043 are referred to as condition one and those of CBS513.88 as condition two. The specific growth rate μ is the optimization target, and the remaining six quantities are the input operating conditions. The input conditions enter the solving process as constraints, i.e. as upper and lower flux bounds; for example, in the simulation of DS03043 the O2 flux is limited to a lower bound of 2.91 and an upper bound of 2.97.
TABLE 1 Aspergillus niger growth conditions data
[Table 1 is reproduced only as an image (BDA0003108030110000083) in the original publication]
To establish the simulation accuracy of the model for both strains, the maximum specific growth rate was first computed for the iJB1325 model with a conventional FBA simulation. The simulated maximum specific growth rate of iJB1325 for DS03043 was 0.1697, an error of 21.21%, while the simulated value for CBS513.88 was 0.1088, an error of 8.8%.
4. Iterative computation using multi-objective optimization algorithm
(1) Initializing the population: iJB1325 is first solved for the single objective to obtain the simulated maximum specific growth rate. The upper bound of the specific growth rate is then varied randomly within the range given by this maximum, which yields flux simulation results under different specific growth rates. Each of these flux distributions is a feasible solution inside the feasible region of the solution space and can be used as a member of the initial population of the optimization algorithm.
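One way to read this initialization is sketched below on a hypothetical four-reaction toy network. The sketch assumes the growth upper bound is drawn uniformly from [0, μmax] and that each member is obtained by re-solving the growth-maximizing LP under that cap; the sampling distribution, the solver (`scipy.optimize.linprog`) and all names are assumptions, since the text only states that the bound is changed randomly.

```python
import numpy as np
from scipy.optimize import linprog

def fba(S, lower, upper, c):
    """Maximize c^T v subject to Sv = 0 and lower <= v <= upper."""
    res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lower, upper)), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

def initial_population(S, lower, upper, growth_idx, NP, seed=0):
    """Build NP feasible flux vectors by capping the growth flux at random levels."""
    rng = np.random.default_rng(seed)
    c = np.zeros(S.shape[1])
    c[growth_idx] = 1.0
    mu_max = fba(S, lower, upper, c)[growth_idx]
    pop = []
    for _ in range(NP):
        capped = upper.copy()
        capped[growth_idx] = rng.uniform(0.0, mu_max)  # assumed sampling scheme
        pop.append(fba(S, lower, capped, c))
    return np.array(pop), mu_max

# Hypothetical toy network: R1 uptake, R2 conversion, R3 growth, R4 by-product.
S = np.array([[1.0, -1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0, 0.0]])
lower = np.zeros(4)
upper = np.array([10.0, 1000.0, 1000.0, 1000.0])
pop, mu_max = initial_population(S, lower, upper, growth_idx=2, NP=8)
```

Every member satisfies Sv = 0 and the flux bounds by construction, so the population starts inside the feasible region.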
(2) Mutation operation
A mutation operation is applied to the individuals in the population. The mutation strategy chosen is the most widely used one, DE/rand/1:
$v_i = x_{r1} + F \times (x_{r2} - x_{r3})$
where $x_{r1}$, $x_{r2}$, $x_{r3}$ are three feasible solutions in the starting population with mutually distinct indices ($r1 \neq r2 \neq r3$), and $F \in [0,1]$. It can also be seen here that when $x_{r1}$, $x_{r2}$, $x_{r3}$ are all feasible solutions, that is, they satisfy
Figure BDA0003108030110000091
it follows that
$S v_i = S\,(x_{r1} + F \times (x_{r2} - x_{r3})) = S x_{r1} + F \times (S x_{r2} - S x_{r3}) = 0$
That is, the mutation operation produces mutant individuals that theoretically satisfy the equality constraint, so this mutation strategy keeps the search for the optimal solution set inside the theoretical feasible region throughout. This is also one of the advantages of the differential evolution algorithm: it extracts useful information from the population itself.
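The linearity argument can be checked numerically on a small example. The sketch below uses a hypothetical two-metabolite, four-reaction stoichiometric matrix (not iJB1325) and a hand-made population of steady-state flux vectors; the helper names are illustrative.

```python
import random

# Hypothetical stoichiometric matrix: 2 metabolites x 4 reactions (not iJB1325).
S = [
    [1, -1, -1, 0],   # metabolite A: produced by R0, consumed by R1 and R2
    [0, 1, 1, -1],    # metabolite B: produced by R1 and R2, consumed by R3
]


def matvec(S, v):
    """Return the product S v as a list."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def de_rand_1(x_r1, x_r2, x_r3, F):
    """DE/rand/1 mutation: v_i = x_r1 + F * (x_r2 - x_r3)."""
    return [a + F * (b - c) for a, b, c in zip(x_r1, x_r2, x_r3)]


# Every individual below satisfies S x = 0.
population = [
    [2.0, 1.0, 1.0, 2.0],
    [3.0, 0.5, 2.5, 3.0],
    [1.0, 1.0, 0.0, 1.0],
    [4.0, 3.0, 1.0, 4.0],
]

rng = random.Random(0)
r1, r2, r3 = rng.sample(range(len(population)), 3)  # three distinct indices
F = 0.5
v_i = de_rand_1(population[r1], population[r2], population[r3], F)
residual = matvec(S, v_i)  # each entry is zero up to rounding
```

The mutant keeps S v = 0, but nothing in the formula keeps it inside the flux bounds (a component may, for example, become negative), which is why a bound check is still needed afterwards.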
(3) Judging whether the solution generated by mutation is in the solution space. Because the optimization problem has a large number of equality and inequality constraints and a high-dimensional decision vector, the overall solution space is very large and the feasible region is hard to predict. When mutation is applied to individuals in the population, adding the differential term may therefore push an individual outside the specified solution space, or leave it inside the solution space but turn it into an infeasible solution. It is therefore necessary first to check whether each individual lies in the solution space, that is, whether any component violates the upper or lower bound of its inequality constraint. If so, each component of the decision vector that violates its bounds is replaced by a random number between its lower and upper bounds:
$v_i = q \times (u_i - l_i) + l_i,\quad q \in [0,1]$
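A minimal sketch of this repair rule follows. The function name `repair_bounds` and the example bounds are assumptions made for illustration; note also that Python's `random()` draws q from [0, 1) rather than the closed interval [0, 1].

```python
import random


def repair_bounds(v, lower, upper, rng=random):
    """Replace every out-of-bounds component with q * (u - l) + l."""
    repaired = []
    for vj, lj, uj in zip(v, lower, upper):
        if vj < lj or vj > uj:
            q = rng.random()          # q in [0, 1)
            vj = q * (uj - lj) + lj   # random value between the two bounds
        repaired.append(vj)
    return repaired


lower = [0.0, 0.0, 0.0, 0.0]
upper = [10.0, 10.0, 10.0, 10.0]
mutant = [2.5, -0.5, 3.0, 12.0]       # components 1 and 3 violate their bounds

fixed = repair_bounds(mutant, lower, upper, random.Random(1))
```

Only the violating components are resampled; the others are left untouched. Resampling generally breaks the steady-state equality, so a repaired individual still has to pass the feasibility check of the next step.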
(4) Judging whether the individual is in the feasible region. Once an individual is known to lie in the solution space, its feasibility must still be checked, because solutions outside the feasible region do not help the iterative process. If an individual is judged infeasible, a new solution is generated by repeating the mutation process. Feasibility is judged from the absolute error of the equality constraint, that is, the sum of the absolute values of the entries of b in Sv = b; when this error is no greater than $10^{-10}$, the individual is considered feasible. When all individuals have become feasible, the next operation can be carried out.
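The feasibility test can be sketched as follows, using the same kind of hypothetical toy stoichiometric matrix as an illustration. The function names are illustrative; the tolerance is the 10^-10 threshold stated above.

```python
# Hypothetical stoichiometric matrix: 2 metabolites x 4 reactions (not iJB1325).
S = [
    [1, -1, -1, 0],
    [0, 1, 1, -1],
]


def equality_error(S, v):
    """Sum of absolute values of the entries of b = S v."""
    return sum(abs(sum(s * x for s, x in zip(row, v))) for row in S)


def is_feasible(S, v, tol=1e-10):
    """An individual counts as feasible when the equality error is <= tol."""
    return equality_error(S, v) <= tol


balanced = [2.0, 1.0, 1.0, 2.0]     # satisfies S v = 0
unbalanced = [2.0, 1.0, 1.0, 2.5]   # metabolite B is over-consumed by 0.5

ok = is_feasible(S, balanced)
bad = is_feasible(S, unbalanced)
err = equality_error(S, unbalanced)
```

An individual that fails this test would be discarded and regenerated by mutation, as described in the step above.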
(5) Selection operation. The parent population and the trial population are first merged, and the non-dominated rank and crowding distance are computed for every individual in the merged population. The offspring population is then selected according to each individual's non-dominated rank and crowding distance. See Fig. 2 for the specific selection procedure.
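The selection step can be sketched with a simple non-dominated sort and crowding-distance truncation. This is an illustrative implementation under stated assumptions, not the procedure of Fig. 2 itself: all objectives are treated as minimized (a maximized objective such as the specific growth rate would be negated first), boundary points of each front receive infinite crowding distance, and the objective values are made up.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Return a list of fronts; each front is a sorted list of indices into objs."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance per index in front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        f_min, f_max = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if f_max == f_min:
            continue  # all values equal: this objective adds no spread
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (f_max - f_min)
    return dist


def select(objs, pop_size):
    """Fill the offspring population front by front, then by crowding distance."""
    chosen = []
    for front in non_dominated_sort(objs):
        if len(chosen) + len(front) <= pop_size:
            chosen.extend(front)
        else:
            dist = crowding_distance(objs, front)
            by_spread = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(by_spread[:pop_size - len(chosen)])
            break
    return chosen


# Made-up objective vectors for a merged parent + trial population.
objs = [(1, 5), (2, 3), (3, 2.5), (5, 1), (4, 4), (6, 6)]
fronts = non_dominated_sort(objs)
survivors = select(objs, 3)
```

In this example the first front holds four individuals but only three slots are available, so the two boundary points and the interior point with the larger crowding distance are kept.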
(6) Finally, if the maximum number of iterations has not been reached, the offspring population is substituted into step (3) for further iterative calculation. If the maximum number of iterations has been reached, the algorithm ends and the final population is output as the Pareto optimal solution set of the multi-objective optimization problem. Because this is a four-target optimization problem, the theoretical Pareto front is a hypersurface in the four-dimensional objective space and is difficult to visualize. Therefore, Figs. 3 and 4 show only the iteration results for the specific growth rate and the sum of the ATP-generating fluxes, which form a Pareto front; points that fall within the error tolerance of the Aspergillus niger specific growth rate data are marked in yellow.
5. Calculating central metabolism data of Aspergillus niger by using optimal solution obtained by iteration
It should be noted that the energy substances produced in the central metabolism of A. niger are not all of the energy substances produced in A. niger cells. Each optimized objective function is the sum of the fluxes of all reactions in the cell that can produce the corresponding energy substance, so the energy substance production of central metabolism does not fully coincide with the objective functions. Central metabolic flux data and the energy substance production of the central metabolic processes were then calculated from the yellow points in Figs. 3 and 4.
The invention discloses a method for determining multiple optimization targets of a microbial metabolic network model and an application thereof. The method is based on a genome-scale metabolic network model of a microorganism: several optimization targets are defined according to general rules of microbial growth, the constraints are determined by flux balance analysis, and the resulting optimization problem is constructed and solved, with a multi-target differential evolution method as the main structure of the solver. First, the objective functions are defined; according to the basic rules of biological growth they are the maximum specific growth rate, the minimum ATP production, the minimum NADH production, and the minimum NADPH production. Next, an initial population that satisfies the constraints is generated with a conventional single-target linear programming method. Iteration then proceeds according to the steps of the adjusted differential evolution algorithm, and a Pareto optimal solution set that satisfies the optimization targets and the constraints is obtained when the iteration ends. Finally, the Pareto optimal solution set is analyzed to complete the solution of the genome-scale metabolic network model. The method can be applied to the prediction of key fluxes in central metabolism.
By taking biological information about the cell growth process into account, the method adds information to the solution of the microbial metabolic network model. Compared with conventional FBA, it incorporates more information, which improves the reliability of the solution and gives a more reasonable simulation of some key fluxes in central metabolism, thereby further guiding experimental biology disciplines such as metabolic engineering.

Claims (9)

1. A method for determining multiple optimization targets of a microbial metabolic network model, comprising the following steps:
step 1, setting the optimization targets, and determining the objective functions to be optimized according to the optimization targets;
step 2, establishing the equality constraint Sv = b according to the microbial metabolic network model at steady-state balance, and establishing and solving a single-target linear programming problem with the maximum specific growth rate as the single objective function;
step 3, forming the initial population of the multi-target differential evolution algorithm from the solution results of the single-target linear programming, setting the relevant parameters of the differential evolution algorithm, starting the iterative computation, and outputting the Pareto optimal set that satisfies the conditions once a set number of iterations is reached;
step 4, selecting suitable individuals in the Pareto optimal set and obtaining their central metabolic flux data and central metabolic energy substance generation data as the result of the microbial metabolic network model.
2. The method for determining multiple optimization targets of a microbial metabolic network model according to claim 1, wherein in step 1 the optimization targets of the problem are first determined as: keeping the specific growth rate at its maximum under the current conditions while minimizing the energy generated in the cells during growth; the objective functions to be optimized are determined according to the optimization targets, each objective function being expressed as:
$f = c^{T} v$
wherein v is the vector formed by all reaction fluxes in the metabolic network, and c is a weight vector with entries in {0,1} and the same dimension as v; all objective functions are determined in this form; each objective function is set as the sum of all reaction fluxes that correspond to the optimization target;
the optimization problem can be expressed in the form:
Figure FDA0003108030100000011
3. The method for determining multiple optimization targets of a microbial metabolic network model according to claim 1, wherein the single-target solution of the steady-state metabolic network model in step 2 is a metabolic network model solution method based on flux balance analysis, the solution results of the single-target linear programming constitute the initial population of the multi-target differential evolution algorithm, and the method comprises the following steps:
(1) at steady state of the metabolic system, the production and consumption of each compound are balanced, that is, the concentration of each metabolite does not change over time:
Figure FDA0003108030100000021
wherein x is the concentration of the metabolite.
Based on this assumption, the equality constraints of the model of the microbial metabolic network at steady state equilibrium can be established:
Sv=0
(2) each reaction in the metabolic network model has upper and lower bound limits, specifying the solution space of the model in the form of inequality constraints;
(3) taking the maximum specific growth rate μ as the objective function, a single-target linear programming problem is formed:
$\max \mu$
Figure FDA0003108030100000022
wherein μ is the specific growth rate; s is a stoichiometric matrix of a microbial metabolic network; v is a vector formed by all reaction fluxes in the metabolic network; u and l are vectors formed by the upper limit and the lower limit of the reaction flux.
4. The method for determining multiple optimization targets of the microbial metabolism network model according to claim 1, wherein the multi-target differential evolution algorithm in the step 3 is a heuristic search algorithm based on a population, and adopts maximum iteration number MaxGen, population size NP and a scaling factor F; and forming an initial population by using the solving results of the single-target linear programming, and obtaining a Pareto optimal solution set of the multi-target problem according to a Pareto dominant relationship and non-dominant sequencing after iterative computation is performed by using a differential evolution algorithm.
5. The method for determining multiple optimization targets of a microbial metabolic network model according to claim 4, wherein the Pareto optimal solution set and its related definitions are described as follows, for a multi-objective optimization problem:
$\min\,[\,f_1(x_1, x_2, \ldots, x_D),\ f_2(x_1, x_2, \ldots, x_D),\ \ldots,\ f_n(x_1, x_2, \ldots, x_D)\,]$
Figure FDA0003108030100000023
wherein $f_1, f_2, \ldots, f_n$ are the n objective functions to be optimized; $x_j^{L}$ and $x_j^{U}$ are respectively the lower and upper bounds of the variable $x_j$;
let the individuals $x^1$ and $x^2$ be two feasible solutions of the problem, whose corresponding objective function values are respectively $f_1^1, f_2^1, \ldots, f_n^1$ and $f_1^2, f_2^2, \ldots, f_n^2$.
If they satisfy both of the following conditions:
Figure FDA0003108030100000031
Figure FDA0003108030100000032
then individual $x^1$ is said to dominate individual $x^2$; the surface formed in the solution space by all non-dominated individuals in the solution set is called the Pareto front; the set of individuals on the Pareto front is called the Pareto optimal solution set;
to obtain a relatively complete Pareto front, the individuals in the optimal solution set need a certain spatial spread; a crowding degree operator is introduced to characterize the distribution of individuals in the population, and the smaller the crowding degree, the more tightly the individuals are packed; the crowding degree is a measure of the distance and density between individuals in the solution set; the crowding degree is calculated as follows:
Figure FDA0003108030100000033
wherein $I[i]_m$ denotes the crowding degree of the i-th individual with respect to the m-th objective function, $f[i+1]_m$ denotes the value of the m-th objective function for the (i+1)-th individual, and $f_m^{\max}$ and $f_m^{\min}$ are respectively the maximum and minimum values of the m-th objective function.
6. The method for determining multiple optimization goals of a microbial metabolic network model according to claim 5, wherein each individual in the current population is assumed to have two attributes:
(1) non-dominated rank $i_{rank}$
(2) crowding distance $i_{distance}$
The order relation $<_n$ is then defined as follows:
if ($i_{rank} < j_{rank}$)
or
(($i_{rank} = j_{rank}$) and ($i_{distance} > j_{distance}$))
then $i <_n j$
Between two solutions with different non-dominated ranks, the solution with the lower rank, that is, the better-performing solution, is preferred. If the two solutions have the same non-dominated rank, the solution with the larger crowding distance is selected.
7. The method for determining the multiple optimization targets of the microbial metabolism network model according to claim 5, wherein the overall process of the multi-target differential evolution algorithm is as follows:
(1) initialization: obtaining a plurality of single target solution results, and constructing an initial population;
(2) setting control parameters of a differential evolution algorithm: maximum iteration times MaxGen, population size NP and a scaling factor F;
(3) mutation operation: according to the general mutation strategy of differential evolution, new individuals are generated by mutating the individuals in the parent population:
$v_i = x_{r1} + F \times (x_{r2} - x_{r3})$
wherein r1, r2 and r3 are mutually distinct random integers selected from [1, NP], and F is a control parameter that scales the difference vector and is generally chosen as a random number in [0,1];
(4) Pareto dominance ranking: the parent population and the trial population are merged into a combined population, the objective function values of the combined population are calculated, and each individual in the combined population is ranked according to the Pareto dominance principle; the lower the Pareto rank, the better the individual;
(5) crowding degree calculation: the crowding distance of each individual is calculated to characterize the distribution of individuals in the population; the smaller the crowding distance, the more tightly the individuals are packed; solutions that are packed too tightly hinder obtaining a complete Pareto front;
(6) generating the offspring population: the offspring population is selected from the combined population according to the two evaluation operators, Pareto rank and crowding degree; the individuals are first arranged in layers from low to high Pareto rank, and whole layers are placed into the offspring population starting from the lowest rank, until adding the individuals of some layer would exceed the population size, that is, the layer cannot be placed into the offspring population in full; the individuals of that layer are then sorted by crowding degree from large to small, and those with the larger crowding degree are placed into the offspring population first until it is full;
(7) the offspring population obtained in step (6) is substituted into step (3), and steps (3) to (6) are repeated for iterative calculation until the maximum number of iterations MaxGen is reached, whereupon the final iteration result is output.
8. The method for determining multiple optimization targets of a microbial metabolic network model according to claim 1, wherein step 4 obtains central metabolic flux data and central metabolic energy substance generation data of individuals in the Pareto optimal set; the central metabolic pathways are the phosphotransferase system, glycolysis, the pentose phosphate pathway, and the tricarboxylic acid cycle with the glyoxylate shunt, and the data obtained serve as the result of the microbial metabolic network model.
9. Use of the method for determining multiple optimization targets of a microbial metabolic network model according to claim 1 in the prediction of key fluxes in central metabolism.
CN202110641532.3A 2021-06-09 2021-06-09 Multi-optimization target determining method for microbial metabolism network model and application thereof Active CN113470732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110641532.3A CN113470732B (en) 2021-06-09 2021-06-09 Multi-optimization target determining method for microbial metabolism network model and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110641532.3A CN113470732B (en) 2021-06-09 2021-06-09 Multi-optimization target determining method for microbial metabolism network model and application thereof

Publications (2)

Publication Number Publication Date
CN113470732A true CN113470732A (en) 2021-10-01
CN113470732B CN113470732B (en) 2024-04-05

Family

ID=77869421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110641532.3A Active CN113470732B (en) 2021-06-09 2021-06-09 Multi-optimization target determining method for microbial metabolism network model and application thereof

Country Status (1)

Country Link
CN (1) CN113470732B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115662498A (en) * 2022-12-29 2023-01-31 天津大学 Biological metabolic pathway design method based on improved multi-objective evolutionary algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020168654A1 (en) * 2001-01-10 2002-11-14 Maranas Costas D. Method and system for modeling cellular metabolism
CN104778513A (en) * 2015-04-13 2015-07-15 哈尔滨工程大学 Multi-population evolution method for constrained multi-objective optimization
CN112446533A (en) * 2020-09-29 2021-03-05 东北电力大学 Multi-target planning method for AC/DC hybrid power distribution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
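王建林;吴佳欢;张超然;于涛;赵利强;: "Optimization of the penicillin fermentation process based on adaptive evolutionary multi-objective constraints", Chinese Journal of Scientific Instrument (仪器仪表学报), no. 12 *
陈久生;郑浩然;娄慧;: "A high-efficiency algorithm for metabolic flux estimation based on isotope labeling experiments", Beijing Biomedical Engineering (北京生物医学工程), no. 04 *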

Also Published As

Publication number Publication date
CN113470732B (en) 2024-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant