CN112750495A - Prediction method for biological metabolic pathway based on neural network - Google Patents
- Publication number
- CN112750495A (application CN202110104847.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- data
- neural network
- predicting
- metabolic pathway
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Computing Systems (AREA)
- Physiology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the field of biological dynamic pathway prediction, and in particular relates to a method for predicting a biological metabolic pathway based on a neural network, comprising the following steps. Data downloading: acquire a data set and construct the data set required for model training. Data preprocessing: interpolate the data set, add more time-series points with the Savitzky-Golay method, normalize the data, and split the data for data augmentation, which improves the identification accuracy of the model. Model identification. Model saving. The invention requires only the concentrations of the proteins and metabolites along the metabolic pathway and does not need data such as temperature or flux; compared with a dynamic model it greatly reduces data preprocessing, and it takes the relationships between proteins into account. The method reduces the difficulty of building the model, and, compared with a dynamic model, the model built on the neural network is significantly more accurate. The invention is used for predicting biological metabolic pathways.
Description
Technical Field
The invention belongs to the field of biological dynamic pathway prediction, and in particular relates to a method for predicting a biological metabolic pathway based on a neural network.
Background
At present, biological metabolic pathways in bioengineering are highly complex, and the proteins and metabolites involved in a metabolic pathway are often difficult to control, so the formation of the final product is difficult to predict. Different protein concentrations can have positive or negative effects on production, and being able to distinguish between them would improve the efficiency of the whole bioengineering process.
Dynamic models, such as metering models, can predict biological dynamic pathways well. However, as bioengineering projects grow, dynamic models become increasingly complex to build, require specialist knowledge, and demand sharply increasing time and effort.
Disclosure of Invention
To address the technical problems that models of biological metabolic pathways require specialist knowledge to construct and that the time and effort they demand increase sharply, the invention provides a method for predicting a biological metabolic pathway based on a neural network, which has the advantages of low cost, high accuracy and strong stability.
To solve the above technical problems, the invention adopts the following technical scheme:
A method for predicting a biological metabolic pathway based on a neural network comprises the following steps:
S1, data downloading: acquire a data set and construct the data set required for model training;
S2, data preprocessing: interpolate the data set, add more time-series points using the Savitzky-Golay method, normalize the data, and split the data for data augmentation, so as to improve the identification accuracy of the model;
S3, model identification;
S4, model saving: when the loss function of the model no longer decreases and the evaluation index has reached its optimum and stabilized, save the model.
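As an illustration of the S2 preprocessing, the sketch below interpolates a sparse concentration time series onto a denser uniform grid and then applies a Savitzky-Golay filter (a least-squares polynomial fit in a sliding window), which can also return the time derivative later used as the training target. It is a minimal NumPy sketch under stated assumptions: linear interpolation is assumed, and the helper names `savgol_smooth` and `densify`, the window length, the polynomial order and the grid size are hypothetical choices, not values given in the patent.

```python
import numpy as np


def savgol_smooth(y, window=7, order=2, deriv=0, dt=1.0):
    """Savitzky-Golay filter via a least-squares polynomial fit per window.

    Returns the smoothed signal (deriv=0) or its deriv-th time derivative.
    Near the edges the window is shifted so that it stays inside the data.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if window > n or window <= order:
        raise ValueError("need order < window <= len(y)")
    half = window // 2
    out = np.empty(n)
    for i in range(n):
        start = min(max(i - half, 0), n - window)  # keep the window in bounds
        idx = np.arange(start, start + window)
        x = (idx - i) * dt  # local time axis, zero at the evaluated point
        coeffs = np.polyfit(x, y[idx], order)
        out[i] = np.polyval(np.polyder(coeffs, deriv), 0.0)
    return out


def densify(t, y, n_points=50, window=7, order=2):
    """Linearly interpolate onto a uniform grid, then smooth (hypothetical defaults)."""
    t = np.asarray(t, dtype=float)
    t_dense = np.linspace(t[0], t[-1], n_points)
    y_dense = np.interp(t_dense, t, y)
    dt = t_dense[1] - t_dense[0]
    return t_dense, savgol_smooth(y_dense, window, order, dt=dt)
```

For example, `densify(t, y, n_points=50)` turns a handful of measured points into 50 smoothed points, and `savgol_smooth(y_dense, deriv=1, dt=dt)` gives a smoothed estimate of the concentration derivative. SciPy's `scipy.signal.savgol_filter` offers an optimized version of the same filter.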
In step S1, the data set required for model training is constructed as follows: the data set contains 3 E. coli production strains; the high-yield and low-yield strains are used as training data, and the medium-yield strain is used to verify the phenotype predicted by the model. A large number of additional strains are generated through a dynamic model, which allows more significant predictions to be made.
The data preprocessing in step S2 comprises the following steps:
S2.1, apply zero-mean normalization to the data: $x^{*} = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean of the original data and $\sigma$ is the standard deviation of the original data;
S2.2, split the training data into a training set and a validation set and apply cross-validation, using the concentration of each metabolite as a feature and the time derivative of the metabolite concentration as the target.
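Steps S2.1 and S2.2 can be sketched in NumPy as follows, assuming the data are arranged as a (samples × features) array. The function names, the number of folds and the seeded shuffle are hypothetical; the patent only states that zero-mean normalization and cross-validation are used.

```python
import numpy as np


def zscore(x, mu=None, sigma=None):
    """Zero-mean normalization x* = (x - mu) / sigma, column by column."""
    x = np.asarray(x, dtype=float)
    if mu is None:
        mu = x.mean(axis=0)
    if sigma is None:
        sigma = x.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return (x - mu) / sigma, mu, sigma


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

One design choice worth considering: computing `mu` and `sigma` on each training fold and passing them to `zscore` for the matching validation fold keeps validation statistics out of the normalization.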
The model identification in S3 comprises the following steps:
S3.1, model construction: a nonlinear function $\dot{m}(t) = f\big(m(t), p(t)\big)$ is constructed with a neural network, where $m(t)$ is the time series of metabolite concentrations, $p(t)$ is the time series of protein concentrations and $\dot{m}(t)$ is the time derivative of the metabolite concentrations. The cost function is reduced by gradient descent until the optimization criterion $f = \arg\min_{f} \sum_{i} \sum_{t \in T} \big\| f\big(\tilde{m}^{i}(t), \tilde{p}^{i}(t)\big) - \dot{\tilde{m}}^{i}(t) \big\|^{2}$ is reached, where $i$ indexes the strains, $T$ is the set of time points and a tilde denotes interpolated measured data. Cross-validation data are generated to prevent overfitting, and the biological dynamic pathway is predicted by solving the differential equation $\dot{m}(t) = f\big(m(t), p(t)\big)$;
S3.2, evaluation of the time-series model: using the simulated data, a strain is selected at random from the data set and, for each time series, the agreement between the predicted and test data is evaluated by the root mean square error of the predicted trajectory, $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t \in T} \big(\tilde{m}_j(t) - m_j(t)\big)^{2}}$, where $\tilde{m}_j(t)$ is the interpolated actual concentration of metabolite $j$ at time $t$, $m_j(t)$ is the prediction obtained from the solution of the differential equation and $n$ is the number of time points in $T$.
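A minimal NumPy sketch of S3.1 follows: a small network approximating $f(m, p) \mapsto \dot{m}$ is trained by full-batch gradient descent with hand-written backpropagation on the mean squared derivative error, i.e. the cost function named in S3.1. Assumptions: a single tanh hidden layer (the patent mentions multiple hidden layers; one keeps the sketch short), and the class name `DerivativeMLP`, the hidden width and the learning rate are hypothetical.

```python
import numpy as np


class DerivativeMLP:
    """One-hidden-layer network approximating f(m, p) -> dm/dt (sketch)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        """Forward propagation: features [m, p] -> predicted derivatives."""
        self.h = np.tanh(x @ self.W1 + self.b1)  # hidden activations
        return self.h @ self.W2 + self.b2

    def step(self, x, y, lr=0.05):
        """One full-batch gradient-descent step; returns the loss before the update."""
        pred = self.forward(x)
        err = pred - y
        loss = np.mean(np.sum(err ** 2, axis=1))  # mean squared derivative error
        # Backward propagation of the cost through both layers
        g_out = 2 * err / len(x)
        gW2 = self.h.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ self.W2.T) * (1 - self.h ** 2)  # tanh' = 1 - tanh^2
        gW1 = x.T @ g_h
        gb1 = g_h.sum(axis=0)
        self.W2 -= lr * gW2
        self.b2 -= lr * gb2
        self.W1 -= lr * gW1
        self.b1 -= lr * gb1
        return loss
```

Each row of `x` is a concatenated vector $[m(t), p(t)]$ and the matching row of `y` is $\dot{m}(t)$; calling `step` repeatedly performs the forward and backward propagation described in S3.1 and S4.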
In the model saving of S4, forward and backward propagation with gradient descent are repeated until the loss function of the model no longer decreases.
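The stopping rule in S4 ("save the model when the loss no longer decreases and the evaluation index stabilizes") can be sketched as a patience-based loop in plain Python. The function name, the `patience` and `min_delta` parameters and the in-memory snapshot are assumptions; the patent does not say how "no longer decreases" is measured.

```python
import copy


def train_until_plateau(step_fn, get_state, max_epochs=1000, patience=20, min_delta=1e-6):
    """Call step_fn() until its loss stops improving; return the best snapshot.

    step_fn: runs one epoch of forward/backward propagation and returns the
        monitored loss (training loss or a validation index).
    get_state: returns the current model parameters, copied when they improve.
    """
    best_loss = float("inf")
    best_state = None
    stale = 0
    epochs_run = 0
    for _ in range(max_epochs):
        loss = step_fn()
        epochs_run += 1
        if loss < best_loss - min_delta:
            best_loss, stale = loss, 0
            best_state = copy.deepcopy(get_state())  # snapshot of the best model
        else:
            stale += 1
            if stale >= patience:  # loss has plateaued: stop training
                break
    return best_state, best_loss, epochs_run
```

The returned snapshot can then be written to disk, for example with `pickle.dump`, which corresponds to the "model saving" step.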
Compared with the prior art, the invention has the following beneficial effects:
The invention uses multiple hidden layers and PCA-based feature selection to reduce the dimensionality of the multidimensional data in dynamic pathway prediction. The invention requires only the concentrations of the proteins and metabolites along the metabolic pathway and does not need data such as temperature or flux; compared with a dynamic model it greatly reduces data preprocessing, and it takes the relationships between proteins into account. The method reduces the difficulty of building the model, and, compared with a dynamic model, the model built on the neural network is significantly more accurate.
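The PCA-based dimensionality reduction mentioned above can be sketched with NumPy's SVD: center the data, project it onto the leading principal directions, and report the fraction of variance each direction keeps. The function name `pca_reduce` and the choice of how many components to keep are assumptions; the patent does not state how PCA is configured.

```python
import numpy as np


def pca_reduce(x, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    components = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per direction
    return xc @ components.T, components, explained[:n_components]
```

In practice one would pick `n_components` so that the cumulative explained variance passes a chosen threshold; that threshold is a tuning decision, not something the patent specifies.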
Drawings
FIG. 1 is a flow chart of the main steps of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A method for predicting biological metabolic pathways based on neural networks, as shown in FIG. 1, comprises the following steps:
S1, data downloading: acquire a data set and construct the data set required for model training;
S2, data preprocessing: interpolate the data set, add more time-series points using the Savitzky-Golay method, normalize the data, and split the data for data augmentation, so as to improve the identification accuracy of the model;
S3, model identification;
S4, model saving: when the loss function of the model no longer decreases and the evaluation index has reached its optimum and stabilized, save the model.
Further, the data set required for model training in S1 is constructed as follows. To verify the model identification performance, the data set used here contains 3 E. coli production strains, the specific sources of which were mentioned above. To validate the phenotype predicted by the model, the high-yield and low-yield strains are used as training data and the medium-yield strain is used for validation. To improve prediction, a large number of additional strains are generated through a dynamic model, which allows more significant predictions to be made.
Further, the data preprocessing in S2 comprises the following steps:
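The patent does not specify the dynamic model used to generate additional strains. As a hypothetical illustration only, the sketch below simulates a toy two-step pathway S → I → P with Michaelis-Menten kinetics, samples random enzyme (protein) levels for each synthetic strain, and integrates the model with a fixed-step fourth-order Runge-Kutta scheme. The pathway, the kinetic constants, the enzyme ranges and all function names are assumptions.

```python
import numpy as np


def pathway_rhs(m, p, vmax=(1.0, 0.8), km=(0.5, 0.5)):
    """Toy kinetic model S -> I -> P; p holds the two enzyme (protein) levels."""
    s, i, _ = m
    v1 = vmax[0] * p[0] * s / (km[0] + s)  # Michaelis-Menten rate, step 1
    v2 = vmax[1] * p[1] * i / (km[1] + i)  # Michaelis-Menten rate, step 2
    return np.array([-v1, v1 - v2, v2])


def simulate_strain(p, m0=(5.0, 0.0, 0.0), t_end=10.0, n_steps=200):
    """Integrate the toy model with classic fourth-order Runge-Kutta."""
    dt = t_end / n_steps
    m = np.array(m0, dtype=float)
    traj = [m.copy()]
    for _ in range(n_steps):
        k1 = pathway_rhs(m, p)
        k2 = pathway_rhs(m + dt / 2 * k1, p)
        k3 = pathway_rhs(m + dt / 2 * k2, p)
        k4 = pathway_rhs(m + dt * k3, p)
        m = m + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(m.copy())
    return np.linspace(0.0, t_end, n_steps + 1), np.array(traj)


def generate_strains(n_strains, seed=0):
    """Sample random enzyme levels per synthetic strain and simulate each one."""
    rng = np.random.default_rng(seed)
    strains = []
    for _ in range(n_strains):
        p = rng.uniform(0.1, 2.0, size=2)  # hypothetical enzyme-level range
        t, traj = simulate_strain(p)
        strains.append((p, t, traj))
    return strains
```

The final product concentration `traj[-1, 2]` of each synthetic strain could then be used to rank strains into high-, medium- and low-yield groups, mirroring the split of the measured strains.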
S2.1, apply zero-mean normalization to the data: $x^{*} = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean of the original data and $\sigma$ is the standard deviation of the original data;
S2.2, to improve the predictive ability of the model, split the training data into a training set and a validation set and apply cross-validation; the concentration of each metabolite is used as a feature, and the time derivative of the metabolite concentration is used as the target.
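S2.2 pairs concentrations (features) with concentration time derivatives (targets). A minimal sketch, assuming each strain is stored as arrays of time points, metabolite concentrations and protein concentrations, and following S3.1 in placing protein levels alongside metabolites in the features. Derivatives here are estimated with `numpy.gradient`, a finite-difference choice the patent does not specify (a Savitzky-Golay derivative is an alternative); the function name is hypothetical.

```python
import numpy as np


def build_training_pairs(strains):
    """Stack (features, targets) over strains.

    Each strain is a tuple (t, m, p): time points (n,), metabolite
    concentrations (n, n_met) and protein concentrations (n, n_prot).
    Features are [m(t), p(t)]; targets are dm/dt estimated numerically.
    """
    xs, ys = [], []
    for t, m, p in strains:
        dm_dt = np.gradient(m, t, axis=0)  # finite-difference time derivative
        xs.append(np.hstack([m, p]))
        ys.append(dm_dt)
    return np.vstack(xs), np.vstack(ys)
```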
Further, the model identification in S3 comprises the following steps:
S3.1, model construction: a nonlinear function $\dot{m}(t) = f\big(m(t), p(t)\big)$ is constructed with a neural network, where $m(t)$ is the time series of metabolite concentrations, $p(t)$ is the time series of protein concentrations and $\dot{m}(t)$ is the time derivative of the metabolite concentrations. The cost function is reduced by gradient descent until the optimization criterion $f = \arg\min_{f} \sum_{i} \sum_{t \in T} \big\| f\big(\tilde{m}^{i}(t), \tilde{p}^{i}(t)\big) - \dot{\tilde{m}}^{i}(t) \big\|^{2}$ is reached, where $i$ indexes the strains, $T$ is the set of time points and a tilde denotes interpolated measured data. Cross-validation data are generated to prevent overfitting, and the biological dynamic pathway is predicted by solving the differential equation $\dot{m}(t) = f\big(m(t), p(t)\big)$;
S3.2, evaluation of the time-series model: using the simulated data, a strain is selected at random from the data set and, for each time series, the agreement between the predicted and test data is evaluated by the root mean square error of the predicted trajectory, $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t \in T} \big(\tilde{m}_j(t) - m_j(t)\big)^{2}}$, where $\tilde{m}_j(t)$ is the interpolated actual concentration of metabolite $j$ at time $t$, $m_j(t)$ is the prediction obtained from the solution of the differential equation and $n$ is the number of time points in $T$.
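The prediction and evaluation in S3.1 and S3.2 can be sketched as follows: integrate $\dot{m} = f(m, p)$ from the initial metabolite concentrations, with the measured protein series interpolated in time, then compute the per-metabolite RMSE against the actual trajectory. This is a minimal sketch under stated assumptions: a fixed-step fourth-order Runge-Kutta integrator, linear interpolation of $p(t)$, and the function names are hypothetical. A trained network can be plugged in by wrapping it in any callable `f(m, p)`.

```python
import numpy as np


def predict_trajectory(f, m0, t, t_p, p_obs):
    """Integrate dm/dt = f(m, p(t)) with RK4 on the time grid t.

    f: learned derivative model, f(m, p) -> dm/dt
    m0: initial metabolite concentrations
    t_p, p_obs: measured protein time series (p_obs has shape (len(t_p), n_prot)),
        linearly interpolated between measurements
    """
    def p_at(tau):
        return np.array([np.interp(tau, t_p, p_obs[:, k]) for k in range(p_obs.shape[1])])

    m = np.array(m0, dtype=float)
    out = [m.copy()]
    for t0, t1 in zip(t[:-1], t[1:]):
        h = t1 - t0
        k1 = f(m, p_at(t0))
        k2 = f(m + h / 2 * k1, p_at(t0 + h / 2))
        k3 = f(m + h / 2 * k2, p_at(t0 + h / 2))
        k4 = f(m + h * k3, p_at(t1))
        m = m + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(m.copy())
    return np.array(out)


def trajectory_rmse(m_true, m_pred):
    """RMSE of the predicted trajectory, one value per metabolite j (columns)."""
    diff = np.asarray(m_true, dtype=float) - np.asarray(m_pred, dtype=float)
    return np.sqrt(np.mean(diff ** 2, axis=0))
```

For the random strain selection in S3.2, an index such as `np.random.default_rng().integers(n_strains)` could pick which strain's trajectory to compare.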
Further, in the model saving of S4, forward and backward propagation with gradient descent are repeated until the loss function of the model no longer decreases.
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.
Claims (5)
1. A method for predicting a biological metabolic pathway based on a neural network, characterized by comprising the following steps:
S1, data downloading: acquiring a data set and constructing the data set required for model training;
S2, data preprocessing: interpolating the data set, adding more time-series points using the Savitzky-Golay method, normalizing the data, and splitting the data for data augmentation, so as to improve the identification accuracy of the model;
S3, model identification;
S4, model saving: when the loss function of the model no longer decreases and the evaluation index has reached its optimum and stabilized, saving the model.
2. The method for predicting a biological metabolic pathway based on a neural network as claimed in claim 1, wherein the data set required for model training in step S1 is constructed as follows: the data set contains 3 E. coli production strains; the high-yield and low-yield strains are used as training data, and the medium-yield strain is used to verify the phenotype predicted by the model; a large number of additional strains are generated through a dynamic model, which allows more significant predictions to be made.
3. The method for predicting a biological metabolic pathway based on a neural network as claimed in claim 1, wherein the data preprocessing in step S2 comprises the following steps:
S2.1, applying zero-mean normalization to the data: $x^{*} = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean of the original data and $\sigma$ is the standard deviation of the original data;
S2.2, splitting the training data into a training set and a validation set and applying cross-validation, using the concentration of each metabolite as a feature and the time derivative of the metabolite concentration as the target.
4. The method for predicting a biological metabolic pathway based on a neural network as claimed in claim 1, wherein the model identification in S3 comprises the following steps:
S3.1, model construction: constructing with a neural network a nonlinear function $\dot{m}(t) = f\big(m(t), p(t)\big)$, where $m(t)$ is the time series of metabolite concentrations, $p(t)$ is the time series of protein concentrations and $\dot{m}(t)$ is the time derivative of the metabolite concentrations; reducing the cost function by gradient descent until the optimization criterion $f = \arg\min_{f} \sum_{i} \sum_{t \in T} \big\| f\big(\tilde{m}^{i}(t), \tilde{p}^{i}(t)\big) - \dot{\tilde{m}}^{i}(t) \big\|^{2}$ is reached, where $i$ indexes the strains, $T$ is the set of time points and a tilde denotes interpolated measured data; generating cross-validation data to prevent overfitting; and predicting the biological dynamic pathway by solving the differential equation $\dot{m}(t) = f\big(m(t), p(t)\big)$;
S3.2, evaluation of the time-series model: using the simulated data, selecting a strain at random from the data set and, for each time series, evaluating the agreement between the predicted and test data by the root mean square error of the predicted trajectory, $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t \in T} \big(\tilde{m}_j(t) - m_j(t)\big)^{2}}$, where $\tilde{m}_j(t)$ is the interpolated actual concentration of metabolite $j$ at time $t$, $m_j(t)$ is the prediction obtained from the solution of the differential equation and $n$ is the number of time points in $T$.
5. The method for predicting a biological metabolic pathway based on a neural network as claimed in claim 1, wherein, in the model saving of S4, forward and backward propagation with gradient descent are repeated until the loss function of the model no longer decreases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110104847.4A CN112750495A (en) | 2021-01-26 | 2021-01-26 | Prediction method for biological metabolic pathway based on neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110104847.4A CN112750495A (en) | 2021-01-26 | 2021-01-26 | Prediction method for biological metabolic pathway based on neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112750495A | 2021-05-04 |
Family
ID=75653166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110104847.4A Pending CN112750495A (en) | 2021-01-26 | 2021-01-26 | Prediction method for biological metabolic pathway based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112750495A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190005187A1 (en) * | 2017-06-28 | 2019-01-03 | The Regents Of The University Of California | Simulating the metabolic pathway dynamics of an organism |
CN111128307A (en) * | 2019-12-14 | 2020-05-08 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Metabolic path prediction method and device, terminal device and readable storage medium |
- 2021-01-26: application CN202110104847.4A filed in CN (publication CN112750495A), status Pending
Non-Patent Citations (3)
Title |
---|
Mayank Baranwal et al., "A deep learning architecture for metabolic pathway prediction", Bioinformatics * |
Zak Costello et al., "A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data", npj Systems Biology and Applications * |
Chen Xiulai et al., "Metabolic regulation of microbial cofactor balance", Chinese Journal of Biotechnology * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210504 |