CN112750495A - Prediction method for biological metabolic pathway based on neural network - Google Patents

Info

Publication number
CN112750495A
Authority
CN
China
Prior art keywords
model
data
neural network
predicting
metabolic pathway
Legal status
Pending
Application number
CN202110104847.4A
Other languages
Chinese (zh)
Inventor
王小华
陈亮
张娜
韩锋
王美娟
Current Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Original Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date
2021-01-26
Filing date
2021-01-26
Publication date
2021-05-04
Application filed by Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority to CN202110104847.4A
Publication of CN112750495A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Computing Systems (AREA)
  • Physiology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of biological dynamic pathway prediction and relates in particular to a method for predicting a biological metabolic pathway based on a neural network. The method comprises the following steps: data download, in which a data set is acquired and the data set required for model training is constructed; data preprocessing, in which the data set is interpolated, further time-series points are added with the Savitzky-Golay method, the data are normalized, and the data are partitioned for data expansion so as to improve the identification accuracy of the model; model identification; and model saving. The invention requires only the concentrations of the proteins and metabolites along the metabolic pathway and does not need data such as temperature or flux. Compared with a dynamic model, it greatly simplifies data preprocessing, and it takes the relationships between proteins into account. The method lowers the difficulty of building the model, and the neural-network model is significantly more accurate than a dynamic model. The invention is used for predicting biological metabolic pathways.

Description

Prediction method for biological metabolic pathway based on neural network
Technical Field
The invention belongs to the field of biological dynamic pathway prediction and relates in particular to a method for predicting a biological metabolic pathway based on a neural network.
Background
At present, biological metabolic pathways in bioengineering are highly complex, and the proteins and metabolites involved in a pathway are often difficult to control, so the formation of the final product is hard to predict. Different protein concentrations can affect one another positively or negatively, and distinguishing these effects would also improve the efficiency of the overall bioengineering process.
With dynamic modelling, models such as stoichiometric models can be constructed and biological dynamic pathways can be predicted well. However, as bioengineering projects grow, models based on dynamics become increasingly complex, require expert knowledge to construct, and demand sharply increasing time and effort.
Disclosure of Invention
To address the technical problems that modelling existing biological metabolic pathways requires expert knowledge and sharply increasing time and effort, the invention provides a neural-network-based method for predicting biological metabolic pathways, which has the advantages of low cost, high accuracy and strong stability.
To solve the above technical problems, the invention adopts the following technical solution:
A method for predicting a biological metabolic pathway based on a neural network comprises the following steps:
S1, data download: acquiring a data set and constructing the data set required for model training;
S2, data preprocessing: interpolating the data set, adding further time-series points with the Savitzky-Golay method, normalizing the data, and partitioning the data for data expansion so as to improve the identification accuracy of the model;
S3, model identification;
S4, model saving: when the loss function of the model no longer decreases and the evaluation index reaches its optimum and stabilizes, the model is saved.
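Step S2 does not state exactly how interpolation and Savitzky-Golay filtering are combined. The following is a minimal Python sketch under the assumption that each sparse measurement series is first linearly interpolated onto an evenly spaced grid and then smoothed with SciPy's `savgol_filter`, which can also return the time derivative later used as a training target. The function name `densify_and_smooth` and the default `n_points`, `window_length` and `polyorder` values are illustrative assumptions, not values from the invention.

```python
import numpy as np
from scipy.signal import savgol_filter


def densify_and_smooth(t, y, n_points=50, window_length=11, polyorder=2):
    """Resample a sparse time series onto a dense grid, smooth it, and
    estimate dy/dt with a Savitzky-Golay filter (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Linear interpolation adds evenly spaced time points between measurements.
    t_dense = np.linspace(t[0], t[-1], n_points)
    y_dense = np.interp(t_dense, t, y)
    dt = t_dense[1] - t_dense[0]
    # Savitzky-Golay: local least-squares polynomial fit in a sliding window.
    y_smooth = savgol_filter(y_dense, window_length, polyorder)
    # deriv=1 with delta=dt returns the derivative in units of y per unit of t.
    dy_dt = savgol_filter(y_dense, window_length, polyorder, deriv=1, delta=dt)
    return t_dense, y_smooth, dy_dt
```

Because the filter fits a local polynomial in each window, it reproduces polynomials up to degree `polyorder` exactly; a linear concentration series therefore passes through unchanged and its estimated derivative equals the slope.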
The data set required for model training in S1 is constructed as follows: the data set comprises three E. coli production strains; the high-yield and low-yield strains are used as training data, the medium-yield strain is used to validate the phenotype predicted by the model, and a large number of additional strains are generated with a dynamic model, which then allows more meaningful predictions.
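The strain split described for S1 can be sketched in Python as follows, assuming each strain carries a yield label. The `Strain` record, its field names and the label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Strain:
    name: str          # hypothetical strain identifier
    yield_class: str   # assumed labels: "high", "medium" or "low"


def split_by_yield(strains: List[Strain]) -> Tuple[List[Strain], List[Strain]]:
    """High- and low-yield strains train the model; medium-yield strains validate it."""
    train, validation = [], []
    for s in strains:
        if s.yield_class in ("high", "low"):
            train.append(s)
        elif s.yield_class == "medium":
            validation.append(s)
        else:
            raise ValueError(f"unknown yield class: {s.yield_class!r}")
    return train, validation
```

Holding out the medium-yield strain tests whether the model interpolates between the phenotypes it was trained on.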
The data preprocessing in S2 comprises the following steps:
S2.1, zero-mean normalization of the data:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where μ is the mean and σ is the standard deviation of the original data;
S2.2, the data are divided into a training set and a validation set using cross-validation, with the concentration of each metabolite taken as a feature value and the time derivative of the metabolite concentration taken as the target value.
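A minimal NumPy sketch of S2.1 and S2.2, under two assumptions not stated in the invention: the normalization statistics μ and σ are estimated on the training data only and then reused for validation data, and the feature vector concatenates metabolite and protein concentrations, matching the model inputs m(t) and p(t) of S3.1 (S2.2 itself names only the metabolite concentrations). The function names are hypothetical.

```python
import numpy as np


def zscore_fit(x):
    """Estimate per-column mean and standard deviation on training data."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    # Constant columns would divide by zero; leave them centred but unscaled.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return mu, sigma


def zscore_apply(x, mu, sigma):
    """Zero-mean normalization x* = (x - mu) / sigma."""
    return (x - mu) / sigma


def build_features_targets(m, p, dm_dt):
    """m: (T, n_met) metabolites, p: (T, n_prot) proteins, dm_dt: (T, n_met)."""
    if not (len(m) == len(p) == len(dm_dt)):
        raise ValueError("all series must share the same time points")
    X = np.hstack([m, p])  # features: metabolite and protein concentrations
    Y = np.asarray(dm_dt)  # targets: metabolite time derivatives
    return X, Y
```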
The model identification in S3 comprises the following steps:
S3.1, model construction: a nonlinear function is constructed with a neural network:
$$\dot{m}(t) = f\big(m(t), p(t)\big)$$
where $m(t)$ is the time series of metabolite concentrations, $p(t)$ is the time series of protein concentrations, and $\dot{m}(t)$ is the time derivative of the metabolite concentrations. The cost function is minimized by gradient descent according to the optimization criterion:
$$f^{*} = \arg\min_{f} \sum_{i} \sum_{t} \left\| f\big(\tilde{m}^{i}(t), \tilde{p}^{i}(t)\big) - \dot{\tilde{m}}^{i}(t) \right\|^{2}$$
where $i$ runs over the training strains and the tilde marks interpolated measurements. Cross-validation data are generated to prevent overfitting. The biological dynamic pathway is then predicted by solving the differential equation
$$\dot{m}(t) = f^{*}\big(m(t), p(t)\big);$$
S3.2, evaluation of the time-series model performance: using simulated data, a random strain is selected from the data set, and for each time series the agreement between predicted and test data is evaluated by calculating the root mean square error of the predicted trajectory:
$$\mathrm{RMSE}_{j} = \sqrt{\frac{1}{N} \sum_{t} \left( \tilde{m}_{j}(t) - m_{j}(t) \right)^{2}}$$
where $\mathrm{RMSE}_{j}$ is the root mean square error of the predicted trajectory of metabolite $j$, $\tilde{m}_{j}(t)$ is the interpolated actual concentration of metabolite $j$ at time $t$, $m_{j}(t)$ is the prediction obtained by solving the differential equation, and the sum runs over the $N$ evaluated time points.
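S3.1 leaves the network architecture open. As an illustration only, the sketch below assumes a single tanh hidden layer written in plain NumPy and trained by full-batch gradient descent with hand-written backpropagation, using a mean-squared-error cost between the network outputs and the derivative targets (an assumed form of the cost function). The class name `DerivativeMLP`, the layer size and the learning rate are hypothetical choices.

```python
import numpy as np


class DerivativeMLP:
    """Hypothetical network f(m, p) -> dm/dt with one tanh hidden layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # Scaled Gaussian initialisation keeps early activations in tanh's linear range.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def __call__(self, X):
        """Forward pass: X has shape (samples, n_in); returns (samples, n_out)."""
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def train_step(self, X, Y, lr=0.05):
        """One full-batch gradient-descent step; returns the loss before the update."""
        n = X.shape[0]
        # Forward propagation, keeping hidden activations for the backward pass.
        H = np.tanh(X @ self.W1 + self.b1)
        err = H @ self.W2 + self.b2 - Y
        loss = np.sum(err ** 2) / n
        # Backward propagation of the loss gradient.
        d_out = 2.0 * err / n
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_pre = (d_out @ self.W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ d_pre
        db1 = d_pre.sum(axis=0)
        # Gradient-descent parameter update.
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        return loss
```

A trained instance can serve as the right-hand side of the differential equation dm/dt = f(m, p) once its input is assembled from the current metabolite and protein concentrations.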
In the model saving of S4, forward and backward propagation with gradient descent are repeated until the loss function of the model no longer decreases.
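One common way to make "the loss no longer decreases" concrete, assumed here rather than taken from the invention, is a patience rule: stop after `patience` consecutive evaluations without an improvement larger than `min_delta`, and keep a copy of the best parameters for saving. The class `EarlyStopping` and its parameters are hypothetical.

```python
import copy


class EarlyStopping:
    """Hypothetical patience-based stopping rule for step S4."""

    def __init__(self, patience=10, min_delta=1e-6):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_state = None
        self.wait = 0

    def update(self, loss, state):
        """Record one evaluation; return True when training should stop."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.best_state = copy.deepcopy(state)  # snapshot of the model to save
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

The stored snapshot can then be written to disk, for example with `numpy.savez` when the parameters are NumPy arrays.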
Compared with the prior art, the invention has the following beneficial effects:
The invention uses multiple hidden layers and PCA-based data feature selection to reduce the dimensionality of the multidimensional data in dynamic pathway prediction. The invention requires only the concentrations of the proteins and metabolites along the metabolic pathway and does not need data such as temperature or flux. Compared with a dynamic model, it greatly simplifies data preprocessing, and it takes the relationships between proteins into account. The method lowers the difficulty of building the model, and the neural-network model is significantly more accurate than a dynamic model.
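PCA-based dimensionality reduction is mentioned without further detail. A minimal NumPy sketch, assuming PCA is computed by singular value decomposition of the mean-centred feature matrix; the function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project rows of X onto the first n_components principal components.

    Returns the reduced data, the component directions, and the fraction of
    total variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return Xc @ components.T, components, explained[:n_components]
```

The explained-variance fractions give a simple criterion for choosing how many components to keep.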
Drawings
FIG. 1 is a flow chart of the main steps of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A method for predicting biological metabolic pathways based on neural networks, as shown in FIG. 1, comprises the following steps:
S1, data download: acquiring a data set and constructing the data set required for model training;
S2, data preprocessing: interpolating the data set, adding further time-series points with the Savitzky-Golay method, normalizing the data, and partitioning the data for data expansion so as to improve the identification accuracy of the model;
S3, model identification;
S4, model saving: when the loss function of the model no longer decreases and the evaluation index reaches its optimum and stabilizes, the model is saved.
Further, the data set required for model training in S1 is constructed as follows. To verify the identification performance of the model, the data set used here contains three E. coli production strains, whose specific sources were mentioned above. To validate the phenotype predicted by the model, the high-yield and low-yield strains are used as training data and the medium-yield strain is used for validation. To improve prediction, a large number of additional strains are generated with a dynamic model, which then allows more meaningful predictions.
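The invention does not describe the dynamic model used to generate additional strains. Purely as a hypothetical illustration, the sketch below uses a toy two-step Michaelis-Menten pathway S → I → P whose enzyme levels `e1` and `e2` are drawn at random for each virtual strain and integrated with explicit Euler steps. All rate constants, ranges and names (`simulate_strain`, `generate_strains`) are assumptions.

```python
import numpy as np


def simulate_strain(e1, e2, s0=10.0, t_end=10.0, n_steps=1000,
                    kcat=(1.0, 1.0), km=(1.0, 1.0)):
    """Toy pathway S -> I -> P; returns an (n_steps + 1, 3) array of [S, I, P]."""
    dt = t_end / n_steps
    s, i, prod = s0, 0.0, 0.0
    traj = np.empty((n_steps + 1, 3))
    traj[0] = (s, i, prod)
    for k in range(n_steps):
        # Michaelis-Menten rates scaled by the enzyme (protein) levels.
        v1 = kcat[0] * e1 * s / (km[0] + s)
        v2 = kcat[1] * e2 * i / (km[1] + i)
        # Explicit Euler update; S + I + P is conserved by construction.
        s, i, prod = s - v1 * dt, i + (v1 - v2) * dt, prod + v2 * dt
        traj[k + 1] = (s, i, prod)
    return traj


def generate_strains(n_strains, seed=0, low=0.1, high=2.0):
    """Draw random enzyme levels per virtual strain and simulate each one."""
    rng = np.random.default_rng(seed)
    levels = rng.uniform(low, high, size=(n_strains, 2))
    trajectories = [simulate_strain(e1, e2) for e1, e2 in levels]
    return levels, trajectories
```

Each virtual strain pairs a protein profile (the enzyme levels) with metabolite trajectories, which is the kind of input the method needs.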
Further, the data preprocessing in S2 comprises the following steps:
S2.1, zero-mean normalization of the data:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where μ is the mean and σ is the standard deviation of the original data;
S2.2, to improve the predictive ability of the model, the data are divided into a training set and a validation set using cross-validation; the concentration of each metabolite is likewise used as a feature value, and the time derivative of the metabolite concentration as the target value.
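The cross-validation in S2.2 is not specified further. The sketch below assumes K-fold splitting over whole strains, so that time points from one strain never appear in both the training and the validation part of a fold; `kfold_indices` and its parameters are hypothetical.

```python
import numpy as np


def kfold_indices(n_strains, k, seed=0):
    """Yield (train, validation) arrays of strain indices for K-fold cross-validation."""
    if not 2 <= k <= n_strains:
        raise ValueError("k must satisfy 2 <= k <= n_strains")
    rng = np.random.default_rng(seed)
    # Shuffle strain indices once, then cut them into k nearly equal folds.
    folds = np.array_split(rng.permutation(n_strains), k)
    for i in range(k):
        validation = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, validation
```

Splitting by strain rather than by time point avoids leaking near-duplicate neighbouring samples from an interpolated series into the validation fold.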
Further, the model identification in S3 comprises the following steps:
S3.1, model construction: a nonlinear function is constructed with a neural network:
$$\dot{m}(t) = f\big(m(t), p(t)\big)$$
where $m(t)$ is the time series of metabolite concentrations, $p(t)$ is the time series of protein concentrations, and $\dot{m}(t)$ is the time derivative of the metabolite concentrations. The cost function is minimized by gradient descent according to the optimization criterion:
$$f^{*} = \arg\min_{f} \sum_{i} \sum_{t} \left\| f\big(\tilde{m}^{i}(t), \tilde{p}^{i}(t)\big) - \dot{\tilde{m}}^{i}(t) \right\|^{2}$$
where $i$ runs over the training strains and the tilde marks interpolated measurements. Cross-validation data are generated to prevent overfitting. The biological dynamic pathway is then predicted by solving the differential equation
$$\dot{m}(t) = f^{*}\big(m(t), p(t)\big);$$
S3.2, evaluation of the time-series model performance: using simulated data, a random strain is selected from the data set, and for each time series the agreement between predicted and test data is evaluated by calculating the root mean square error of the predicted trajectory:
$$\mathrm{RMSE}_{j} = \sqrt{\frac{1}{N} \sum_{t} \left( \tilde{m}_{j}(t) - m_{j}(t) \right)^{2}}$$
where $\mathrm{RMSE}_{j}$ is the root mean square error of the predicted trajectory of metabolite $j$, $\tilde{m}_{j}(t)$ is the interpolated actual concentration of metabolite $j$ at time $t$, $m_{j}(t)$ is the prediction obtained by solving the differential equation, and the sum runs over the $N$ evaluated time points.
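Predicting a trajectory means integrating dm/dt = f(m, p) forward from the initial metabolite concentrations and comparing it with the interpolated measurements. The sketch below assumes fixed-step classical fourth-order Runge-Kutta (RK4) integration and that protein concentrations are available as a function of time, `p_of_t` (for example, interpolated measurements); `f` stands for any callable such as a trained network wrapped to accept 1-D arrays. The function names are hypothetical.

```python
import numpy as np


def integrate_rk4(f, m0, p_of_t, t):
    """Integrate dm/dt = f(m, p(t)) on the time grid t with classical RK4."""
    m = np.empty((len(t), len(m0)))
    m[0] = m0
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        ti, mi = t[i], m[i]
        k1 = f(mi, p_of_t(ti))
        k2 = f(mi + 0.5 * h * k1, p_of_t(ti + 0.5 * h))
        k3 = f(mi + 0.5 * h * k2, p_of_t(ti + 0.5 * h))
        k4 = f(mi + h * k3, p_of_t(ti + h))
        m[i + 1] = mi + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return m


def rmse_per_metabolite(m_true, m_pred):
    """RMSE_j = sqrt(mean over time of (m_true_j - m_pred_j)^2), one value per column."""
    return np.sqrt(np.mean((m_true - m_pred) ** 2, axis=0))
```

As a check, first-order decay dm/dt = -p·m with constant p = 0.5 has the exact solution exp(-0.5 t), which RK4 reproduces to within a very small RMSE.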
Further, in the model saving of S4, forward and backward propagation with gradient descent are repeated until the loss function of the model no longer decreases.
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.

Claims (5)

1. A method for predicting a biological metabolic pathway based on a neural network, characterized by comprising the following steps:
S1, data download: acquiring a data set and constructing the data set required for model training;
S2, data preprocessing: interpolating the data set, adding further time-series points with the Savitzky-Golay method, normalizing the data, and partitioning the data for data expansion so as to improve the identification accuracy of the model;
S3, model identification;
S4, model saving: when the loss function of the model no longer decreases and the evaluation index reaches its optimum and stabilizes, saving the model.
2. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein the construction of the data set required for model training in S1 comprises: the data set contains three E. coli production strains; the high-yield and low-yield strains are used as training data and the medium-yield strain is used to validate the phenotype predicted by the model; and a large number of additional strains are generated with a dynamic model, which then allows more meaningful predictions.
3. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein the data preprocessing in S2 comprises the following steps:
S2.1, zero-mean normalization of the data:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where μ is the mean and σ is the standard deviation of the original data;
S2.2, dividing the data into a training set and a validation set using cross-validation, with the concentration of each metabolite taken as a feature value and the time derivative of the metabolite concentration taken as the target value.
4. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein the model identification in S3 comprises the following steps:
S3.1, model construction: constructing a nonlinear function with a neural network:
$$\dot{m}(t) = f\big(m(t), p(t)\big)$$
where $m(t)$ is the time series of metabolite concentrations, $p(t)$ is the time series of protein concentrations, and $\dot{m}(t)$ is the time derivative of the metabolite concentrations; minimizing the cost function by gradient descent according to the optimization criterion:
$$f^{*} = \arg\min_{f} \sum_{i} \sum_{t} \left\| f\big(\tilde{m}^{i}(t), \tilde{p}^{i}(t)\big) - \dot{\tilde{m}}^{i}(t) \right\|^{2}$$
where $i$ runs over the training strains and the tilde marks interpolated measurements; generating cross-validation data to prevent overfitting; and predicting the biological dynamic pathway by solving the differential equation
$$\dot{m}(t) = f^{*}\big(m(t), p(t)\big);$$
S3.2, evaluating the performance of the time-series model: using simulated data, selecting a random strain from the data set, and, for each time series, evaluating the agreement between predicted and test data by calculating the root mean square error of the predicted trajectory:
$$\mathrm{RMSE}_{j} = \sqrt{\frac{1}{N} \sum_{t} \left( \tilde{m}_{j}(t) - m_{j}(t) \right)^{2}}$$
where $\mathrm{RMSE}_{j}$ is the root mean square error of the predicted trajectory of metabolite $j$, $\tilde{m}_{j}(t)$ is the interpolated actual concentration of metabolite $j$ at time $t$, $m_{j}(t)$ is the prediction obtained by solving the differential equation, and the sum runs over the $N$ evaluated time points.
5. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein, in the model saving of S4, forward and backward propagation with gradient descent are repeated until the loss function of the model no longer decreases.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104847.4A CN112750495A 2021-01-26 2021-01-26 Prediction method for biological metabolic pathway based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110104847.4A CN112750495A 2021-01-26 2021-01-26 Prediction method for biological metabolic pathway based on neural network

Publications (1)

Publication Number Publication Date
CN112750495A 2021-05-04

Family

ID=75653166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110104847.4A Pending CN112750495A 2021-01-26 2021-01-26 Prediction method for biological metabolic pathway based on neural network

Country Status (1)

Country Link
CN (1) CN112750495A

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005187A1 * 2017-06-28 2019-01-03 The Regents Of The University Of California Simulating the metabolic pathway dynamics of an organism
CN111128307A * 2019-12-14 2020-05-08 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (中国科学院深圳先进技术研究院) Metabolic path prediction method and device, terminal device and readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Mayank Baranwal et al.: "A deep learning architecture for metabolic pathway prediction", Bioinformatics *
Zak Costello et al.: "A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data", npj Systems Biology and Applications *
Chen Xiulai (陈修来) et al.: "Metabolic regulation of microbial cofactor balance" (微生物辅因子平衡的代谢调控), Chinese Journal of Biotechnology (生物工程学报) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210504