CN112750495A

CN112750495A - Prediction method for biological metabolic pathway based on neural network

Info

Publication number: CN112750495A
Application number: CN202110104847.4A
Authority: CN
Inventors: 王小华; 陈亮; 张娜; 韩锋; 王美娟
Original assignee: Shanxi Sanyouhe Smart Information Technology Co Ltd
Current assignee: Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2021-05-04

Abstract

The invention belongs to the field of biological dynamic pathway prediction, and in particular relates to a method for predicting a biological metabolic pathway based on a neural network, comprising the following steps: data downloading: acquiring a data set and completing construction of the data set required for model training; data preprocessing: interpolating the data set, adding more time-series points by the Savitzky-Golay method, normalizing the data, and dividing the data for data augmentation so as to improve the recognition accuracy of the model; model identification; and model saving. The invention requires only the concentrations of the proteins and metabolites in the metabolic pathway and does not need data such as temperature or flux; compared with a kinetic model, it greatly reduces the data preprocessing effort while taking the relationships between proteins into account. The method lowers the difficulty of building the model, and the accuracy of the neural-network-based model is markedly higher than that of a kinetic model. The invention is used for predicting biological metabolic pathways.

Description

Prediction method for biological metabolic pathway based on neural network

Technical Field

The invention belongs to the field of biological dynamic pathway prediction, and in particular relates to a method for predicting a biological metabolic pathway based on a neural network.

Background

At present, biological metabolic pathways in bioengineering are highly complex, and the proteins and metabolites involved in a pathway are often difficult to control, so the formation of the final product is difficult to predict. Different protein concentrations have positive and negative effects on one another, and distinguishing these effects would improve the efficiency of the whole bioengineering process.

With a kinetic model, or a related model such as a stoichiometric model, a biological dynamic pathway can be predicted well. However, building kinetic models becomes increasingly complex as bioengineering systems grow, requires specialist knowledge, and demands sharply increasing time and effort.

Disclosure of Invention

In view of the technical problems that existing models of biological metabolic pathways require specialist knowledge to construct and that the time and effort required increase sharply, the invention provides a neural-network-based method for predicting biological metabolic pathways that has low cost, high accuracy and strong stability.

To solve the above technical problems, the invention adopts the following technical solution:

A method for predicting a biological metabolic pathway based on a neural network, comprising the following steps:

S1, data downloading: acquiring a data set and completing construction of the data set required for model training;

S2, data preprocessing: interpolating the data set, adding more time-series points by the Savitzky-Golay method, normalizing the data, and dividing the data for data augmentation so as to improve the recognition accuracy of the model;

S3, model identification;

S4, model saving: saving the model when its loss function no longer decreases and the evaluation index has reached its optimum and become stable.
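
The interpolation and Savitzky-Golay smoothing named in S2 can be sketched as follows. This is a minimal illustration only: the text does not state the interpolation scheme, window length or polynomial order, so linear interpolation and the standard 5-point quadratic Savitzky-Golay coefficients are assumptions, and the function names and sample measurements are hypothetical.

```python
from bisect import bisect_right

# Standard 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def interpolate(times, values, new_times):
    """Linearly interpolate (times, values) onto new_times; times must be increasing."""
    out = []
    for t in new_times:
        # Segment [times[i], times[i + 1]] that contains t, clamped to the data range.
        i = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        out.append((1 - w) * values[i] + w * values[i + 1])
    return out


def savgol5(values):
    """Apply the 5-point filter; the two points at each edge are left unchanged."""
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        smoothed[i] = sum(c * values[i + k - 2] for k, c in enumerate(SG5))
    return smoothed


def densify(times, values, n_points):
    """Resample a sparse series onto n_points evenly spaced times, then smooth it."""
    step = (times[-1] - times[0]) / (n_points - 1)
    dense_t = [times[0] + k * step for k in range(n_points)]
    return dense_t, savgol5(interpolate(times, values, dense_t))


# Hypothetical sparse measurements of one metabolite (hours, arbitrary concentration units).
t_obs = [0.0, 2.0, 4.0, 8.0, 12.0]
m_obs = [0.0, 0.8, 1.5, 2.1, 2.3]
t_dense, m_dense = densify(t_obs, m_obs, 25)
```

A longer window or a different polynomial order would change the coefficients; a library routine such as a general Savitzky-Golay filter could replace `savgol5` in practice.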

The method for completing construction of the data set required for model training in S1 is as follows: the data set comprises 3 production strains of Escherichia coli; the high-yield and low-yield strains are used as training data, and the medium-yield strain is used to validate the phenotype predicted by the model; in addition, a large number of strains are generated with a kinetic model so that more significant predictions can be made.
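
The strain split described for S1 can be illustrated with a short sketch. It assumes that strains are ranked by a single final-yield number, with the extremes used for training and the strains in between held out; the strain names and yield values are hypothetical.

```python
def split_by_yield(final_yields):
    """Return (training strains, validation strains) from a {strain: yield} mapping."""
    ranked = sorted(final_yields, key=final_yields.get)
    # Lowest and highest producers train the model; the strains in between validate it.
    train = [ranked[0], ranked[-1]]
    validation = ranked[1:-1]
    return train, validation


# Hypothetical final product yields (arbitrary units).
yields = {"strain_A": 12.0, "strain_B": 55.0, "strain_C": 140.0}
train_strains, validation_strains = split_by_yield(yields)
```

Holding out the medium-yield strain tests whether the model can interpolate between the behaviours it saw during training.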

The data preprocessing method in S2 comprises the following steps:

S2.1, applying zero-mean normalization to the data:

$$x' = \frac{x - \mu}{\sigma}$$

where x is the original data, μ is the mean of the original data, and σ is the standard deviation of the original data;

S2.2, dividing the data set into a training set and a validation set, adopting cross-validation, taking the concentration of each metabolite as a feature value, and taking the time derivative of the metabolite concentration as the target value.
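
Steps S2.1 and S2.2 can be sketched as follows. The sketch makes several assumptions: the population standard deviation is used for σ, the time derivative is estimated by finite differences on the raw concentrations, and cross-validation is a simple k-fold split. The function names and sample data are hypothetical.

```python
from statistics import mean, pstdev


def zscore(column):
    """Zero-mean normalization x' = (x - mu) / sigma (population standard deviation)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def time_derivative(times, values):
    """Finite-difference estimate of d(values)/dt: central inside, one-sided at the ends."""
    n = len(values)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return deriv


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; sample i belongs to validation fold i % k."""
    for fold in range(k):
        val = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, val


# Hypothetical dense time series for a single metabolite.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
features = zscore(m)             # feature value: normalized concentration
targets = time_derivative(t, m)  # target value: concentration time derivative
folds = list(k_fold_indices(len(m), 3))
```

With several metabolites and proteins, each would contribute one normalized feature column and each metabolite one target column.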

The model identification method in S3 comprises the following steps:

S3.1, model construction: constructing a non-linear function f through a neural network:

$$\dot{m}(t) = f(m(t), p(t))$$

where m(t) is the time series of metabolite concentrations, p(t) is the time series of protein concentrations, and $\dot{m}(t)$ is the time derivative of the metabolite concentrations; gradient descent is used to reduce the cost function until the optimization criterion is reached; cross-validation data are generated to prevent overfitting; and the biological dynamic pathway is predicted by solving the differential equation $\dot{m}(t) = f(m(t), p(t))$;
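
The idea of S3.1, learning the derivative function with a neural network and then solving the resulting differential equation, can be sketched in plain Python. Everything specific here is an assumption made for illustration: a single hidden tanh layer, a mean-squared-error cost, full-batch gradient descent, forward-Euler integration, and a toy rate law used to generate training data. None of these choices is stated in the text.

```python
import math
import random


def init_mlp(n_in, n_hidden, rng):
    """One hidden tanh layer and a linear output; weights uniform in [-0.5, 0.5]."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return {"w1": w1, "b1": [0.0] * n_hidden, "w2": w2, "b2": 0.0}


def forward(net, x):
    """Return (hidden activations, predicted derivative) for one input x = [m, p]."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    return h, sum(w * hi for w, hi in zip(net["w2"], h)) + net["b2"]


def mse(net, xs, ys):
    """Mean squared error between predicted and target derivatives."""
    return sum((forward(net, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_step(net, xs, ys, lr):
    """One full-batch gradient-descent step (forward pass, then backpropagation)."""
    n_hidden, n_in = len(net["w2"]), len(xs[0])
    g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
    g_b1, g_w2, g_b2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
    for x, y in zip(xs, ys):
        h, pred = forward(net, x)
        err = 2.0 * (pred - y) / len(xs)  # derivative of the loss w.r.t. pred
        g_b2 += err
        for j in range(n_hidden):
            g_w2[j] += err * h[j]
            back = err * net["w2"][j] * (1.0 - h[j] ** 2)  # back through tanh
            g_b1[j] += back
            for i in range(n_in):
                g_w1[j][i] += back * x[i]
    for j in range(n_hidden):
        net["w2"][j] -= lr * g_w2[j]
        net["b1"][j] -= lr * g_b1[j]
        for i in range(n_in):
            net["w1"][j][i] -= lr * g_w1[j][i]
    net["b2"] -= lr * g_b2


def integrate(net, m0, protein, dt, n_steps):
    """Forward-Euler solution of dm/dt = f(m, p) with the trained network as f."""
    traj = [m0]
    for k in range(n_steps):
        traj.append(traj[-1] + dt * forward(net, [traj[-1], protein(k * dt)])[1])
    return traj


# Toy training data sampled from a hypothetical rate law dm/dt = p - 0.5 * m.
rng = random.Random(0)
xs = [[rng.uniform(0, 2), rng.uniform(0, 1)] for _ in range(40)]
ys = [p - 0.5 * m for m, p in xs]
net = init_mlp(2, 6, rng)
loss_before = mse(net, xs, ys)
for _ in range(1000):
    train_step(net, xs, ys, lr=0.02)
loss_after = mse(net, xs, ys)
trajectory = integrate(net, m0=0.0, protein=lambda t: 1.0, dt=0.1, n_steps=50)
```

In a real setting the inputs would be the normalized metabolite and protein concentrations from S2, the targets their estimated time derivatives, and a higher-order ODE solver would normally replace the Euler loop.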

S3.2, evaluating the performance of the time-series model: using simulated data, a strain is selected at random from the data set and, for each time series, the agreement between the predicted data and the test data is evaluated by calculating the root mean square error (RMSE) of the predicted trajectory, where $\tilde{m}_j(t)$ is an interpolation of the actual concentration of metabolite j at time t and $m_j(t)$ is the prediction obtained from the solution of the differential equation.
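
The trajectory error of S3.2 can be computed as in the sketch below. The exact normalization of the RMSE is not given in this text, so averaging the squared error over all metabolites and all sampled time points is an assumption; the function name and sample values are hypothetical.

```python
import math


def trajectory_rmse(predicted, interpolated):
    """RMSE between predicted and interpolated trajectories.

    Both arguments map a metabolite name to concentrations sampled at the same
    time points; the squared error is averaged over all metabolites and times.
    """
    total, count = 0.0, 0
    for name, pred_series in predicted.items():
        for pred, actual in zip(pred_series, interpolated[name]):
            total += (pred - actual) ** 2
            count += 1
    return math.sqrt(total / count)


# Hypothetical trajectories for two metabolites at three time points.
pred = {"m1": [0.0, 1.0, 2.0], "m2": [1.0, 1.0, 1.0]}
obs = {"m1": [0.0, 1.0, 4.0], "m2": [1.0, 3.0, 1.0]}
score = trajectory_rmse(pred, obs)
```

A lower score means that the solved trajectory stays closer to the interpolated measurements of the held-out strain.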

In the model saving of S4, forward and backward propagation with gradient descent are applied until the loss function of the model no longer decreases.
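
The stopping rule of S4 can be sketched as a plateau check on the loss history followed by a save. This is an illustration under assumptions: the `patience` and `min_delta` thresholds, the JSON file format and all function names are hypothetical, and only the loss is tracked here, not the separate evaluation index mentioned in S4.

```python
import json


def should_save(loss_history, patience=5, min_delta=1e-4):
    """True once the last `patience` epochs improve the earlier best by less than min_delta."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    best_recent = min(loss_history[-patience:])
    return best_before - best_recent < min_delta


def save_model(params, path):
    """Write model parameters (nested lists and floats) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(params, fh)


def train_until_plateau(step_fn, params, path, max_epochs=1000, patience=5):
    """Call step_fn(params) -> loss once per epoch; stop at a plateau, then save."""
    history = []
    for _ in range(max_epochs):
        history.append(step_fn(params))
        if should_save(history, patience):
            break
    save_model(params, path)
    return len(history)
```

Here `step_fn` stands for one epoch of forward and backward propagation that updates `params` in place and returns the current loss.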

Compared with the prior art, the invention has the following beneficial effects:

The invention uses multiple hidden layers and PCA-based feature selection to reduce the dimensionality of the multidimensional data during dynamic pathway prediction. The invention requires only the concentrations of the proteins and metabolites in the metabolic pathway and does not need data such as temperature or flux; compared with a kinetic model, it greatly reduces the data preprocessing effort while taking the relationships between proteins into account. The method lowers the difficulty of building the model, and the accuracy of the neural-network-based model is markedly higher than that of a kinetic model.

Drawings

FIG. 1 is a flow chart of the main steps of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

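A method for predicting a biological metabolic pathway based on a neural network, as shown in FIG. 1, comprises the following steps:

S1, data downloading: acquiring a data set and completing construction of the data set required for model training;

S2, data preprocessing: interpolating the data set, adding more time-series points by the Savitzky-Golay method, normalizing the data, and dividing the data for data augmentation so as to improve the recognition accuracy of the model;

S3, model identification;

S4, model saving: saving the model when its loss function no longer decreases and the evaluation index has reached its optimum and become stable.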

Further, the method for completing construction of the data set required for model training in S1 is as follows: the data set used herein contains 3 production strains of Escherichia coli, whose specific sources were mentioned above, and is used to verify the identification performance of the model. To validate the phenotype predicted by the model, the high-yield and low-yield strains are used as training data and the medium-yield strain is used for validation. To improve the model's predictions, a large number of strains are generated with a kinetic model so that more significant predictions can be made.

Further, the data preprocessing method in S2 comprises the following steps:

S2.1, applying zero-mean normalization to the data:

$$x' = \frac{x - \mu}{\sigma}$$

where x is the original data, μ is the mean of the original data, and σ is the standard deviation of the original data;

S2.2, to improve the predictive power of the model, dividing the data set into a training set and a validation set, adopting cross-validation, taking the concentration of each metabolite as a feature value, and taking the time derivative of the metabolite concentration as the target value.

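Further, the model identification method in S3 comprises the following steps:

S3.1, model construction: constructing a non-linear function f through a neural network:

$$\dot{m}(t) = f(m(t), p(t))$$

where m(t) is the time series of metabolite concentrations, p(t) is the time series of protein concentrations, and $\dot{m}(t)$ is the time derivative of the metabolite concentrations; cross-validation data are generated to prevent overfitting, and the biological dynamic pathway is predicted by solving this differential equation;

S3.2, evaluating the performance of the time-series model by calculating the root mean square error (RMSE) of the predicted trajectory.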

Further, in the model saving of S4, forward and backward propagation with gradient descent are applied until the loss function of the model no longer decreases.

Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.

Claims

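1. A method for predicting a biological metabolic pathway based on a neural network, characterized by comprising the following steps:

S1, data downloading: acquiring a data set and completing construction of the data set required for model training;

S2, data preprocessing: interpolating the data set, adding more time-series points by the Savitzky-Golay method, normalizing the data, and dividing the data for data augmentation so as to improve the recognition accuracy of the model;

S3, model identification;

S4, model saving: saving the model when its loss function no longer decreases and the evaluation index has reached its optimum and become stable.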

2. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein the method for completing construction of the data set required for model training in S1 is as follows: the data set comprises 3 production strains of Escherichia coli; the high-yield and low-yield strains are used as training data, and the medium-yield strain is used to validate the phenotype predicted by the model; in addition, a large number of strains are generated with a kinetic model so that more significant predictions can be made.

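3. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein the data preprocessing method in S2 comprises the following steps:

S2.1, applying zero-mean normalization to the data:

$$x' = \frac{x - \mu}{\sigma}$$

where x is the original data, μ is the mean of the original data, and σ is the standard deviation of the original data;

S2.2, dividing the data set into a training set and a validation set, adopting cross-validation, taking the concentration of each metabolite as a feature value, and taking the time derivative of the metabolite concentration as the target value.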

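4. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein the model identification method in S3 comprises the following steps:

S3.1, model construction: constructing a non-linear function f through a neural network:

$$\dot{m}(t) = f(m(t), p(t))$$

where m(t) is the time series of metabolite concentrations, p(t) is the time series of protein concentrations, and $\dot{m}(t)$ is the time derivative of the metabolite concentrations; cross-validation data are generated to prevent overfitting, and the biological dynamic pathway is predicted by solving this differential equation;

S3.2, evaluating the performance of the time-series model by calculating the root mean square error (RMSE) of the predicted trajectory.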

5. The method for predicting a biological metabolic pathway based on a neural network according to claim 1, wherein in the model saving of S4, forward and backward propagation with gradient descent are applied until the loss function of the model no longer decreases.