CN116030904A

CN116030904A - Method and system for forecasting concentration of biological enzyme in microbial fermentation process

Info

Publication number: CN116030904A
Application number: CN202310086258.7A
Authority: CN
Inventors: 王浩; 袁景淇; 孙鑫宇
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2023-01-18
Filing date: 2023-01-18
Publication date: 2023-04-28

Abstract

The invention provides a method and a system for forecasting the biological enzyme concentration in a microbial fermentation process. The method comprises the following steps: acquiring online data of historical tank batches and of the current tank batch to be forecasted, together with the offline data of the corresponding tank batches; performing statistical analysis on the offline data, designing a classification criterion, and classifying the historical tank batches; establishing, for each class of tank batch, a mixed forecast training database containing the online data and the offline data; based on the mixed forecast training database, training an artificial neural network forecasting model and an extreme gradient boosting forecasting model on the tank batch data of each class; performing nonlinear fusion of the trained artificial neural network forecasting model and extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm, thereby realizing mixed forecasting; and updating the mixed forecast training database on a rolling basis each time the fermentation of a tank batch is finished. The method is used to forecast the biological enzyme concentration in a microbial fermentation process in advance and achieves higher forecasting accuracy.

Description

Method and system for forecasting concentration of biological enzyme in microbial fermentation process

Technical Field

The invention relates to the field of biotechnology, in particular to a method and a system for forecasting concentration of biological enzymes in a microbial fermentation process.

Background

In actual microbial fermentation production, it is important to forecast key state variables (such as cell concentration, substrate concentration and biological enzyme concentration) accurately and in a timely manner. Some fermentation state variables are difficult to measure online. The relevant data can be obtained by sampling the fermentation system and performing offline assays, but sampling introduces a risk of bacterial contamination, and the time delay of offline sampling and analysis hinders real-time operation of the fermentation process. Although suitable sensors exist, they tend to be expensive and of limited reliability, which makes them unsuitable for large-scale industrial processes. Techniques for forecasting the state variables of microbial fermentation processes have therefore attracted the attention of researchers. Computer technology makes large amounts of discrete measurement data from the fermentation process available, and data-driven process modeling can provide state prediction, fault diagnosis and similar functions for the fermentation process, making it one of the main research directions in fermentation process studies.

Patent application No. 201310661816.4, filed on 2013-12-09, describes an online prediction method for biological fermentation yield based on a Bayesian combined neural network. That method does not classify the batches to be predicted, so the outputs of the three trained neural networks are not all applicable to a given batch. For example, when the batch to be predicted is a superior batch, the neural network predictor corresponding to superior batches can predict its yield well, but the predictors corresponding to medium and inferior batches give less reliable yield predictions for that batch because their training data lack the features of superior batches.

Disclosure of Invention

In view of the shortcomings of the prior art, the object of the invention is to provide a method and a system for forecasting the concentration of biological enzymes in a microbial fermentation process.

According to one aspect of the present invention, there is provided a method for predicting the concentration of a biological enzyme in a microbial fermentation process, the method comprising:

acquiring online data of historical tank batches and current tank batches to be forecasted, and offline data of the tank batches corresponding to the online data;

carrying out statistical analysis and designing classification standards according to the offline data, and classifying historical tank batches;

establishing, for each class of tank batch, a mixed forecast training database containing the online data and the offline data;

training an artificial neural network forecasting model and an extreme gradient boosting forecasting model on the tank batch data of each class, based on the mixed forecast training database;

performing nonlinear fusion of the trained artificial neural network forecasting model and extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm, thereby realizing mixed forecasting;

and updating the mixed forecast training database on a rolling basis each time the fermentation of a tank batch is finished.

Further, in acquiring the online data of the historical tank batches and of the current tank batch to be forecasted, and the offline data of the corresponding tank batches: the online data comprise the fermentation broth temperature T, fermentation broth pH, fermentation broth volume V, fermenter pressure P, aeration volume flow F, dissolved oxygen DO and stirring paddle rotating speed ω; the offline data comprise the biological enzyme concentration recorded offline.

Further, in performing statistical analysis on the offline data and designing the classification criterion, the tank batches are classified according to the value of a classification function $J_{c,i}(t)$, which is calculated as:

where $J_{c,i}(t)$ is the classification function value of the $i$-th tank batch at time $t$, $T_W$ is the window width considered for classification, and $J_i(t)$ is the biological enzyme concentration of the $i$-th tank batch at time $t$.

Still further, classifying the historical tank batches comprises: defining $J_{c,ave}(t)$ as the mean of the classification function at time $t$, $\sigma(t)$ as its standard deviation at time $t$, and $\alpha$ as the confidence coefficient, and designing the classification criterion as follows:

$J_{c,i}(t) < J_{c,ave}(t) - \alpha \sigma(t)$: low-yield tank batch;

$J_{c,ave}(t) - \alpha \sigma(t) \le J_{c,i}(t) \le J_{c,ave}(t) + \alpha \sigma(t)$: average tank batch;

$J_{c,i}(t) > J_{c,ave}(t) + \alpha \sigma(t)$: high-yield tank batch.
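A minimal sketch of the three-way criterion, assuming the classification-function values $J_{c,i}(t)$ at one time $t$ have already been computed for every historical tank batch (the formula for $J_{c,i}(t)$ is not reproduced in this text, so the sketch takes those values as given). The function name `classify_batches`, the dictionary input and the use of the population standard deviation for $\sigma(t)$ are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Dict


def classify_batches(jc_values: Dict[str, float], alpha: float = 1.28) -> Dict[str, str]:
    """Label each tank batch from its classification-function value J_c,i(t) at one time t.

    jc_values maps a hypothetical batch id to J_c,i(t). J_c,ave(t) and sigma(t) are
    taken over all batches; using pstdev (population std) is an assumption.
    """
    values = list(jc_values.values())
    j_ave = mean(values)            # J_c,ave(t)
    sigma = pstdev(values)          # sigma(t)
    lower = j_ave - alpha * sigma
    upper = j_ave + alpha * sigma
    labels = {}
    for batch_id, j in jc_values.items():
        if j < lower:
            labels[batch_id] = "low"
        elif j > upper:
            labels[batch_id] = "high"
        else:                       # lower <= j <= upper, boundaries inclusive
            labels[batch_id] = "average"
    return labels


labels = classify_batches({"A": 1.0, "B": 5.0, "C": 5.0, "D": 5.0, "E": 9.0})
```

With these toy values the band is roughly 5.0 ± 3.24, so batch A is labeled low, E high, and the rest average.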

Further, in establishing the mixed forecast training database from the online data and the offline data, the data in the mixed forecast training database are evenly distributed, and the data sets are drawn from production data of the same period of time.

Further, in training the tank batch data of each class with the artificial neural network forecasting model and the extreme gradient boosting forecasting model, the artificial neural network forecasting model establishes, from the collected state data of the fermentation production process, a characteristic model between the input data and the output data that simulates the actual fermentation process.

Further, in the same training step, the input variables of the extreme gradient boosting forecasting model comprise detectable variables affecting the biological enzyme concentration, and the output variables comprise the biological enzyme concentration forecasted in advance; during training of the extreme gradient boosting forecasting model, new classification trees are continuously added to the current model to improve its prediction accuracy.

Further, performing nonlinear fusion of the trained artificial neural network forecasting model and extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm comprises:

taking the forecasts of the single models as input, i.e. $x_f = [U_A(t_u + c), U_X(t_u + c)]$, where $U_A(t_u + c)$ is the forecast of the artificial neural network forecasting model, $U_X(t_u + c)$ is the forecast of the extreme gradient boosting forecasting model, and $c$ is the forecast lead time; taking the actual biological enzyme concentration as output, i.e. $y_f = [U(t_u + c)]$; and optimizing the weights of the single models in the mixed forecasting model through the self-learning capability of the model.

Further, updating the mixed forecast training database on a rolling basis each time the fermentation of a tank batch is finished comprises:

if the tank batch is abnormal, keeping the tank batches selected in the mixed forecast training database unchanged;

if the biological enzyme concentration curve of the tank batch in the middle and late stages is similar, within the confidence domain, to that of an earlier tank batch, replacing the earlier tank batch with this tank batch;

if the biological enzyme concentration curve of the tank batch in the middle and late stages is not similar, within the confidence domain, to that of any historical tank batch, adding the tank batch directly to the mixed forecast training database.
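The three update rules can be sketched as a pure function over a list of enzyme-concentration curves ordered from oldest to newest. The text does not define "similar within the confidence domain", so `within_band` is a hypothetical stand-in: it compares only the second half of each curve and requires every point to differ by at most `tol`. The abnormality decision (contamination, or a curve clearly below the lowest historical curve) is assumed to be made elsewhere and passed in as a flag; replacing the oldest similar batch when several match is also an assumption.

```python
from typing import Callable, List, Sequence

Curve = Sequence[float]


def within_band(old: Curve, new: Curve, tol: float = 0.05, start_frac: float = 0.5) -> bool:
    """Hypothetical similarity test: compare only the middle/late part of the run
    (from start_frac of its length onward) and require every point to lie within tol."""
    n = min(len(old), len(new))
    start = int(n * start_frac)
    return all(abs(old[k] - new[k]) <= tol for k in range(start, n))


def rolling_update(database: List[Curve], new_curve: Curve, is_abnormal: bool,
                   similar: Callable[[Curve, Curve], bool] = within_band) -> List[Curve]:
    """Return the updated database; `database` is ordered from oldest to newest."""
    if is_abnormal:
        return list(database)             # rule 1: discard the batch, keep database as is
    for idx, old in enumerate(database):  # scan oldest first
        if similar(old, new_curve):
            updated = list(database)
            del updated[idx]              # rule 2: drop the earlier, similar batch
            updated.append(new_curve)     # newest data go to the end
            return updated
    return list(database) + [new_curve]   # rule 3: add the dissimilar normal batch


db = [[0.1, 0.2, 0.5, 0.8], [0.1, 0.3, 0.6, 0.9]]
```

Returning a new list instead of mutating `database` keeps the previous training set available, for example to compare model accuracy before and after an update.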

According to another aspect of the present invention, there is provided a system for predicting the concentration of a biological enzyme in a microbial fermentation process, the system comprising:

an acquisition module, configured to acquire online data of historical tank batches and of the current tank batch to be forecasted, and offline data of the tank batches corresponding to the online data;

a classification module, configured to perform statistical analysis on the offline data, design classification criteria, and classify the historical tank batches;

a database building module, configured to establish, for each class of tank batch, a mixed forecast training database containing the online data and the offline data;

a training and testing module, configured to train an artificial neural network forecasting model and an extreme gradient boosting forecasting model on the tank batch data of each class based on the mixed forecast training database;

a mixed forecasting module, configured to perform nonlinear fusion of the trained artificial neural network forecasting model and extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm, thereby realizing mixed forecasting; and

a rolling update module, configured to update the mixed forecast training database on a rolling basis each time the fermentation of a tank batch is finished.

Compared with the prior art, the invention has at least one of the following beneficial effects:

1. The method first determines the class of the batch to be predicted, then performs model prediction with the artificial neural network and with extreme gradient boosting separately, and finally combines the two models into a mixed prediction based on the extreme gradient boosting algorithm. The self-learning capability of the model optimizes the weight of each single model in the mixed prediction model, enabling higher-precision prediction of the biological enzyme concentration.

2. The invention requires no additional measuring points or other equipment; only a software calculation module needs to be added to the existing control system, so the cost is low. The method can be applied at fermentation production sites and has good application potential for guiding online monitoring and scheduling optimization.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a flow chart of a method for predicting the concentration of biological enzymes in a microbial fermentation process according to an embodiment of the invention;

FIG. 2 shows the classification results of 272 tank batches according to an embodiment of the invention;

FIG. 3 is a schematic diagram of an artificial neural network predictor according to an embodiment of the present invention;

FIG. 4 is a training flow chart of the extreme gradient boosting forecasting model according to an embodiment of the invention;

FIG. 5 shows the distribution of the average relative error of the 1-h-ahead mixed forecasts for 232 average-enzyme-activity tank batches according to an embodiment of the invention;

FIG. 6 shows the distribution of the average relative error of the 2-h-ahead mixed forecasts for 232 average-enzyme-activity tank batches according to an embodiment of the invention;

FIG. 7 shows the distribution of the average relative error of the 3-h-ahead mixed forecasts for 232 average-enzyme-activity tank batches according to an embodiment of the invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.

The embodiment of the invention provides a method for forecasting biological enzyme concentration in a microbial fermentation process, and referring to fig. 1, the method comprises the following steps:

step one: acquiring online data of historical tank batches and of the current tank batch to be forecasted, and offline data of the tank batches corresponding to the online data, wherein a historical tank batch is one whose fermentation has finished and whose data are complete, the current tank batch to be forecasted is one still under fermentation, and the online data consist of the data of the historical tank batches and the data of the current tank batch available so far;

step two: carrying out statistical analysis and designing classification standards according to the offline data, and classifying the historical tank batches;

step three: establishing, for each class of tank batch, a mixed forecast training database containing the online data and the offline data;

step four: based on the mixed forecast training database, training an artificial neural network forecasting model and an extreme gradient boosting forecasting model on the tank batch data of each class;

step five: performing nonlinear fusion of the trained artificial neural network forecasting model and extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm to obtain a mixed forecasting model, and using this model for mixed forecasting;

step six: every time one tank batch fermentation is finished, rolling update is carried out on the mixed forecast training database.

In some embodiments, in step one, the online data of historical tank batches with given tank numbers and batch numbers, and the known online data of the current tank batch to be forecasted, are read from the database of the plant's distributed control system, and the offline data of the tank batches corresponding to the online data read are collected, wherein: the online data comprise the fermentation broth temperature T, fermentation broth pH, fermentation broth volume V, fermenter pressure P, aeration volume flow F, dissolved oxygen DO and stirring paddle rotating speed ω; the offline data comprise the biological enzyme concentration recorded offline.

In some embodiments, in step two, statistical analysis is performed on the offline data and a classification criterion is designed, wherein the tank batches are classified according to the value of the classification function $J_{c,i}(t)$, which is calculated as:

Further, classifying the tank batches comprises: defining $J_{c,ave}(t)$ as the mean of the classification function at time $t$, $\sigma(t)$ as its standard deviation at time $t$, and $\alpha$ as the confidence coefficient, and designing the classification criterion as follows: $J_{c,i}(t) < J_{c,ave}(t) - \alpha \sigma(t)$, low-yield tank batch; $J_{c,ave}(t) - \alpha \sigma(t) \le J_{c,i}(t) \le J_{c,ave}(t) + \alpha \sigma(t)$, average tank batch; $J_{c,i}(t) > J_{c,ave}(t) + \alpha \sigma(t)$, high-yield tank batch. The tank batches are thus divided into three classes.

In some embodiments, in step three, the collected online and offline data are divided into three classes according to the designed classification criterion, and a training database is built for each class of tank batch. The mixed forecast training database contains online data and offline data: the online data serve as the model-training input and the offline data as the model-training output. The mixed forecast training database should be built according to the following principles: the database should be of moderate size; its data should be evenly distributed and represent the operating conditions of normal tank batches; and the data should come from production data of the same period of time, for example 1 to 3 months, so as to exclude slowly varying factors as far as possible.

In some embodiments, in step four, the artificial neural network forecasting model is trained and tested with an artificial neural network algorithm. Such an algorithm uses artificial neurons to emulate a nonlinear system and can, in theory, approximate an arbitrary function and its derivatives. Forecasting fermentation state variables with an artificial neural network forecasting model does not require an accurate physical and mathematical model of the fermentation process, i.e. no prior knowledge of the process is needed; the tank batch data of each class are therefore trained with both the artificial neural network forecasting model and the extreme gradient boosting forecasting model. The artificial neural network forecasting model establishes, from the collected state data of the fermentation production process, a characteristic model between input and output data that simulates the actual fermentation process and guides production operation. When modeling the fermentation process and forecasting state variables online, the value of a state variable at a future moment is predicted from data that can be detected online in real time together with offline analysis data; in this prediction the artificial neural network forecasting model acts as an approximation of the real physical fermentation process, and its parameters, such as the connection weights between internal neurons, are the concrete representation of the model.

The input variables of a neural network state variable predictor generally comprise production variables that are easy to detect online in real time, such as fermentation broth pH, temperature, dissolved oxygen, and the oxygen and carbon dioxide concentrations in the exhaust gas; the output variables can be biological variables that are difficult to detect online in real time, such as cell concentration, biological enzyme concentration and substrate concentration.
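The text does not specify the network architecture, activation function or training algorithm, so the following is only a sketch under stated assumptions: one hidden layer of tanh units, linear outputs, and per-sample gradient descent on squared error, written in plain Python to stay self-contained. The class name `TinyMLP`, the learning rate and the toy target $y = x^2$ are illustrative; in the xylanase embodiment described later the network would map the 20-element input $x_u$ to the three lead times, i.e. `TinyMLP(20, n_hidden, 3)`.

```python
import math
import random
from typing import List, Sequence


class TinyMLP:
    """One hidden layer (tanh) with linear outputs, trained by per-sample
    gradient descent on the squared error 0.5 * sum((y - target)^2)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x: Sequence[float]):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hk for w, hk in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x: Sequence[float]) -> List[float]:
        return self._forward(x)[1]

    def fit(self, xs, ys, epochs: int = 200, lr: float = 0.05) -> "TinyMLP":
        for _ in range(epochs):
            for x, target in zip(xs, ys):
                h, y = self._forward(x)
                err = [yo - t for yo, t in zip(y, target)]  # dLoss/dy
                # gradient w.r.t. hidden pre-activations, computed before w2 changes
                grad_h = [sum(err[o] * self.w2[o][k] for o in range(len(err))) * (1.0 - h[k] ** 2)
                          for k in range(len(h))]
                for o, e in enumerate(err):
                    for k, hk in enumerate(h):
                        self.w2[o][k] -= lr * e * hk
                    self.b2[o] -= lr * e
                for k, g in enumerate(grad_h):
                    for i, xi in enumerate(x):
                        self.w1[k][i] -= lr * g * xi
                    self.b1[k] -= lr * g
        return self


def mse(model: TinyMLP, xs, ys) -> float:
    total = sum(sum((p - t) ** 2 for p, t in zip(model.predict(x), y)) for x, y in zip(xs, ys))
    return total / len(xs)


# Toy data (not fermentation data): learn y = x^2 on [0, 1].
xs = [[k / 10] for k in range(11)]
ys = [[(k / 10) ** 2] for k in range(11)]
net = TinyMLP(n_in=1, n_hidden=4, n_out=1)
before = mse(net, xs, ys)
net.fit(xs, ys)
after = mse(net, xs, ys)
```

The trained weights `w1`, `w2` and biases are the "connection weights between internal neurons" that the text calls the concrete representation of the model; a production system would normally use an established neural network library instead.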

In some embodiments, in step four, the extreme gradient boosting forecasting model is trained and tested with the extreme gradient boosting algorithm. This algorithm belongs to the Boosting family and is an efficient implementation of the gradient boosting decision tree; it is accurate and interpretable in forecasting problems and well suited to efficient parallel computation. When the tank batch data of each class are trained with the artificial neural network forecasting model and the extreme gradient boosting forecasting model, the input variables of the extreme gradient boosting forecasting model comprise detectable variables affecting the biological enzyme concentration, and the output variables comprise the biological enzyme concentration forecasted in advance. During training of the extreme gradient boosting forecasting model, new classification trees are continuously added to the current model to improve its forecasting accuracy.

Specifically, online detection data are selected as the input data set $A_{input}$, and the corresponding offline assay data are collected as the output data set $A_{output}$. The input data set $A_{input}$ mainly comprises detectable variables affecting product synthesis, such as fermentation broth temperature T, fermentation broth pH, fermentation broth volume V, fermenter pressure P, aeration volume flow F, dissolved oxygen DO and stirring paddle rotating speed ω; the output data set $A_{output}$ mainly refers to the biological enzyme concentration recorded offline.

The training-set input data are denoted $x_j$ ($j = 1, 2, \ldots, N$) and the training-set output data $y_j$ ($j = 1, 2, \ldots, N$). The prediction obtained with the extreme gradient boosting algorithm is:

$$\hat{y}_j = \sum_{k=1}^{M} f_k(x_j)$$

where $k$ is the index of the tree, $M$ is the total number of trees in the algorithm, and $f_k(x_j)$ is the weight of $x_j$ at the $k$-th tree.
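The additive prediction $\hat{y}_j = \sum_{k=1}^{M} f_k(x_j)$ can be made concrete with depth-1 trees ("stumps"): each tree routes a sample to one leaf and contributes that leaf's weight. The names `make_stump` and `ensemble_predict` and the stump representation are illustrative assumptions; real boosted trees are usually deeper, and libraries typically add a constant base score that this formula omits.

```python
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]


def make_stump(feature: int, threshold: float, left_weight: float, right_weight: float) -> Tree:
    """Depth-1 tree: returns the weight of the leaf that sample x falls into."""
    def f(x: Sequence[float]) -> float:
        return left_weight if x[feature] < threshold else right_weight
    return f


def ensemble_predict(trees: List[Tree], x: Sequence[float]) -> float:
    """y_hat = sum over k of f_k(x): the sum of the leaf weights reached in every tree."""
    return sum(f(x) for f in trees)


trees = [make_stump(0, 0.5, -1.0, 1.0), make_stump(1, 2.0, 0.25, 0.75)]
```

For the sample `[0.7, 1.0]` the first tree contributes 1.0 and the second 0.25, giving 1.25.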

The objective function of the extreme gradient boosting forecasting model can be expressed as:

$$\mathrm{Obj} = \sum_{j=1}^{N} l(y_j, \hat{y}_j) + \sum_{k=1}^{M} \Omega(f_k)$$

where $N$ is the number of samples, $l(y_j, \hat{y}_j)$ is the training error function of the $j$-th sample, and $\Omega(f_k)$ is the regularization function of the $k$-th tree. Training minimizes the objective function, i.e. the sum of the training error and the regularization term.

During training of the extreme gradient boosting forecasting model, new classification trees are continuously added to the current model to improve its forecasting accuracy. Given a forecasting model composed of $m-1$ trees, adding the $m$-th tree gives:

$$\hat{y}_j^{(m)} = \hat{y}_j^{(m-1)} + f_m(x_j), \qquad \mathrm{Obj}^{(m)} = \sum_{j=1}^{N} l\left(y_j, \hat{y}_j^{(m-1)} + f_m(x_j)\right) + \Omega(f_m) + \text{constant}$$

where $\hat{y}_j^{(m-1)}$ is the prediction of the first $m-1$ trees for sample $x_j$, and $f_m(x_j)$ is the weight of $x_j$ at the $m$-th tree.

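Applying a second-order Taylor expansion to the objective function $\mathrm{Obj}$ gives:

$$\mathrm{Obj}^{(m)} \approx \sum_{j=1}^{N} \left[ l\left(y_j, \hat{y}_j^{(m-1)}\right) + p_j f_m(x_j) + \frac{1}{2} q_j f_m^2(x_j) \right] + \Omega(f_m) + \text{constant}$$

where $p_j$ and $q_j$ are the first and second partial derivatives of the training error function with respect to $\hat{y}_j^{(m-1)}$, respectively.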

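To expand $\Omega(f_m)$, define $\Phi_l$ as the set of samples belonging to leaf node $l$ of the decision tree and $\delta_l$ as the weight of leaf node $l$:

$$\Omega(f_m) = \gamma T_L + \frac{1}{2} \lambda \sum_{l=1}^{T_L} \delta_l^2$$

For a tree with a fixed structure, the value of the objective function depends only on $\delta_l$; taking the derivative with respect to $\delta_l$ gives the optimal weight of each leaf node and the optimal value of the objective function (up to a constant):

$$\delta_l^{*} = -\frac{\sum_{j \in \Phi_l} p_j}{\sum_{j \in \Phi_l} q_j + \lambda}, \qquad \mathrm{Obj}^{*} = -\frac{1}{2} \sum_{l=1}^{T_L} \frac{\left(\sum_{j \in \Phi_l} p_j\right)^2}{\sum_{j \in \Phi_l} q_j + \lambda} + \gamma T_L$$

where $T_L$ is the total number of leaf nodes, and $\lambda$ and $\gamma$ are weighting factors.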
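A small numeric check of the leaf-weight and structure-score formulas, assuming the squared training error $l = \frac{1}{2}(y - \hat{y})^2$, for which $p_j = \hat{y}_j^{(m-1)} - y_j$ and $q_j = 1$. The loss choice and the helper names are assumptions for illustration; the text does not state which training error function is used.

```python
from typing import List, Sequence, Tuple


def squared_loss_derivatives(y: Sequence[float], y_prev: Sequence[float]):
    """For l = 0.5 * (y - y_hat)^2: p_j = y_hat_j - y_j and q_j = 1."""
    p = [yp - yt for yt, yp in zip(y, y_prev)]
    q = [1.0] * len(y)
    return p, q


def leaf_weight(p: Sequence[float], q: Sequence[float], lam: float) -> float:
    """delta_l* = -sum(p) / (sum(q) + lambda) over the samples in one leaf."""
    return -sum(p) / (sum(q) + lam)


def structure_score(leaves: List[Tuple[Sequence[float], Sequence[float]]],
                    lam: float, gamma: float) -> float:
    """Obj* = -1/2 * sum_l (sum p)^2 / (sum q + lambda) + gamma * T_L."""
    total = sum(sum(p) ** 2 / (sum(q) + lam) for p, q in leaves)
    return -0.5 * total + gamma * len(leaves)


# One leaf holding three samples whose current prediction is 0.
p, q = squared_loss_derivatives([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
w = leaf_weight(p, q, lam=1.0)                       # 6 / (3 + 1) = 1.5
obj = structure_score([(p, q)], lam=1.0, gamma=0.5)  # -0.5 * 36 / 4 + 0.5 = -4.0
```

Without regularization the leaf weight would equal the mean residual 2.0; $\lambda = 1$ shrinks it to 1.5, which is how $\lambda$ limits the step each new tree takes.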

Once the structure of the decision tree is determined, the leaf-node weights can be determined. Each newly added tree is built with the exact greedy algorithm: every split is chosen to minimize the objective function, and splitting continues until the objective function can no longer decrease or a preset maximum depth is reached.
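A sketch of one exact greedy step for a single node and a single feature: the samples are sorted, every boundary between distinct values is tried, and the split with the largest reduction of $\mathrm{Obj}^{*}$ is kept. The gain expression follows from the leaf-score formula of the extreme gradient boosting algorithm (score of the two children minus score of the parent, minus $\gamma$ for the extra leaf); the function name and toy data are illustrative assumptions, and a full implementation would loop over all features and recurse until the gain is non-positive or the maximum depth is reached.

```python
from typing import Optional, Sequence, Tuple


def best_split(xs: Sequence[float], p: Sequence[float], q: Sequence[float],
               lam: float, gamma: float) -> Tuple[float, Optional[float]]:
    """Exact greedy search on one feature.

    Sorts the samples, tries every boundary between distinct values and returns
    (gain, threshold); threshold is None when no split has positive gain.
    """
    order = sorted(range(len(xs)), key=lambda j: xs[j])
    g_total, h_total = sum(p), sum(q)
    g_left = h_left = 0.0
    best_gain, best_threshold = 0.0, None
    for pos in range(len(order) - 1):
        j, j_next = order[pos], order[pos + 1]
        g_left += p[j]
        h_left += q[j]
        if xs[j] == xs[j_next]:
            continue  # equal values cannot be separated
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (g_left ** 2 / (h_left + lam)
                      + g_right ** 2 / (h_right + lam)
                      - g_total ** 2 / (h_total + lam)) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, 0.5 * (xs[j] + xs[j_next])
    return best_gain, best_threshold


# Squared loss with current prediction 0: p_j = -y_j, q_j = 1 for targets [0, 0, 10, 10].
gain, threshold = best_split([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, -10.0, -10.0],
                             [1.0, 1.0, 1.0, 1.0], lam=1.0, gamma=0.0)
```

Here the split between 2.0 and 3.0 separates the two target levels and yields a gain of 80/3; a larger $\gamma$ raises the bar a split must clear.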

In step five, to improve forecasting accuracy, and considering the complex nonlinear relationship between the forecasts of the two models and the actual biological enzyme concentration, the two models are combined into a mixed forecast based on the extreme gradient boosting algorithm. Specifically, performing nonlinear fusion of the trained artificial neural network forecasting model and extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm comprises: taking the forecasts of the single models as input, i.e. $x_f = [U_A(t_u + c), U_X(t_u + c)]$, where $U_A(t_u + c)$ is the forecast of the artificial neural network forecasting model, $U_X(t_u + c)$ is the forecast of the extreme gradient boosting forecasting model, and $c$ is the forecast lead time; taking the actual biological enzyme concentration as output, i.e. $y_f = [U(t_u + c)]$; and optimizing the weights of the single models in the mixed forecasting model through the self-learning capability of the model, thereby forecasting the biological enzyme concentration with higher precision.
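A simplified, self-contained sketch of the fusion step: the meta-model is a gradient-boosted ensemble of depth-1 trees trained on $x_f = [U_A, U_X]$ with target $U$, using the leaf-weight and split-gain formulas of the extreme gradient boosting algorithm under squared loss. Depth-1 trees, the learning rate `eta`, the constant base score and all function names are assumptions; a production system would more likely use an XGBoost library. The forecasts below are made-up toy numbers, not data from this document.

```python
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left weight, right weight)


def fit_stump(X: Sequence[Sequence[float]], p: Sequence[float],
              lam: float, gamma: float) -> Optional[Stump]:
    """Exact greedy search for the best depth-1 tree under squared loss (q_j = 1)."""
    n = len(X)
    g_total, h_total = sum(p), float(n)
    best_gain, best = 0.0, None
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda j: X[j][f])
        g_left = h_left = 0.0
        for pos in range(n - 1):
            g_left += p[order[pos]]
            h_left += 1.0
            a, b = X[order[pos]][f], X[order[pos + 1]][f]
            if a == b:
                continue
            g_right, h_right = g_total - g_left, h_total - h_left
            gain = 0.5 * (g_left ** 2 / (h_left + lam) + g_right ** 2 / (h_right + lam)
                          - g_total ** 2 / (h_total + lam)) - gamma
            if gain > best_gain:
                best_gain = gain
                best = (f, 0.5 * (a + b), -g_left / (h_left + lam), -g_right / (h_right + lam))
    return best


def fit_fusion(X, y, n_trees: int = 50, eta: float = 0.3, lam: float = 1.0, gamma: float = 0.0):
    """Boost stumps on x_f = [U_A, U_X] to predict U; returns (base score, stumps)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        p = [yh - yt for yh, yt in zip(pred, y)]  # first derivative of 0.5*(y - y_hat)^2
        stump = fit_stump(X, p, lam, gamma)
        if stump is None:
            break                                  # no split lowers the objective
        f, thr, w_left, w_right = stump
        stumps.append((f, thr, eta * w_left, eta * w_right))
        pred = [yh + (eta * w_left if x[f] < thr else eta * w_right) for yh, x in zip(pred, X)]
    return base, stumps


def predict_fusion(model, x_f: Sequence[float]) -> float:
    base, stumps = model
    return base + sum(wl if x_f[f] < thr else wr for f, thr, wl, wr in stumps)


# Toy single-model forecasts U_A, U_X and actual concentrations U (illustrative only).
U_A = [1.2, 2.1, 3.3, 3.9, 5.4, 6.2]
U_X = [0.8, 2.2, 2.7, 4.1, 4.6, 5.8]
U = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X_f = [[a, x] for a, x in zip(U_A, U_X)]
model = fit_fusion(X_f, U)
```

In this sketch the "weights of the single models" are not explicit coefficients: they are expressed by how often, and where, the stumps split on the ANN forecast versus the boosted-tree forecast, which is what allows a nonlinear combination.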

In some embodiments, in step six, the training database of the forecasting model needs to be updated on a rolling basis, because the detection devices of the fermenter wear to some extent over time and climate changes may also affect fermentation production. Specifically, updating the mixed forecast training database on a rolling basis each time the fermentation of a tank batch is finished comprises:

if the tank batch is abnormal, i.e. contaminated or with a biological enzyme concentration clearly lower than the lowest biological enzyme concentration curve among the historical tank batches, it is not added to the database formed from tank batches under normal operating conditions but is discarded, and the tank batches selected in the mixed forecast training database are kept unchanged;

if the biological enzyme concentration curve of the tank batch in the middle and late stages is similar, within the confidence domain, to that of an earlier tank batch, the earlier tank batch is replaced by this tank batch. Here the middle and late stages refer to the middle and late stages of the whole fermentation process of the tank batch, and the earlier tank batch is a historical tank batch in the training database. Two such tank batches would contribute similarly to model training, so they are interchangeable and the newer data are preferred;

if the biological enzyme concentration curve of the tank batch in the middle and late stages is not similar, within the confidence domain, to that of any historical tank batch, and the tank batch is not an abnormal batch that must be discarded, it enriches the diversity of the database and is added directly to the mixed forecast training database.

In this embodiment, rolling updates keep the data in the database consistent with the operating conditions of the most recent 1 to 3 months, which minimizes the influence of equipment wear and climate change and improves model forecasting accuracy.

The invention is described in more detail below with an example of forecasting the product enzyme activity in industrial xylanase fermentation.

Xylanase is mainly used in the brewing and feed industries. It can decompose the cell walls and β-glucan of the raw materials used in brewing or feed production, which lowers the viscosity of materials in brewing and promotes the release of active substances, and reduces non-starch polysaccharides in feed grains, which promotes the absorption and utilization of nutrients. Xylanase is a secondary metabolite of Streptomyces and can be obtained by batch fermentation of Streptomyces. The flow chart of the method for forecasting the biological enzyme concentration in this embodiment is shown in FIG. 1; the method specifically comprises the following steps:

S1, reading the online data of historical tank batches with given tank numbers and batch numbers, and the known online data of the current tank batch to be forecasted, from the database of the plant's distributed control system, and collecting the offline data of the tank batches corresponding to the online data read. The online data mainly comprise the fermentation broth temperature T, fermentation broth pH, fermentation broth volume V, fermenter pressure P, aeration volume flow F, dissolved oxygen DO and stirring paddle rotating speed ω, and the offline data are the product enzyme activities recorded offline (in this embodiment, the biological enzyme concentration is characterized by the product enzyme activity). The online variables are regulated during fermentation production; for example, an 80 m³ xylanase fermenter is generally controlled at about 34 °C, with the stirring speed maintained at 200 rpm during the middle and late stages.

S2, performing statistical analysis on the offline data acquired in S1, and calculating the value of the classification function $J_{c,i}(t)$ with the following formula:

where $J_{c,i}(t)$ is the classification function value of the $i$-th tank batch at time $t$, $T_W$ is the window width considered for classification, and $J_i(t)$ is the product enzyme activity of the $i$-th tank batch at time $t$.

Defining $J_{c,ave}(t)$ as the mean of the classification function at time $t$, $\sigma(t)$ as its standard deviation at time $t$, and $\alpha$ as the confidence coefficient, the classification criterion is designed as shown in Table 1:

TABLE 1 Tank batch classification criteria

Condition	Category
$J_{c,i}(t) < J_{c,ave}(t) - \alpha \sigma(t)$	Low-enzyme-activity tank batch
$J_{c,ave}(t) - \alpha \sigma(t) \le J_{c,i}(t) \le J_{c,ave}(t) + \alpha \sigma(t)$	Average-enzyme-activity tank batch
$J_{c,i}(t) > J_{c,ave}(t) + \alpha \sigma(t)$	High-enzyme-activity tank batch

$\alpha$ is generally between 1.04 and 1.65, corresponding to confidence limits of 85% and 95% respectively; the value 1.28 corresponds to a confidence limit of 90%. For example, FIG. 2 shows the classification results of 272 tank batches from a xylanase production plant, with the fermentation time and the enzyme activity values normalized.
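The quoted $\alpha$ values coincide with one-sided quantiles of the standard normal distribution, so one way to reproduce them is the inverse normal CDF from Python's standard library. Reading $\alpha$ this way assumes the classification-function values are roughly normally distributed across batches; the text does not state this, so treat it as an interpretation.

```python
from statistics import NormalDist


def alpha_for_confidence(level: float) -> float:
    """Return alpha such that P(Z <= alpha) = level for a standard normal Z."""
    return NormalDist().inv_cdf(level)


for level in (0.85, 0.90, 0.95):
    print(f"confidence {level:.0%}: alpha = {alpha_for_confidence(level):.3f}")
```

This prints approximately 1.036, 1.282 and 1.645, matching the 1.04, 1.28 and 1.65 quoted above (1.645 is conventionally rounded to 1.65).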

S3, dividing the collected online and offline data into three classes according to the classification criterion designed in S2, and establishing a training database for each class. The training databases should be built according to the following principles: (a) the size of the database should be moderate; (b) the data should be evenly distributed; (c) the data should represent the operating conditions of normal tank batches; (d) the data should come from production data of the same period of time (1 to 3 months), excluding slowly varying factors as far as possible. Specifically, the training database contains the online and offline data of a total of 272 tank batches from a xylanase production plant over 3 months.

And S4, performing model training and testing on the tank batch data of each type by using an artificial neural network algorithm.

FIG. 3 is a schematic diagram of the artificial neural network predictor. When modeling the fermentation process and forecasting state variables online, the value of a state variable at a future moment is predicted from data that can be detected online in real time together with offline analysis data; in this prediction the artificial neural network forecasting model acts as an approximation of the actual physical fermentation process, and its parameters, such as the connection weights between internal neurons, are the concrete representation of the model. The input variables of the neural network state variable predictor comprise detectable variables verified to affect the product enzyme activity, namely the current time $t_u$, the current enzyme activity $U(t_u)$, the discrete fermentation broth pH $pH(t)$ over the 8 h up to the current moment, and the discrete dissolved oxygen $DO(t)$ over the 8 h up to the current moment, i.e. the model input is $x_u = [t_u, U(t_u), pH(t), DO(t)]^T$, $t = t_u - 8, t_u - 7, \ldots, t_u$. The output variables are the product enzyme activities 1 h, 2 h and 3 h ahead, i.e. the model output is $y_u = [U(t_u + 1), U(t_u + 2), U(t_u + 3)]^T$. Taking the 232 average-enzyme-activity tank batches as an example, traversal training and testing are performed: in turn, the data of 231 batches are used as the training set and the data of the remaining batch as the test set, and parameter tuning is completed to obtain the trained artificial neural network forecasting model.
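The input and output vectors $x_u$ and $y_u$ can be assembled from time series as follows. This sketch assumes all series are aligned on an hourly grid indexed by fermentation hour; offline enzyme activity is in practice sampled more sparsely, so it would first have to be interpolated (an assumption not covered by the text). The function name `make_sample` is hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sample(t_u: int, enzyme: Sequence[float], ph: Sequence[float], do: Sequence[float],
                lags: int = 8,
                horizons: Tuple[int, ...] = (1, 2, 3)) -> Tuple[List[float], List[float]]:
    """Build one training pair from hourly series indexed by fermentation hour.

    x_u = [t_u, U(t_u), pH(t_u-8..t_u), DO(t_u-8..t_u)]   (2 + 9 + 9 = 20 values)
    y_u = [U(t_u+1), U(t_u+2), U(t_u+3)]
    """
    if t_u < lags or t_u + max(horizons) >= len(enzyme):
        raise ValueError("t_u needs 8 h of history and 3 h of future enzyme data")
    window = range(t_u - lags, t_u + 1)  # t = t_u-8, ..., t_u
    x = [float(t_u), enzyme[t_u]] + [ph[t] for t in window] + [do[t] for t in window]
    y = [enzyme[t_u + h] for h in horizons]
    return x, y


enzyme = [float(k) for k in range(20)]
ph = [100.0 + k for k in range(20)]   # placeholder values chosen to make indices visible
do = [200.0 + k for k in range(20)]
x_u, y_u = make_sample(10, enzyme, ph, do)
```

Sliding $t_u$ across every usable hour of every batch in a class yields that class's training pairs; the same vectors serve as inputs to both single models.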

S5, model training and testing are performed on each type of tank batch data using the extreme gradient boosting (XGBoost) algorithm.

The input and output variables are the same as those of the artificial neural network model. The input variables are detectable variables verified to affect the product enzyme activity, namely the current time $t_u$, the current enzyme activity $U(t_u)$, the discretized fermentation broth pH over the 8 h preceding the current time, $pH(t)$, and the discretized dissolved oxygen over the same 8 h, $DO(t)$; the model input is $x_u = [t_u, U(t_u), pH(t), DO(t)]^T$, $t = t_u-8, t_u-7, \dots, t_u$. The output variables are the product enzyme activities 1 h, 2 h and 3 h ahead; the model output is $y_u = [U(t_u+1), U(t_u+2), U(t_u+3)]^T$.

The training flow of the extreme gradient boosting prediction model is shown in fig. 4. The input data of the training set are denoted $x_j$ and the output data $y_j$ ($j = 1, 2, \dots, N$), where $N$ is the number of samples. The extreme gradient boosting prediction for a sample is the sum of the outputs of all regression trees in the ensemble, and the trees are obtained by minimizing a regularized objective function over the training set.

Expanding the regularization term $\Omega(f_m)$, define $\phi_l$ as the set of samples belonging to leaf node $l$ of a decision tree and $\delta_l$ as the weight of leaf node $l$. For a tree with a fixed structure, the value of the objective function depends only on $\delta_l$; taking the derivative with respect to $\delta_l$ and setting it to zero gives the optimal weight of each leaf node and the corresponding optimal value of the objective function.
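For reference, the standard XGBoost formulation can be written in the notation of this step as follows. This is a sketch under stated assumptions: the number of trees $M$, the number of leaves $L_m$ of tree $m$, the regularization coefficients $\gamma$ and $\lambda$, the loss $l$, and the first- and second-order gradients $g_j$, $h_j$ of the loss at the prediction of the previous round are assumed symbols not defined in the text, and the exact expressions used in the invention may differ.

```latex
\hat{y}_j = \sum_{m=1}^{M} f_m(x_j), \qquad j = 1, 2, \dots, N

\mathcal{L} = \sum_{j=1}^{N} l\left(y_j, \hat{y}_j\right) + \sum_{m=1}^{M} \Omega(f_m),
\qquad
\Omega(f_m) = \gamma L_m + \frac{1}{2}\lambda \sum_{l=1}^{L_m} \delta_l^{2}

% second-order expansion for the tree f_m added in round m
\tilde{\mathcal{L}}^{(m)} = \sum_{l=1}^{L_m}\left[\Big(\sum_{j\in\phi_l} g_j\Big)\delta_l
  + \frac{1}{2}\Big(\sum_{j\in\phi_l} h_j + \lambda\Big)\delta_l^{2}\right] + \gamma L_m

\delta_l^{*} = -\frac{\sum_{j\in\phi_l} g_j}{\sum_{j\in\phi_l} h_j + \lambda},
\qquad
\tilde{\mathcal{L}}^{(m)*} = -\frac{1}{2}\sum_{l=1}^{L_m}
  \frac{\left(\sum_{j\in\phi_l} g_j\right)^{2}}{\sum_{j\in\phi_l} h_j + \lambda} + \gamma L_m
```

Substituting $\delta_l^{*}$ back into $\tilde{\mathcal{L}}^{(m)}$ gives the optimal objective value, which depends only on the tree structure through the sets $\phi_l$.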

Once the structure of a decision tree is determined, the weights of its leaf nodes can be determined. Each newly added tree is grown with the exact greedy algorithm, i.e., each split is the one that reduces the objective function the most, until the objective function can no longer be decreased or a preset maximum depth is reached. Taking the 232 average-enzyme-activity tank batches as an example, leave-one-batch-out training and testing is performed: in turn, 231 batches are used as the training set and the remaining batch as the test set, and the parameters are tuned to obtain the trained extreme gradient boosting prediction model.
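The optimal leaf weight and the exact greedy split search can be sketched in plain Python for a single feature. The sketch assumes the squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_j = \hat{y}_j - y_j$ and $h_j = 1$, and uses the standard XGBoost split gain $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$ with hypothetical parameters `lam` ($\lambda$) and `gamma` ($\gamma$). It illustrates the technique and is not the implementation used in the invention.

```python
from typing import Optional, Sequence, Tuple


def leaf_weight(grads: Sequence[float], hess: Sequence[float], lam: float = 1.0) -> float:
    """Optimal leaf weight delta_l* = -G_l / (H_l + lambda) for the samples in one leaf."""
    return -sum(grads) / (sum(hess) + lam)


def best_split(x: Sequence[float], grads: Sequence[float], hess: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0) -> Tuple[Optional[float], float]:
    """Exact greedy search over one feature.

    Every boundary between distinct sorted feature values is tried, and the split
    with the largest gain (reduction of the objective) is returned as (threshold, gain).
    Returns (None, 0.0) if no split reduces the objective.
    """
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)

    order = sorted(range(len(x)), key=lambda j: x[j])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best_thr: Optional[float] = None
    best_gain = 0.0
    for k in range(len(order) - 1):
        j = order[k]
        g_left += grads[j]
        h_left += hess[j]
        lo, hi = x[order[k]], x[order[k + 1]]
        if lo == hi:  # equal feature values cannot be separated
            continue
        gain = 0.5 * (score(g_left, h_left) + score(g_total - g_left, h_total - h_left)
                      - score(g_total, h_total)) - gamma
        if gain > best_gain:
            best_thr, best_gain = (lo + hi) / 2.0, gain
    return best_thr, best_gain
```

For example, with feature values 1, 2, 3, 4, targets 0, 0, 10, 10 and an all-zero initial prediction, the gradients are 0, 0, -10, -10 and the best split falls at 2.5; the right leaf then receives the weight 20/3 rather than 10, because $\lambda$ shrinks each step.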

S6, to improve forecasting accuracy, and considering the complex nonlinear relationship between the forecasts of the two models in S4 and S5 and the actual product enzyme activity, the two models are combined into a hybrid forecast based on the extreme gradient boosting algorithm.

The forecast of each single model is taken as input, i.e. $x_f = [U_A(t_u+c), U_X(t_u+c)]$, where $U_A(t_u+c)$ is the enzyme activity forecast of the artificial neural network prediction model, $U_X(t_u+c)$ is the enzyme activity forecast of the extreme gradient boosting prediction model, and $c$ is the forecast lead time, $c = 1, 2, 3$ h. The actual product enzyme activity is taken as output, i.e. $y_f = [U(t_u+c)]$. The self-learning capability of the model is used to optimize the weight of each single model within the hybrid forecasting model, so that the product enzyme activity is forecast with higher accuracy. Taking the 232 average-enzyme-activity tank batches as an example, leave-one-batch-out training and testing is performed: in turn, 231 batches are used as the training set and the remaining batch as the test set, and the parameters are tuned to obtain the trained hybrid forecasting model based on extreme gradient boosting (i.e., the enzyme activity hybrid forecasting model).
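The hybrid step is a form of stacking: the two single-model forecasts become the features of a second-level model. A minimal sketch of assembling its training pairs $x_f = [U_A, U_X]$, $y_f = U$ follows. It assumes (the text does not say this) that the base-model forecasts used to train the fusion model are produced out-of-fold, i.e. each batch is forecast by base models fitted without that batch, so the fusion model never learns from in-sample base forecasts. `fit_ann` and `fit_xgb` are hypothetical callables standing in for training the two base models; the resulting pairs would then be passed to an extreme gradient boosting regressor.

```python
from typing import Callable, List, Sequence, Tuple

# One sample: (input vector x_u, actual enzyme activity U(t_u + c) for one lead time c)
Sample = Tuple[List[float], float]
Predictor = Callable[[List[float]], float]
# Hypothetical "fit" callable: trains a base model on samples and returns a predictor
Fitter = Callable[[List[Sample]], Predictor]


def build_fusion_set(batches: Sequence[List[Sample]],
                     fit_ann: Fitter,
                     fit_xgb: Fitter) -> Tuple[List[List[float]], List[float]]:
    """Out-of-fold fusion data: x_f = [U_A, U_X] and y_f = actual U."""
    xs: List[List[float]] = []
    ys: List[float] = []
    for k, held_out in enumerate(batches):
        # Both base models are fitted on every batch except the held-out one
        train = [s for i, b in enumerate(batches) if i != k for s in b]
        ann, xgb = fit_ann(train), fit_xgb(train)
        for x, y in held_out:
            xs.append([ann(x), xgb(x)])
            ys.append(y)
    return xs, ys
```

Out-of-fold construction matters here because base models evaluated on their own training data look better than they will on a new batch, which would bias the weights the fusion model learns.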

To evaluate and verify the forecasting performance, the relative error $e(r)$ is defined between $U_p(r)$, the forecast product enzyme activity at the $r$-th sampling time of a single tank batch, and $U_m(r)$, the actual product enzyme activity at the $r$-th sampling time of the same tank batch.

To assess the accuracy of the model, the average relative error $\bar{e}$ of a single tank batch is introduced, where $R$ is the number of enzyme activity sampling points in that tank batch.

The distributions of the average relative error of the hybrid forecasts 1 h, 2 h and 3 h ahead over the 232 average-enzyme-activity tank batches are shown in figs. 5-7. Averaged over the 232 batches, $\bar{e}$ is 1.12%, 1.71% and 2.13% for the 1 h, 2 h and 3 h forecasts respectively, which indicates that the hybrid forecasting model is accurate and robust.
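A minimal sketch of the two error measures for one tank batch follows. The exact formulas are not given here, so the sketch assumes the common forms $e(r) = |U_p(r) - U_m(r)| / U_m(r) \times 100\%$ and $\bar{e} = \frac{1}{R}\sum_{r=1}^{R} e(r)$; the function names are hypothetical.

```python
from typing import List, Sequence


def relative_errors(forecast: Sequence[float], actual: Sequence[float]) -> List[float]:
    """e(r) in percent, assumed form |U_p(r) - U_m(r)| / U_m(r) * 100."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("need equally long, non-empty forecast and actual series")
    if any(m == 0 for m in actual):
        raise ValueError("actual enzyme activity must be non-zero")
    return [100.0 * abs(p - m) / m for p, m in zip(forecast, actual)]


def average_relative_error(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of e(r) over the R enzyme activity sampling points of one tank batch."""
    errors = relative_errors(forecast, actual)
    return sum(errors) / len(errors)
```

The batch-level figures reported above would then be the mean of `average_relative_error` over the 232 batches for each lead time.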

S7, because the detection devices of the fermentation tank degrade to some extent over time, and climate changes also affect fermentation production, the training database of the hybrid forecasting model needs to be updated on a rolling basis. Each time the fermentation of a tank batch is finished, the training database is updated as follows: (a) if the tank batch is abnormal, the tank batches selected in the database are kept unchanged; (b) if the later-stage product enzyme activity curve of the tank batch lies within the confidence domain of that of an earlier tank batch, the earlier tank batch is replaced with the new one; (c) if the middle- and late-stage product enzyme activity curve of the tank batch does not lie within the confidence domain of any historical tank batch, the tank batch is added directly to the training database.
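The three update rules can be sketched as follows. The sketch makes several assumptions not stated in the text: the "confidence domain" similarity test is modeled as every point of the new curve lying within a fixed band `band` of the stored curve, the curves are sampled on a common grid, the abnormal flag is decided beforehand, and when several earlier batches are similar the oldest one is replaced. `TankBatch`, `within_band` and `rolling_update` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TankBatch:
    batch_id: str
    # Middle/late-stage product enzyme activity on a common sampling grid (assumption)
    late_curve: List[float] = field(default_factory=list)
    abnormal: bool = False


def within_band(a: List[float], b: List[float], band: float) -> bool:
    """Hypothetical similarity test: every point of `a` lies within +/- band of `b`."""
    return len(a) == len(b) and all(abs(x - y) <= band for x, y in zip(a, b))


def rolling_update(db: List[TankBatch], new: TankBatch, band: float) -> List[TankBatch]:
    """Return the training database after `new` finishes; `db` is ordered oldest first."""
    if new.abnormal:
        return list(db)                         # rule (a): keep the database unchanged
    for i, old in enumerate(db):
        if within_band(new.late_curve, old.late_curve, band):
            return db[:i] + db[i + 1:] + [new]  # rule (b): replace the similar earlier batch
    return db + [new]                           # rule (c): no similar batch, add directly
```

Appending the new batch at the end keeps the database in chronological order, so the "earlier" batch in rule (b) is always found by scanning from the front.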

In the embodiment of the invention, the type of the batch to be forecast is determined first; the artificial neural network model and the extreme gradient boosting model then each produce a forecast, and the two forecasts are fused by a hybrid model based on the extreme gradient boosting algorithm, whose self-learning capability optimizes the weight of each single model, so that the product concentration can be forecast with higher accuracy. The embodiment requires no additional measuring points or other devices; only a software calculation module needs to be added to the existing control system, so the implementation cost is low. The embodiment can be applied at fermentation production sites and has good application potential for guiding online monitoring and scheduling optimization.

Based on the same inventive concept as described above, another embodiment of the present invention provides a system for predicting a concentration of a biological enzyme in a microbial fermentation process, the system comprising:

a classification module, configured to perform statistical analysis on the offline data, design classification standards, and classify the historical tank batches;

a hybrid forecasting module, configured to perform nonlinear fusion of an artificial neural network forecasting model and an extreme gradient boosting forecasting model based on the extreme gradient boosting algorithm, so as to realize hybrid forecasting;

a rolling update module, configured to update the hybrid forecast training database on a rolling basis each time the fermentation of a tank batch is finished.

The specific implementation of each module in this system embodiment may adopt the technical features of the corresponding steps of the method for forecasting the biological enzyme concentration in a microbial fermentation process, and is not repeated here.

Based on the same inventive concept, another embodiment of the present invention also provides an electronic terminal, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the program to perform the method for predicting the biological enzyme concentration in the microbial fermentation process in the above embodiment.

In the above embodiment, the memory is used to store the program. The memory may include volatile memory, such as random-access memory (RAM), for example static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., application programs or functional modules implementing the above method), computer instructions and the like, which may be stored in one or more memories in a partitioned manner. The above computer programs, computer instructions, data, etc. may be invoked by the processor.

The processor is used to execute the computer program stored in the memory to implement the steps of the method in the above embodiment. For details, refer to the description of the method embodiment above.

The processor and the memory may be separate structures or may be integrated together. When they are separate structures, the memory and the processor may be coupled via a bus.

Based on the same inventive concept, another embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program for performing the method of predicting the concentration of biological enzymes in a microbial fermentation process in the above-described embodiment when the program is executed by a processor.

Computer-readable media include computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user device. The processor and the storage medium may also reside as discrete components in a communication device.

According to the embodiment of the invention, a hybrid forecast training database is established for each tank batch type and the type of the batch to be forecast is determined; the artificial neural network model and the extreme gradient boosting model then each produce a forecast, and the two are fused by a hybrid model based on the extreme gradient boosting algorithm, whose self-learning capability optimizes the weight of each single model, so that the product concentration can be forecast with higher accuracy.

The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by those skilled in the art within the scope of the claims without departing from the substance of the invention. The preferred features described above may be used in any combination provided they do not conflict.

Claims

1. A method for predicting the concentration of a biological enzyme in a microbial fermentation process, comprising:

2. The method for predicting the concentration of biological enzymes in a microbial fermentation process according to claim 1, wherein, in acquiring the online data of the historical tank batches and of the current tank batch to be forecast, together with the offline data of the corresponding tank batches: the online data comprise the fermentation broth temperature T, fermentation broth pH, fermentation broth volume V, fermentation tank pressure P, aeration volume flow rate F, dissolved oxygen DO and stirring paddle speed ω; the offline data comprise the offline-recorded biological enzyme concentration.

3. The method for predicting the concentration of biological enzymes in a microbial fermentation process according to claim 1, wherein, in performing statistical analysis and designing a classification standard based on the offline data, classification is performed according to the value of a classification function $J_{c,i}(t)$, which is calculated as:

where $J_{c,i}(t)$ is the classification function value of the $i$-th tank batch at time $t$, $T_W$ is the window width considered for classification, and $J_i(t)$ is the biological enzyme concentration of the $i$-th tank batch at time $t$.

4. The method for predicting the concentration of a biological enzyme in a microbial fermentation process according to claim 3, wherein classifying the historical tank batches comprises: defining $J_{c,ave}(t)$ as the mean of the classification function at time $t$, $\sigma(t)$ as the standard deviation of the classification function at time $t$, and $\alpha$ as the confidence coefficient, the classification standard is designed as follows:

$J_{c,i}(t) < J_{c,ave}(t) - \alpha\,\sigma(t)$: low-yield tank batch;

$J_{c,i}(t) > J_{c,ave}(t) + \alpha\,\sigma(t)$: high-yield tank batch.

5. The method for forecasting biological enzyme concentration in a microbial fermentation process according to claim 1, wherein, in establishing hybrid forecast training databases for the online data and the offline data respectively: the data in the hybrid forecast training databases are evenly distributed, and the data sets are drawn from production data of the same period.

6. The method for predicting the concentration of biological enzymes in a microbial fermentation process according to claim 1, wherein, in training each type of tank batch data with an artificial neural network prediction model and an extreme gradient boosting prediction model respectively: the artificial neural network prediction model establishes, from the collected state data of the fermentation production process, a characteristic model between input data and output data that simulates the actual fermentation process.

7. The method for predicting the concentration of biological enzymes in a microbial fermentation process according to claim 1, wherein, in training each type of tank batch data with an artificial neural network prediction model and an extreme gradient boosting prediction model respectively: the input variables of the extreme gradient boosting prediction model comprise detectable variables affecting the biological enzyme concentration, the output variables comprise the biological enzyme concentration forecast in advance, and during training of the extreme gradient boosting prediction model new decision trees are continuously added to the current model to improve its prediction accuracy.

8. The method for predicting the concentration of biological enzymes in a microbial fermentation process according to claim 1, wherein performing nonlinear fusion of the trained artificial neural network prediction model and the extreme gradient boosting prediction model based on the extreme gradient boosting algorithm comprises:

9. The method for predicting the concentration of biological enzymes in a microbial fermentation process according to claim 1, wherein the rolling update of the hybrid forecast training database each time the fermentation of a tank batch is finished comprises:

10. A system for predicting the concentration of a biological enzyme in a microbial fermentation process, comprising: