CN115952685B - Sewage treatment process soft measurement modeling method based on integrated deep learning - Google Patents

Sewage treatment process soft measurement modeling method based on integrated deep learning

Info

Publication number
CN115952685B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310053332.5A
Other languages
Chinese (zh)
Other versions
CN115952685A (en)
Inventor
熊金琳
彭甜
李正波
陶孜菡
张楚
赵环宇
伏咏妍
王宇涵
黄小龙
花磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cao Liang
Original Assignee
Huaiyin Institute of Technology
Application filed by Huaiyin Institute of Technology
Priority to CN202310053332.5A
Publication of CN115952685A
Application granted
Publication of CN115952685B

Abstract

The invention discloses a soft measurement modeling method for the sewage treatment process based on integrated (ensemble) deep learning. First, sewage data are acquired as candidate auxiliary variables. KPCA is then used for feature selection on the acquired variables, and the selected variables serve as the model input. A two-layer sewage soft measurement integrated model is established: the first layer contains three base learners (BiLSTM, LSSVM and XGBoost) trained with 5-fold cross-validation, and the second layer uses an ELM as the meta-learner. Finally, an extreme learning machine (ELM) corrects the error of the initial prediction result. To improve model performance, the reptile search algorithm (RSA) is used to optimize the model parameters, and RSA itself is improved with Latin hypercube initialization, a nonlinear factor, and golden sine and somersault (flip) strategies to address its limited convergence accuracy and its tendency to fall into local optima. Compared with traditional soft measurement methods, the method combines the advantages of the individual models, so the overall model generalizes better and predicts more accurately.

Description

Sewage treatment process soft measurement modeling method based on integrated deep learning
Technical Field
The invention relates to the field of soft measurement modeling of industrial sewage treatment processes, in particular to a soft measurement modeling method of key water quality parameters of a sewage treatment process based on integrated deep learning.
Background
The wastewater treatment process is highly complex. It involves coupled dynamic physical, biological and chemical reactions, and it is strongly nonlinear, uncertain and time-varying with pronounced lag, so an accurate model is difficult to establish. To keep the sewage treatment system in good condition, to ensure the stability, operating speed and reliability of the control system, and to ensure that the effluent quality meets discharge standards, several important process quantities, such as water quantity, water quality parameters and environmental parameters, must be checked and monitored in real time. In practice, however, online measuring instruments and the corresponding sensors are often unavailable, or they must work in extremely harsh environments and are expensive to purchase and maintain, so some quality variables of the industrial process are difficult to measure in real time. Soft measurement addresses this by establishing a mathematical model between easily measured variables (auxiliary variables) and variables that are difficult to measure directly (dominant variables), which enables real-time monitoring and control of the dominant variables. Because it is easy to maintain and has low time delay, the technology has developed rapidly.
Compared with traditional models, deep learning models have a more complex multi-layer structure, carry more information, extract data features better and express nonlinear characteristics more strongly, so they can accurately capture the complex mapping hidden in industrial data. Combining deep learning with soft measurement allows the feature information in the data to be fully extracted and improves the prediction accuracy of the model. Deep learning has developed rapidly and new deep models continue to be introduced, but a single soft measurement model still suffers from problems such as falling into local optima and insufficient accuracy. Because the data contain a large number of implicit features, integrating several algorithms and network models and combining their respective advantages to improve overall accuracy and generalization has become a new research direction in soft measurement modeling of complex industrial processes.
Disclosure of Invention
Purpose of the invention: the water quality index biochemical oxygen demand (BOD) of the sewage treatment process is difficult to measure online. To address this problem, a soft measurement modeling method for the sewage treatment process based on integrated deep learning is provided. Compared with traditional models, it is more stable and generalizes better, and it has both theoretical research significance and considerable practical application value.
Technical scheme: the invention discloses a soft measurement modeling method for the sewage treatment process based on integrated deep learning, which comprises the following steps:
S1, acquiring sewage data and performing data preprocessing;
S2, performing feature selection on the preprocessed data using KPCA and selecting suitable auxiliary variables, thereby constructing a soft measurement data sample set;
S3, taking the sewage data set processed in S2 as the original input of the first-layer base learners in a Stacking ensemble framework, wherein the first-layer base learners comprise a bidirectional long short-term memory network (BiLSTM), extreme gradient boosting (XGBoost) and a least squares support vector machine (LSSVM), and the prediction results of the first layer are obtained by 5-fold cross-validation of each base learner;
S4, the second layer adopts an ELM as the meta-learner, and the results obtained from the first-layer base learners are used as the training set of the second-layer meta-learner to complete its training;
S5, optimizing the model parameters of the base learners with an improved reptile search algorithm (IRSA), wherein the improvements are: Latin hypercube initialization is adopted; during iteration, a nonlinear method is used to improve the Evolutionary Sense (ES) value; and golden sine and somersault (flip) strategies are used to improve the individual search mode; the optimized model is then used for soft measurement to obtain the prediction result of the biochemical oxygen demand;
S6, correcting the error of the initial prediction result with an ELM to obtain the final prediction result.
Further, the sewage data in the step S1 includes suspended matter concentration SS, total nitrogen TN, ammonia nitrogen NH3-N, total phosphorus TP, chemical oxygen demand COD, and biochemical oxygen demand BOD at a historical time.
Further, in step S2, features are extracted from the data using KPCA, with the following specific steps:
S3.1, let the training set sample data be $X=(x_1,x_2,\ldots,x_m)$, and map each $x_i$ to a high-dimensional feature space through the mapping function $\phi(x_i)$;
S3.2, calculating the covariance matrix $C$ of the feature space:
$$C=\frac{1}{m}\sum_{i=1}^{m}\phi(x_i)^{T}\phi(x_i)$$
S3.3, calculating the characteristic equation of the covariance matrix:
$$\lambda_i\xi_i=C\xi_i \qquad (3)$$
where $\lambda_i$ is an eigenvalue of the covariance matrix $C$ and $\xi_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$;
S3.4, defining the kernel matrix $K$:
$$K_{ij}=\phi(x_i)\cdot\phi(x_j)^{T} \qquad (4)$$
S3.5, calculating the characteristic equation of the kernel matrix $K$:
$$\tilde{\lambda}_i\alpha_i=K\alpha_i$$
where $\tilde{\lambda}_i$ is an eigenvalue of the kernel matrix $K$ and $\alpha_i$ is the eigenvector corresponding to the eigenvalue $\tilde{\lambda}_i$;
S3.6, substituting the covariance matrix $C$ and the kernel matrix $K$ into the characteristic equation shows that the eigenvector $\xi_i$ of the covariance matrix $C$ can be expressed through the nonlinear mapping $\phi(x_j)$ as follows:
$$\xi_i=\sum_{j=1}^{m}\alpha_i^{j}\,\phi(x_j)^{T}$$
where $\alpha_i^{j}$ is the $j$-th coefficient of $\xi_i$;
S3.7, calculating the eigenvalues $\tilde{\lambda}_i$ of the kernel matrix $K$ and arranging them in descending order, $\tilde{\lambda}_1\ge\tilde{\lambda}_2\ge\cdots\ge\tilde{\lambda}_m$;
S3.8, sequentially calculating the contribution rate $\eta_i$ of each eigenvalue and the cumulative contribution rate $P$ as follows:
$$\eta_i=\frac{\tilde{\lambda}_i}{\sum_{j=1}^{m}\tilde{\lambda}_j},\qquad P=\sum_{i=1}^{k}\eta_i$$
S3.9, selecting the features whose cumulative contribution rate satisfies $P\ge 85\%$ as the main auxiliary variables input to the sewage soft measurement model.
Further, in step S3, the bidirectional long short-term memory network prediction model is established as follows:
S4.1, taking the determined auxiliary variables as the input vector $x$ of the network;
S4.2, let the forward hidden-layer state be $\overrightarrow{h}_t$ and the backward hidden-layer state be $\overleftarrow{h}_t$, with $w$ denoting the different weight matrices; the output $y$ of the BiLSTM is calculated as follows:
S4.3, taking $y_t$ as the prediction result of the model.
Further, in step S3, the extreme gradient boosting prediction model is established as follows:
S5.1, let the training data set be $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, with loss function $l(y_i,\hat{y}_i)$ and regularization term $\Omega(f_k)$; the overall objective function can be written as:
$$L(\phi)=\sum_{i}l(y_i,\hat{y}_i)+\sum_{k}\Omega(f_k)$$
where $L(\phi)$ is the representation in linear space, $i$ denotes the $i$-th sample, $k$ denotes the $k$-th tree, and $\hat{y}_i$ is the predicted value of the $i$-th sample $x_i$;
S5.2, the prediction of each tree is used to fit the residual of the prediction of the preceding trees:
$$\hat{y}_i^{(t)}=\hat{y}_i^{(t-1)}+f_t(x_i)$$
S5.3, the prediction after the $t$-th tree, obtained in the previous step, equals the prediction of the preceding $t-1$ trees plus the output of the $t$-th tree; for the $t$-th tree, the objective function is:
$$L^{(t)}=\sum_{i=1}^{n}l\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega(f_t) \qquad (14)$$
S5.4, approximating the original objective in formula (14) by Taylor expansion, define
$$g_i=\frac{\partial\,l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial\,\hat{y}_i^{(t-1)}},\qquad h_i=\frac{\partial^{2}l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial\left(\hat{y}_i^{(t-1)}\right)^{2}}$$
Then formula (14) can be written, up to a constant term, as:
$$L^{(t)}\approx\sum_{i=1}^{n}\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t)$$
S5.5, the optimal solution of the objective function is:
$$w_j^{*}=-\frac{\sum_{i\in I_j}g_i}{\sum_{i\in I_j}h_i+\lambda}$$
where $I_j=\{i\mid q(x_i)=j\}$ is the set of samples mapped to leaf node $j$, $w_j^{*}$ is the optimal weight of that leaf, and $\lambda$ is the regularization coefficient in $\Omega$;
S5.6, calculating the gain value Gain, updating the maximum gain Gain_max and the split point, and finally obtaining the optimal split point;
S5.7, repeating the above process to build the tree recursively until the termination condition is met.
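Further, in step S3, the least squares support vector machine prediction model is established as follows:
S6.1, the optimization objective is defined by the loss function:
$$\min_{\omega,b,\xi}\;J(\omega,\xi)=\frac{1}{2}\omega^{T}\omega+\frac{c}{2}\sum_{i=1}^{N}\xi_i^{2} \qquad (18)$$
subject to the constraint:
$$y_i=\omega^{T}\varphi(x_i)+b+\xi_i,\qquad i=1,2,\ldots,N$$
where $\omega$ is the weight, $\xi_i$ is the error variable, $b$ is the bias, $c>0$ is the penalty coefficient, and $\varphi(\cdot)$ is the nonlinear feature mapping;
S6.2, introducing Lagrange multipliers, formula (18) can be converted into:
$$L(\omega,b,\xi,a)=\frac{1}{2}\omega^{T}\omega+\frac{c}{2}\sum_{i=1}^{N}\xi_i^{2}-\sum_{i=1}^{N}a_i\left[\omega^{T}\varphi(x_i)+b+\xi_i-y_i\right]$$
where $a_i>0$ $(i=1,2,\ldots,N)$ are the Lagrange multipliers;
S6.3, solving the optimality conditions:
$$\frac{\partial L}{\partial\omega}=0\Rightarrow\omega=\sum_{i=1}^{N}a_i\varphi(x_i),\quad\frac{\partial L}{\partial b}=0\Rightarrow\sum_{i=1}^{N}a_i=0,\quad\frac{\partial L}{\partial\xi_i}=0\Rightarrow a_i=c\,\xi_i,\quad\frac{\partial L}{\partial a_i}=0\Rightarrow\omega^{T}\varphi(x_i)+b+\xi_i-y_i=0$$
S6.4, combining the above formulas, the optimal regression function is:
$$f(x)=\sum_{i=1}^{N}a_i K(x,x_i)+b$$
where $K(x,x_i)$ is the kernel function, $x_i$ is the center of the kernel function, $x$ is the input of the training sample, and $y_i$ is the output of the training sample.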
Further, in step S4, the prediction model of the meta-learner is calculated as follows:
$$\sum_{i=1}^{L}\beta_i\,g(w_i\cdot x_j+b_i)=y_j,\qquad j=1,2,\ldots,N$$
where $L$ is the number of hidden-layer units, $N$ is the number of training samples, $\beta_i$ is the weight vector between the $i$-th hidden unit and the output layer, $w_i$ is the weight vector between the input layer and the $i$-th hidden unit, $g$ is the activation function, $b_i$ is the bias, $x_j$ is the input vector, and $y_j$ is the corresponding output.
Further, the steps of the improved reptile search algorithm in step S5 are as follows:
S5.1, replacing the random initialization of the RSA algorithm with Latin hypercube sampling initialization, and setting the search upper and lower bounds, population size and number of iterations of the IRSA;
S5.2, encircling phase: the crocodile individuals begin to encircle the prey, and the mathematical model is as follows:
$$\eta_{ij}=B_j(t)\times P_{ij} \qquad (24)$$
where $B_j(t)$ is the position of the optimal solution; $S_{i,j}(t+1)$ is the next updated position; $t$ is the current iteration number; $T_{\max}$ is the maximum number of iterations; $\eta_{ij}$ is the hunting operator; $R_{ij}$ is the reduction function used to shrink the search space; $\alpha$ and $\beta$ are sensitivity parameters that control the search accuracy; $r_1$ and $r_2$ are random numbers in $[1,N]$; $r_3$ is a random integer in $[-1,1]$; $ES(t)$ is the Evolutionary Sense; $P_{ij}$ is the percentage difference between the optimal solution position and the current solution position; and $M(s_i)$ is the average position of the $i$-th solution;
S5.3, improving the Evolutionary Sense parameter of formula (26); the improved Evolutionary Sense expression is as follows:
S5.4, hunting phase: the crocodile individuals begin hunting, and the mathematical model is as follows:
S5.5, introducing the golden sine and somersault (flip) strategies into the position update formula (30); the improved formula is as follows:
where $\gamma_1$ and $\gamma_2$ are random numbers in $[0,2\pi]$ and $[0,\pi]$, respectively; $\gamma_3$ and $\gamma_4$ are random numbers in $[0,1]$; $f=2$ is the somersault factor, which defines the position relative to the prey; and $x_1$ and $x_2$ are the golden sine coefficients, calculated as follows:
$$x_1=a(1-\tau)+b\tau \qquad (32)$$
$$x_2=a\tau+b(1-\tau) \qquad (33)$$
where $a$ and $b$ are the initial values of the golden section search and $\tau=(\sqrt{5}-1)/2$ is the golden ratio.
Further, in step S5, the parameters optimized with the improved reptile search algorithm include: the learning rate and number of hidden-layer nodes of the bidirectional long short-term memory network BiLSTM, the weight and learning rate of the extreme gradient boosting model XGBoost, and the penalty coefficient and kernel width of the least squares support vector machine LSSVM.
Further, in step S6, error correction with the ELM is performed as follows:
S6.1, subtracting the initial predicted value obtained by the integrated model from the original observed value to construct an error sequence;
S6.2, predicting the error sequence with an ELM network;
S6.3, adding the initial prediction sequence and the error prediction sequence linearly to obtain the final prediction result.
The beneficial effects are that:
The method, based on deep and ensemble learning, takes into account the differences in the training principles of the different algorithms and makes full use of each model's strengths during prediction. The stronger the learning ability of the base learners and the lower the correlation among them, the better the final prediction. Because model hyperparameters are difficult to determine, the reptile search algorithm is introduced to optimize the model and select the optimal model parameters. The algorithm itself is further improved to strengthen its search capability, as follows. First, random initialization cannot distribute the population uniformly over the whole search space, so Latin hypercube initialization is introduced to make the initial population cover the whole space evenly. Second, during iteration, the strategy in which the Evolutionary Sense (ES) value decreases randomly between -2 and 2 does not fully reflect the actual convergence process, so a nonlinear strategy is adopted instead; this lets the algorithm balance global and local search more effectively and improves its convergence accuracy. Finally, golden sine and somersault (flip) strategies are used to improve the individual search mode, so that in each iteration an ordinary individual exchanges information with the optimal individual and fully absorbs the position difference between them, which improves the search performance and search accuracy of the algorithm. Compared with traditional single-model prediction, the sewage treatment process soft measurement modeling method based on deep and ensemble learning has higher accuracy and generalization capability.
Drawings
FIG. 1 is a schematic diagram of a multi-model training framework of a sewage treatment process soft measurement modeling method based on integrated deep learning;
FIG. 2 is a flow chart of algorithm optimization model parameters in the integrated deep learning-based sewage treatment process soft measurement modeling method provided by the invention;
FIG. 3 is a flowchart of the sewage treatment process soft measurement modeling method based on integrated deep learning.
Detailed Description
Embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the invention provides a sewage treatment process soft measurement modeling method based on integrated deep learning, which comprises the following steps:
S1, acquiring sewage data from the international standard BSM1 simulation platform and performing data preprocessing.
S1.1, obtaining sewage data from an international standard simulation platform, wherein the sewage data comprises ammonia nitrogen NH3-N, suspended matter concentration SS, chemical oxygen demand COD, total nitrogen TN, total phosphorus TP and biochemical oxygen demand BOD at historical time.
S1.2, normalizing the acquired sewage data, using the formula:
$$S^{*}=\frac{S-S_{\min}}{S_{\max}-S_{\min}}$$
where $S^{*}$ is the normalized data, $S$ is the original data, and $S_{\max}$ and $S_{\min}$ are the maximum and minimum values of the original data, respectively.
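The normalization in S1.2 can be illustrated with a minimal sketch. It assumes the usual min-max form implied by the definitions of $S^{*}$, $S$, $S_{\max}$ and $S_{\min}$; the function name, the handling of a constant column and the sample readings are hypothetical and not taken from the patent.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] via S* = (S - S_min) / (S_max - S_min)."""
    s_min, s_max = min(values), max(values)
    if s_max == s_min:
        # Constant column: the formula is undefined, so return zeros (an assumption).
        return [0.0 for _ in values]
    return [(s - s_min) / (s_max - s_min) for s in values]


# Hypothetical COD readings, for illustration only.
cod = [210.0, 180.0, 240.0, 195.0]
cod_scaled = min_max_normalize(cod)
```

Each variable would be scaled separately in this way, so that variables with large numeric ranges do not dominate the later kernel and network computations.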
S2, performing feature selection on the processed data by using KPCA, and selecting the most suitable auxiliary variable, thereby constructing a soft measurement data sample set.
S2.1, let the training set sample data be $X=(x_1,x_2,\ldots,x_m)$, and map each $x_i$ to a high-dimensional feature space through the mapping function $\phi(x_i)$.
S2.2, calculating the covariance matrix $C$ of the feature space:
$$C=\frac{1}{m}\sum_{i=1}^{m}\phi(x_i)^{T}\phi(x_i)$$
S2.3, calculating the characteristic equation of the covariance matrix:
$$\lambda_i\xi_i=C\xi_i \qquad (3)$$
where $\lambda_i$ is an eigenvalue of the covariance matrix $C$ and $\xi_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$.
S2.4, defining the kernel matrix $K$:
$$K_{ij}=\phi(x_i)\cdot\phi(x_j)^{T} \qquad (4)$$
S2.5, calculating the characteristic equation of the kernel matrix $K$:
$$\tilde{\lambda}_i\alpha_i=K\alpha_i$$
where $\tilde{\lambda}_i$ is an eigenvalue of the kernel matrix $K$ and $\alpha_i$ is the eigenvector corresponding to the eigenvalue $\tilde{\lambda}_i$.
S2.6, substituting the covariance matrix $C$ and the kernel matrix $K$ into the characteristic equation shows that the eigenvector $\xi_i$ of the covariance matrix $C$ can be expressed through the nonlinear mapping $\phi(x_j)$ as follows:
$$\xi_i=\sum_{j=1}^{m}\alpha_i^{j}\,\phi(x_j)^{T}$$
where $\alpha_i^{j}$ is the $j$-th coefficient of $\xi_i$.
S2.7, calculating the eigenvalues $\tilde{\lambda}_i$ of the kernel matrix $K$ and arranging them in descending order, $\tilde{\lambda}_1\ge\tilde{\lambda}_2\ge\cdots\ge\tilde{\lambda}_m$.
S2.8, sequentially calculating the contribution rate $\eta_i$ of each eigenvalue and the cumulative contribution rate $P$ as follows:
$$\eta_i=\frac{\tilde{\lambda}_i}{\sum_{j=1}^{m}\tilde{\lambda}_j},\qquad P=\sum_{i=1}^{k}\eta_i$$
S2.9, selecting the features whose cumulative contribution rate satisfies $P\ge 85\%$ as the main auxiliary variables input to the sewage soft measurement model.
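The KPCA steps S2.1 to S2.9 can be sketched with NumPy as below. This is an illustration under stated assumptions rather than the patented implementation: the patent does not name a kernel, so an RBF kernel with a hypothetical width parameter `gamma` is assumed, the kernel matrix is centered in feature space (standard practice, not stated in the text), and the function name and random test data are invented for the example.

```python
import numpy as np


def kpca_select(X, gamma=1.0, threshold=0.85):
    """Project X onto the leading kernel principal components whose
    cumulative contribution rate first reaches `threshold`."""
    m = X.shape[0]
    # Kernel matrix K (RBF kernel; the kernel choice is an assumption).
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center the kernel matrix in feature space.
    one = np.full((m, m), 1.0 / m)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigen-decomposition, sorted in descending order of eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Contribution rate of each eigenvalue and cumulative contribution rate P.
    eta = eigvals / eigvals.sum()
    P = np.cumsum(eta)
    k = int(np.searchsorted(P, threshold) + 1)  # smallest k with P >= threshold
    # Scores of the training samples on the first k components.
    alphas = eigvecs[:, :k] / np.sqrt(eigvals[:k])
    return Kc @ alphas, eta[:k], k


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))  # hypothetical normalized sewage variables
Z, eta, k = kpca_select(X_demo, gamma=0.1)
```

The returned matrix `Z` plays the role of the selected auxiliary variables passed to the first-layer base learners.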
S3, taking the sewage data set processed in S2 as the original input of the first-layer base learners in a Stacking ensemble framework, wherein the first-layer base learners comprise a bidirectional long short-term memory network (BiLSTM), extreme gradient boosting (XGBoost) and a least squares support vector machine (LSSVM), and the prediction results of the first layer are obtained by 5-fold cross-validation of each base learner.
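The 5-fold cross-validation in S3 is what produces the training set for the second layer: every sample is predicted by a model that never saw it during training. The sketch below shows only that mechanism. The two base learners (`fit_mean`, `fit_linear`) are simple hypothetical placeholders standing in for BiLSTM, XGBoost and LSSVM, and the synthetic data are invented for the example.

```python
import numpy as np


def out_of_fold_predictions(fit, X, y, n_folds=5):
    """Return one prediction per sample, each made by a model that did not
    see that sample during training (k-fold cross-validation)."""
    n = len(y)
    oof = np.empty(n)
    folds = np.array_split(np.arange(n), n_folds)  # contiguous folds
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        predict = fit(X[train], y[train])
        oof[held_out] = predict(X[held_out])
    return oof


# Placeholder base learners (hypothetical stand-ins for BiLSTM, XGBoost, LSSVM).
def fit_mean(X, y):
    mu = y.mean()
    return lambda X_new: np.full(len(X_new), mu)


def fit_linear(X, y):
    A = np.c_[X, np.ones(len(X))]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.c_[X_new, np.ones(len(X_new))] @ coef


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Column j holds the out-of-fold predictions of base learner j; this matrix
# is the training input of the second-layer meta-learner.
meta_features = np.column_stack(
    [out_of_fold_predictions(f, X, y) for f in (fit_mean, fit_linear)]
)
```

Using out-of-fold rather than in-sample predictions keeps the meta-learner from simply trusting whichever base learner overfits the training data most.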
S3.1, establishing the bidirectional long short-term memory network prediction model.
S3.1.1, taking the determined auxiliary variables as the input vector $x$ of the network.
S3.1.2, let the forward hidden-layer state be $\overrightarrow{h}_t$ and the backward hidden-layer state be $\overleftarrow{h}_t$, with $w$ denoting the different weight matrices; the output $y$ of the BiLSTM is calculated as follows:
S3.1.3, taking $y_t$ as the prediction result of the model.
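S3.2, establishing the extreme gradient boosting prediction model.
S3.2.1, let the training data set be $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, with loss function $l(y_i,\hat{y}_i)$ and regularization term $\Omega(f_k)$; the overall objective function can be written as:
$$L(\phi)=\sum_{i}l(y_i,\hat{y}_i)+\sum_{k}\Omega(f_k)$$
where $L(\phi)$ is the representation in linear space, $i$ denotes the $i$-th sample, $k$ denotes the $k$-th tree, and $\hat{y}_i$ is the predicted value of the $i$-th sample $x_i$.
S3.2.2, the prediction of each tree is used to fit the residual of the prediction of the preceding trees, so that the overall tree model improves step by step:
$$\hat{y}_i^{(t)}=\hat{y}_i^{(t-1)}+f_t(x_i)$$
S3.2.3, the prediction after the $t$-th tree, obtained in the previous step, equals the prediction of the preceding $t-1$ trees plus the output of the $t$-th tree. For the $t$-th tree, the objective function is:
$$L^{(t)}=\sum_{i=1}^{n}l\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega(f_t) \qquad (14)$$
S3.2.4, approximating the original objective in formula (14) by Taylor expansion, define
$$g_i=\frac{\partial\,l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial\,\hat{y}_i^{(t-1)}},\qquad h_i=\frac{\partial^{2}l\left(y_i,\hat{y}_i^{(t-1)}\right)}{\partial\left(\hat{y}_i^{(t-1)}\right)^{2}}$$
Then formula (14) can be written, up to a constant term, as:
$$L^{(t)}\approx\sum_{i=1}^{n}\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t)$$
S3.2.5, the optimal solution of the objective function is:
$$w_j^{*}=-\frac{\sum_{i\in I_j}g_i}{\sum_{i\in I_j}h_i+\lambda}$$
where $I_j=\{i\mid q(x_i)=j\}$ is the set of samples mapped to leaf node $j$, $w_j^{*}$ is the optimal weight of that leaf, and $\lambda$ is the regularization coefficient in $\Omega$.
S3.2.6, calculating the gain value Gain, updating the maximum gain Gain_max and the split point, and finally obtaining the optimal split point.
S3.2.7, repeating the above process to build the tree recursively until the termination condition is met.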
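Steps S3.2.5 and S3.2.6 can be made concrete with a small pure-Python sketch of the leaf weight and split search for a single feature. It assumes the standard XGBoost regularizer with an L2 coefficient `lam` and a per-leaf penalty `gamma`, and a squared-error loss $l=\frac{1}{2}(y-\hat{y})^2$ so that $g_i=\hat{y}_i-y_i$ and $h_i=1$; these choices, the function names and the toy data are assumptions, not taken from the patent.

```python
def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight w* = -sum(g) / (sum(h) + lam)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(g, h, lam):
    """Score (sum g)^2 / (sum h + lam) of one candidate leaf."""
    return sum(g) ** 2 / (sum(h) + lam)


def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Scan the split points of one feature and keep the one with maximum gain."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    gain_max, threshold = 0.0, None
    for pos in range(1, len(order)):
        left, right = order[:pos], order[pos:]
        if x[left[-1]] == x[right[0]]:
            continue  # identical feature values cannot be separated
        gl, hl = [g[i] for i in left], [h[i] for i in left]
        gr, hr = [g[i] for i in right], [h[i] for i in right]
        gain = 0.5 * (structure_score(gl, hl, lam)
                      + structure_score(gr, hr, lam)
                      - structure_score(g, h, lam)) - gamma
        if gain > gain_max:  # update Gain_max and the split point
            gain_max = gain
            threshold = 0.5 * (x[left[-1]] + x[right[0]])
    return threshold, gain_max


# Toy data: initial prediction 0, squared-error loss, so g = -y and h = 1.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
g = [0.0 - yi for yi in y]
h = [1.0] * len(y)
threshold, gain = best_split(x, g, h)
w_left = leaf_weight(g[:3], h[:3])
```

Building a full tree would apply `best_split` recursively to the left and right subsets until no split yields positive gain or a depth limit is reached.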
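S3.3, establishing the least squares support vector machine prediction model.
S3.3.1, the optimization objective is defined by the loss function:
$$\min_{\omega,b,\xi}\;J(\omega,\xi)=\frac{1}{2}\omega^{T}\omega+\frac{c}{2}\sum_{i=1}^{N}\xi_i^{2} \qquad (18)$$
subject to the constraint:
$$y_i=\omega^{T}\varphi(x_i)+b+\xi_i,\qquad i=1,2,\ldots,N$$
where $\omega$ is the weight, $\xi_i$ is the error variable, $b$ is the bias, $c>0$ is the penalty coefficient, and $\varphi(\cdot)$ is the nonlinear feature mapping.
S3.3.2, introducing Lagrange multipliers, formula (18) can be converted into:
$$L(\omega,b,\xi,a)=\frac{1}{2}\omega^{T}\omega+\frac{c}{2}\sum_{i=1}^{N}\xi_i^{2}-\sum_{i=1}^{N}a_i\left[\omega^{T}\varphi(x_i)+b+\xi_i-y_i\right]$$
where $a_i>0$ $(i=1,2,\ldots,N)$ are the Lagrange multipliers.
S3.3.3, solving the optimality conditions:
$$\frac{\partial L}{\partial\omega}=0\Rightarrow\omega=\sum_{i=1}^{N}a_i\varphi(x_i),\quad\frac{\partial L}{\partial b}=0\Rightarrow\sum_{i=1}^{N}a_i=0,\quad\frac{\partial L}{\partial\xi_i}=0\Rightarrow a_i=c\,\xi_i,\quad\frac{\partial L}{\partial a_i}=0\Rightarrow\omega^{T}\varphi(x_i)+b+\xi_i-y_i=0$$
S3.3.4, combining the above formulas, the optimal regression function is:
$$f(x)=\sum_{i=1}^{N}a_i K(x,x_i)+b$$
where $K(x,x_i)$ is the kernel function, $x_i$ is the center of the kernel function, $x$ is the input of the training sample, and $y_i$ is the output of the training sample.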
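For the LSSVM, eliminating the weight and the error variables from the optimality conditions leaves a single linear system in the bias and the Lagrange multipliers, which is what makes training cheap. The NumPy sketch below solves that system. It assumes an RBF kernel with width `sigma` (the patent only says that a kernel width is optimized), and the function names and sine-wave test data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvm_fit(X, y, c=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/c]] [b; a] = [0; y] and return a predictor."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # enforces sum(a_i) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / c
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, a = sol[0], sol[1:]
    # Regression function f(x) = sum_i a_i K(x, x_i) + b.
    return lambda X_new: rbf_kernel(X_new, X, sigma) @ a + b


X_train = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
predict = lssvm_fit(X_train, y_train, c=100.0, sigma=1.0)
max_err = float(np.max(np.abs(predict(X_train) - y_train)))
```

The penalty coefficient `c` and the kernel width `sigma` are the two LSSVM values that the improved reptile search algorithm is described as tuning.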
S4, using the results obtained from the first-layer base learners as the training set of the second-layer meta-learner to complete its training, wherein the prediction model of the meta-learner is an ELM, calculated as follows:
$$\sum_{i=1}^{L}\beta_i\,g(w_i\cdot x_j+b_i)=y_j,\qquad j=1,2,\ldots,N$$
where $L$ is the number of hidden-layer units, $N$ is the number of training samples, $\beta_i$ is the weight vector between the $i$-th hidden unit and the output layer, $w_i$ is the weight vector between the input layer and the $i$-th hidden unit, $g$ is the activation function, $b_i$ is the bias, $x_j$ is the input vector, and $y_j$ is the corresponding output.
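A minimal ELM can be sketched as follows. In an extreme learning machine the input weights and hidden biases are drawn at random and left untrained, and only the output weights are solved by least squares. The `tanh` activation, the number of hidden units, the seed and the synthetic data are assumptions made for the example.

```python
import numpy as np


def elm_fit(X, y, n_hidden=20, seed=0):
    """Train an ELM regressor and return its prediction function."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, not trained
    b = rng.normal(size=n_hidden)                # random hidden biases, not trained
    g = np.tanh                                  # activation function (assumed)
    H = g(X @ w + b)                             # hidden-layer output, shape N x L
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    return lambda X_new: g(X_new @ w + b) @ beta


rng = np.random.default_rng(1)
X_meta = rng.uniform(-1.0, 1.0, size=(100, 2))   # hypothetical meta-features
y_meta = X_meta[:, 0] ** 2 + X_meta[:, 1]
elm_predict = elm_fit(X_meta, y_meta)
mse = float(np.mean((elm_predict(X_meta) - y_meta) ** 2))
```

In the stacking framework, `X_meta` would be the matrix of first-layer cross-validated predictions and `y_meta` the measured BOD values.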
S5, optimizing model parameters of the base learner by adopting an improved reptile search algorithm, and performing soft measurement by utilizing an optimized model to obtain a prediction result of the biochemical oxygen demand.
S5.1, replacing the random initialization of the RSA algorithm with Latin hypercube sampling initialization, and setting the search upper and lower bounds, population size and number of iterations of the IRSA.
The Latin hypercube sampling initialization method comprises the following steps:
S5.1.1, determine the population size $N$ and the dimension $D$.
S5.1.2, set the interval of each variable to [low, up], where up and low are the upper and lower bounds of the variable, respectively.
S5.1.3, divide the interval of each variable into $N$ equal subintervals.
S5.1.4, randomly select one point from each subinterval of each dimension.
S5.1.5, combine the selected points across dimensions to form the initial population.
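The Latin hypercube initialization steps can be sketched in pure Python as below. The function name and arguments are hypothetical, and the sketch assumes the same bounds for every dimension for brevity.

```python
import random


def latin_hypercube(n, dim, low, up, seed=None):
    """Return n individuals of dimension dim; along every dimension each of
    the n equal subintervals of [low, up] contains exactly one individual."""
    rng = random.Random(seed)
    width = (up - low) / n
    columns = []
    for _ in range(dim):
        # One random point inside each of the n equal subintervals.
        points = [low + (k + rng.random()) * width for k in range(n)]
        rng.shuffle(points)  # decouple the subintervals across dimensions
        columns.append(points)
    # Individual i takes its i-th coordinate from every dimension.
    return [[columns[d][i] for d in range(dim)] for i in range(n)]


population = latin_hypercube(n=10, dim=3, low=-5.0, up=5.0, seed=1)
```

Unlike plain uniform sampling, no subinterval of any dimension is left empty, which is the even coverage of the search space that the improved algorithm relies on.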
S5.2, encircling phase: the crocodile individuals begin to encircle the prey, and the mathematical model is as follows:
$$\eta_{ij}=B_j(t)\times P_{ij} \qquad (24)$$
where $B_j(t)$ is the position of the optimal solution; $S_{i,j}(t+1)$ is the next updated position; $t$ is the current iteration number; $T_{\max}$ is the maximum number of iterations; $\eta_{ij}$ is the hunting operator; $R_{ij}$ is the reduction function used to shrink the search space; $\alpha$ and $\beta$ are sensitivity parameters that control the search accuracy; $r_1$ and $r_2$ are random numbers in $[1,N]$; $r_3$ is a random integer in $[-1,1]$; $ES(t)$ is the Evolutionary Sense; $P_{ij}$ is the percentage difference between the optimal solution position and the current solution position; and $M(s_i)$ is the average position of the $i$-th solution.
S5.3, improving the Evolutionary Sense parameter of formula (26); the improved Evolutionary Sense expression is as follows:
S5.4, hunting phase: the crocodile individuals begin hunting, and the mathematical model is as follows:
S5.5, introducing the golden sine and somersault (flip) strategies into the position update formula (30); the improved formula is as follows:
where $\gamma_1$ and $\gamma_2$ are random numbers in $[0,2\pi]$ and $[0,\pi]$, respectively; $\gamma_3$ and $\gamma_4$ are random numbers in $[0,1]$; $f=2$ is the somersault factor, which defines the position relative to the prey; and $x_1$ and $x_2$ are the golden sine coefficients, calculated as follows:
$$x_1=a(1-\tau)+b\tau \qquad (32)$$
$$x_2=a\tau+b(1-\tau) \qquad (33)$$
where $a$ and $b$ are the initial values of the golden section search and $\tau=(\sqrt{5}-1)/2$ is the golden ratio.
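The golden sine coefficients of formulas (32) and (33) are simple to evaluate once the initial values are fixed. The sketch below writes the golden ratio as `tau` and assumes the initial values $a=-\pi$ and $b=\pi$, a common choice in the golden sine algorithm that the patent text does not state.

```python
import math

tau = (math.sqrt(5) - 1) / 2   # golden ratio, about 0.618
a, b = -math.pi, math.pi       # initial search values (an assumption)

x1 = a * (1 - tau) + b * tau   # formula (32)
x2 = a * tau + b * (1 - tau)   # formula (33)
```

With a symmetric interval the two coefficients are equal in magnitude and opposite in sign (about 0.742 and -0.742 here), so they weight the current position and the best position in opposite directions inside the sine-based position update.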
S5.6, using the improved reptile search algorithm to optimize the learning rate and number of hidden-layer nodes of the BiLSTM, the weight and learning rate of XGBoost, and the penalty coefficient and kernel width of the LSSVM.
S6, error correction is carried out on the initial prediction result by adopting ELM, and a final prediction result is obtained.
S6.1, subtracting the initial predicted value obtained by the integrated model from the original observed value to construct an error sequence.
S6.2 predicts the error sequence using the ELM network.
S6.3, the initial prediction sequence and the error prediction sequence are linearly added to obtain a final prediction result.
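The three error-correction steps S6.1 to S6.3 reduce to a subtract, predict and add pattern, sketched below. The error predictor is passed in as a function; `mean_error_model` is a deliberately trivial hypothetical stand-in for the ELM error network, and the numbers are invented for the example.

```python
def correct_predictions(initial_pred, observed, predict_error):
    """Apply error correction to an initial prediction sequence."""
    # S6.1: error sequence = original observed value - initial predicted value.
    errors = [o - p for o, p in zip(observed, initial_pred)]
    # S6.2: predict the error sequence (an ELM in the patent).
    error_pred = predict_error(errors)
    # S6.3: final prediction = initial prediction + predicted error.
    return [p + e for p, e in zip(initial_pred, error_pred)]


def mean_error_model(errors):
    """Placeholder error predictor: returns the mean error for every step."""
    mean = sum(errors) / len(errors)
    return [mean] * len(errors)


initial = [10.0, 12.0, 11.0, 13.0]    # hypothetical initial BOD predictions
observed = [11.0, 13.0, 12.0, 14.0]   # hypothetical observed BOD values
final = correct_predictions(initial, observed, mean_error_model)
```

In deployment the error model would be trained on historical errors and applied to new initial predictions, for which the observed values are not yet available.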
S7, constructing a sewage treatment soft measurement platform based on Qt, Python and MATLAB, comprising a user login interface, a sewage data monitoring module and an online prediction module, to visualize the soft measurement system.
The invention also implements a BOD soft measurement intelligent system, which comprises a data acquisition module, a data processing module, a model training module, a parameter optimization module, an error correction module and an online monitoring module.
Data acquisition module: acquires sewage data, including ammonia nitrogen NH3-N, suspended matter concentration SS, chemical oxygen demand COD, total nitrogen TN, total phosphorus TP, and biochemical oxygen demand BOD at historical times.
Data processing module: extracts features from the collected sewage data using the KPCA method, selects the highly correlated features, and screens out the auxiliary variables most suitable as model input.
Model training module: establishes an integrated model based on the Stacking method, which combines BiLSTM, XGBoost and LSSVM to improve model performance while mitigating overfitting.
Parameter optimization module: optimizes the model parameters with the IRSA algorithm, including the learning rate and number of hidden-layer nodes of the BiLSTM, the weight and learning rate of XGBoost, and the penalty coefficient and kernel width of the LSSVM.
Error correction module: uses an ELM to correct the initial prediction results based on the error sequence.
Online monitoring module: comprises a user login interface and sewage data monitoring and online prediction modules, to visualize the soft measurement system.
The present invention is not limited to the above embodiments, and any simple modification, equivalent variation and modification made to the above embodiments according to the technical substance of the present invention falls within the scope of the technical solution of the present invention.

Claims (9)

1. The sewage treatment process soft measurement modeling method based on integrated deep learning is characterized by comprising the following steps of:
S1, acquiring sewage data and performing data preprocessing;
S2, performing feature selection on the preprocessed data using KPCA and selecting suitable auxiliary variables, thereby constructing a soft measurement data sample set; features are extracted from the data using KPCA, with the following specific steps:
S3.1, let the training set sample data be $X=(x_1,x_2,\ldots,x_m)$, and map each $x_i$ to a high-dimensional feature space through the mapping function $\phi(x_i)$;
S3.2, calculating the covariance matrix $C$ of the feature space:
$$C=\frac{1}{m}\sum_{i=1}^{m}\phi(x_i)^{T}\phi(x_i)$$
S3.3, calculating the characteristic equation of the covariance matrix:
$$\lambda_i\xi_i=C\xi_i \qquad (3)$$
where $\lambda_i$ is an eigenvalue of the covariance matrix $C$ and $\xi_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$;
S3.4, defining the kernel matrix $K$:
$$K_{ij}=\phi(x_i)\cdot\phi(x_j)^{T} \qquad (4)$$
S3.5, calculating the characteristic equation of the kernel matrix $K$:
$$\tilde{\lambda}_i\alpha_i=K\alpha_i$$
where $\tilde{\lambda}_i$ is an eigenvalue of the kernel matrix $K$ and $\alpha_i$ is the eigenvector corresponding to the eigenvalue $\tilde{\lambda}_i$;
S3.6, substituting the covariance matrix $C$ and the kernel matrix $K$ into the characteristic equation shows that the eigenvector $\xi_i$ of the covariance matrix $C$ can be expressed through the nonlinear mapping $\phi(x_j)$ as follows:
$$\xi_i=\sum_{j=1}^{m}\alpha_i^{j}\,\phi(x_j)^{T}$$
where $\alpha_i^{j}$ is the $j$-th coefficient of $\xi_i$;
S3.7, calculating the eigenvalues $\tilde{\lambda}_i$ of the kernel matrix $K$ and arranging them in descending order, $\tilde{\lambda}_1\ge\tilde{\lambda}_2\ge\cdots\ge\tilde{\lambda}_m$;
S3.8, sequentially calculating the contribution rate $\eta_i$ of each eigenvalue and the cumulative contribution rate $P$ as follows:
$$\eta_i=\frac{\tilde{\lambda}_i}{\sum_{j=1}^{m}\tilde{\lambda}_j},\qquad P=\sum_{i=1}^{k}\eta_i$$
S3.9, selecting the features whose cumulative contribution rate satisfies $P\ge 85\%$ as the main auxiliary variables input to the sewage soft measurement model;
S3: taking the sewage data set processed in S2 as the original input of the first-layer base learners of a Stacking ensemble framework, wherein the first-layer base learners comprise a bidirectional long short-term memory network (BiLSTM), extreme gradient boosting (XGBoost), and a least squares support vector machine (LSSVM), and the first-layer prediction results are obtained by 5-fold cross-validation of each base learner;
S4: adopting an extreme learning machine (ELM) as the second-layer meta-learner, and using the results obtained from the first-layer base learners as the training set of the meta-learner to complete its training;
S5: optimizing the model parameters of the base learners with an improved reptile search algorithm (IRSA), wherein the improvements comprise: Latin hypercube initialization; a nonlinear method for improving the evolutionary sense (ES) value during iteration; and golden sine and somersault (flip) strategies for improving the way individuals search for the optimum; and performing soft measurement with the optimized model to obtain the predicted biochemical oxygen demand;
S6: correcting the error of the initial prediction result using an ELM to obtain the final prediction result.
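The first-layer 5-fold procedure of steps S3 and S4 can be illustrated with a short sketch. It assumes each base learner is wrapped as a `fit_predict(X_train, y_train, X_test)` callable; the ridge regressors below are hypothetical stand-ins for BiLSTM, XGBoost, and LSSVM, and all function names are introduced here for illustration only.

```python
import numpy as np

def out_of_fold_predictions(fit_predict, X, y, n_folds=5, seed=0):
    """Predict every sample with a model that never saw it during training."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    oof = np.empty(n)
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        oof[val] = fit_predict(X[train], y[train], X[val])
    return oof

def ridge_fit_predict(lam):
    # Hypothetical stand-in base learner: ridge regression with an intercept column
    def fit_predict(X_tr, y_tr, X_te):
        A = np.c_[X_tr, np.ones(len(X_tr))]
        w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_tr)
        return np.c_[X_te, np.ones(len(X_te))] @ w
    return fit_predict

def stack_features(base_learners, X, y, n_folds=5):
    # One column of out-of-fold predictions per base learner;
    # the resulting matrix is the training set of the second-layer meta-learner
    return np.column_stack(
        [out_of_fold_predictions(f, X, y, n_folds) for f in base_learners]
    )
```

Training the meta-learner on out-of-fold predictions, rather than on in-sample predictions, keeps it from simply learning to trust whichever base learner overfits the most.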
2. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 1, wherein the sewage data in step S1 comprise the suspended solids concentration (SS), total nitrogen (TN), ammonia nitrogen (NH3-N), total phosphorus (TP), chemical oxygen demand (COD), and biochemical oxygen demand (BOD) at historical times.
3. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 1, wherein in step S3 the bidirectional long short-term memory network (BiLSTM) prediction model is established as follows:
S4.1: taking the determined auxiliary variables as the input vector $x$ of the network;
S4.2: denoting the forward hidden layer state by $\overrightarrow{h}_t$, the backward hidden layer state by $\overleftarrow{h}_t$, and the different weight matrices by $w$, the output $y$ of the BiLSTM is calculated as follows:
S4.3: taking $y_t$ as the prediction result of the model.
4. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 1, wherein in step S3 the extreme gradient boosting (XGBoost) prediction model is established as follows:
S5.1: let the training data set be $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the loss function be $l(y_i, \hat{y}_i)$, and the regularization term be $\Omega(f_k)$; the overall objective function can then be written as:
$$L(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k}\Omega(f_k)$$
wherein $L(\phi)$ is a representation in linear space, $i$ indexes the $i$-th sample, $k$ indexes the $k$-th tree, and $\hat{y}_i$ is the predicted value for the $i$-th sample $x_i$;
S5.2: fitting, with the prediction of each tree, the residual of the prediction result of the previous tree:
……
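S5.3: the prediction result of the $t$-th tree obtained in the previous step equals the prediction result of the preceding $t-1$ trees plus the expression of the $t$-th tree; for the $t$-th tree, the objective function is:
$$L^{(t)} = \sum_{i=1}^{n} l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) \quad (14)$$
S5.4: approximating the original objective (14) by a second-order Taylor expansion and defining
$$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big), \qquad h_i = \partial^{2}_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)$$
formula (14) becomes:
$$L^{(t)} \approx \sum_{i=1}^{n}\Big[l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i)\Big] + \Omega(f_t)$$
S5.5: obtaining the optimal solution of the objective function, which, for the standard regularization term $\Omega(f_t) = \gamma J + \tfrac{1}{2}\lambda\sum_{j=1}^{J} w_j^{2}$ with $J$ leaves and leaf weights $w_j$, is:
$$w_j^{*} = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \lambda}, \qquad L^{(t)*} = -\frac{1}{2}\sum_{j=1}^{J}\frac{\big(\sum_{i\in I_j} g_i\big)^{2}}{\sum_{i\in I_j} h_i + \lambda} + \gamma J$$
wherein $I_j = \{\, i \mid q(x_i) = j \,\}$ is the set of samples mapped to leaf node $j$;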
S5.6: calculating the gain value Gain, updating the maximum gain Gain_max and the split point, and finally obtaining the optimal split point;
S5.7: repeating the above process to build the tree recursively until the termination condition is met.
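The gain-based split search of steps S5.5 and S5.6 can be sketched for a single feature as below. The sketch assumes the standard XGBoost regularizer with leaf-weight penalty `lam` and per-leaf penalty `gamma`, which the claim does not spell out; the function names are hypothetical.

```python
import numpy as np

def leaf_weight(g, h, lam):
    # Optimal leaf weight: -sum(g) / (sum(h) + lambda)
    return -np.sum(g) / (np.sum(h) + lam)

def leaf_score(g, h, lam):
    # Structure score of one leaf: sum(g)^2 / (sum(h) + lambda)
    return np.sum(g) ** 2 / (np.sum(h) + lam)

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Scan all thresholds of one feature and keep the split with maximum gain."""
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    parent = leaf_score(g, h, lam)
    gain_max, split = 0.0, None
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical feature values
        gain = 0.5 * (leaf_score(g[:i], h[:i], lam)
                      + leaf_score(g[i:], h[i:], lam) - parent) - gamma
        if gain > gain_max:
            gain_max, split = gain, 0.5 * (x[i - 1] + x[i])
    return split, gain_max
```

For the squared loss $l = \tfrac{1}{2}(y - \hat{y})^2$, the inputs are simply `g = y_hat - y` and `h = 1` for every sample.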
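5. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 1, wherein in step S3 the least squares support vector machine (LSSVM) prediction model is established as follows:
S6.1: defining the optimization objective (loss function) as:
$$\min_{\omega, b, \xi}\ J(\omega, \xi) = \frac{1}{2}\omega^{T}\omega + \frac{c}{2}\sum_{i=1}^{N}\xi_i^{2} \quad (18)$$
subject to the constraint:
$$y_i = \omega^{T}\varphi(x_i) + b + \xi_i, \quad i = 1, 2, \ldots, N$$
wherein $\omega$ is the weight vector, $\xi_i$ is the error variable, $b$ is the bias, $c > 0$ is the penalty coefficient, and $\varphi(\cdot)$ is the nonlinear feature mapping;
S6.2: introducing Lagrange multipliers, formula (18) can be converted into:
$$L(\omega, b, \xi, a) = J(\omega, \xi) - \sum_{i=1}^{N} a_i\big[\omega^{T}\varphi(x_i) + b + \xi_i - y_i\big]$$
wherein $a_i$ ($i = 1, 2, \ldots, N$) are the Lagrange multipliers;
S6.3: solving the optimality conditions:
$$\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{N} a_i\varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} a_i = 0, \quad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow a_i = c\,\xi_i, \quad \frac{\partial L}{\partial a_i} = 0 \Rightarrow \omega^{T}\varphi(x_i) + b + \xi_i - y_i = 0$$
S6.4: combining the above formulas, the optimal regression function is obtained as:
$$f(x) = \sum_{i=1}^{N} a_i K(x, x_i) + b$$
wherein $K(x, x_i)$ is the kernel function, $x_i$ is the center of the kernel function, $x$ is the input sample, and $y_i$ is the output of the training samples.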
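Eliminating $\omega$ and $\xi_i$ from the optimality conditions leaves one linear system in $b$ and the multipliers $a_i$, which the sketch below solves directly. The RBF kernel and its width `sigma` are assumptions (the claim only names "a kernel function"), and `lssvm_fit` and `lssvm_predict` are hypothetical names.

```python
import numpy as np

def rbf(A, B, sigma):
    # Assumed kernel: K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, c=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/c]] [b; a] = [0; y] for the bias b and multipliers a."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # sum_i a_i = 0
    A[1:, 0] = 1.0                      # bias term in every constraint
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / c
    sol = np.linalg.solve(A, np.r_[0.0, y])
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, a, X_new, sigma=1.0):
    # f(x) = sum_i a_i K(x, x_i) + b
    return rbf(X_new, X_train, sigma) @ a + b
```

The penalty coefficient `c` and the kernel width `sigma` are the two LSSVM parameters that claim 8 lists for optimization.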
6. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 1, wherein the prediction model of the meta-learner in step S4 is calculated as follows:
$$f_L(x_j) = \sum_{i=1}^{L}\beta_i\, g(w_i\cdot x_j + b_i), \quad j = 1, 2, \ldots, N$$
wherein $L$ is the number of hidden layer units, $N$ is the number of training samples, $\beta_i$ is the weight vector between the $i$-th hidden layer unit and the output layer, $w_i$ is the weight vector between the input layer and the $i$-th hidden layer unit, $g$ is the activation function, $b_i$ is the bias, and $x_j$ is the input vector.
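A compact ELM along these lines might look like the following sketch. The sigmoid activation, the Gaussian initialization of the random weights, and the class name `ELM` are assumptions made for illustration; the claim fixes only the general form of the model.

```python
import numpy as np

class ELM:
    """Single-hidden-layer network with random hidden weights and least-squares output weights."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # g(w_i . x + b_i) with a sigmoid activation (assumed)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn once and never trained
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights beta from the Moore-Penrose pseudoinverse of H
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because only `beta` is fitted, training reduces to a single linear least-squares solve, which is what makes the ELM cheap enough to serve both as meta-learner and as error corrector.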
7. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 1, wherein the improved reptile search algorithm (IRSA) in step S5 comprises the following steps:
S5.1: using Latin hypercube sampling initialization in place of the random initialization of the RSA algorithm, and setting the upper and lower search bounds, the population size, and the number of iterations of the IRSA;
S5.2: encircling phase, in which the crocodile individuals begin to encircle the prey, with the following mathematical model:
$$\eta_{ij} = B_j(t) \times P_{ij} \quad (24)$$
wherein $B_j(t)$ is the position of the optimal solution; $S_{i,j}(t+1)$ is the next updated position; $t$ is the current iteration number; $T_{max}$ is the maximum number of iterations; $\eta_{ij}$ is the hunting operator; $R_{ij}$ is the reduce function, used to shrink the search space; $\alpha$ and $\beta$ are sensitivity parameters that control the search accuracy; $r_1$ and $r_2$ are random numbers in $[1, N]$; $r_3$ is a random integer in $[-1, 1]$; $ES(t)$ is the evolutionary sense; $P_{ij}$ is the percentage difference between the optimal solution position and the current solution position; and $M(s_i)$ is the average position of the $i$-th solution;
S5.3: improving the evolutionary sense parameter of formula (26), the improved evolutionary sense expression being as follows:
S5.4: hunting phase, in which the crocodile individuals begin to hunt, with the following mathematical model:
S5.5: introducing the golden sine and somersault strategies into the position update formula (30), the improved formula being as follows:
wherein $\gamma_1$ and $\gamma_2$ are random numbers in $[0, 2\pi]$ and $[0, \pi]$, respectively; $\gamma_3$ and $\gamma_4$ are random numbers in $[0, 1]$; $f = 2$ is the somersault factor, which defines the position relative to the prey; and $x_1$ and $x_2$ are the golden sine coefficients, calculated as follows:
$$x_1 = a(1-\gamma) + b\sigma \quad (32)$$
$$x_2 = a\gamma + b(1-\sigma) \quad (33)$$
wherein $a$ and $b$ are the initial values of the golden-section search, and the coefficient multiplying them in formulas (32) and (33) is the golden ratio.
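For orientation, the golden sine coefficients can be computed as in the sketch below. It follows the commonly published golden sine algorithm, in which a single golden-ratio coefficient $\tau = (\sqrt{5} - 1)/2$ appears in both formulas and the search starts from $a = -\pi$, $b = \pi$. Both choices are assumptions here: the extracted text prints two different symbols in formulas (32) and (33) and does not give the initial values.

```python
import math

def golden_sine_coefficients(a=-math.pi, b=math.pi):
    """Golden-section coefficients x1, x2 on the interval [a, b] (assumed defaults)."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0   # golden ratio coefficient, about 0.618
    x1 = a * (1.0 - tau) + b * tau
    x2 = a * tau + b * (1.0 - tau)
    return x1, x2
```

With the symmetric default interval the two coefficients are mirror images of each other, so `x1 + x2` equals `a + b`.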
8. The integrated deep learning-based sewage treatment process soft measurement modeling method according to claim 7, wherein in step S5 the model parameters optimized by the improved reptile search algorithm comprise: the learning rate and the number of hidden layer nodes of the bidirectional long short-term memory network (BiLSTM); the weight and the learning rate of extreme gradient boosting (XGBoost); and the penalty coefficient and the kernel function width of the least squares support vector machine (LSSVM).
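The Latin hypercube initialization of claim 7 (step S5.1), applied to a parameter search space such as the one listed here, can be sketched as follows. The function name and the example bounds (a learning rate in $[10^{-4}, 10^{-1}]$ and a hidden node count in $[8, 128]$) are hypothetical; the patent does not state the ranges.

```python
import numpy as np

def latin_hypercube_init(pop_size, lower, upper, seed=0):
    """Initial population with exactly one individual per stratum in every dimension."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Split [0, 1) into pop_size equal strata and draw one point inside each
    u = (np.arange(pop_size)[:, None] + rng.random((pop_size, dim))) / pop_size
    # Shuffle the strata independently per dimension to decorrelate coordinates
    for j in range(dim):
        u[:, j] = rng.permutation(u[:, j])
    # Scale to the search bounds
    return lower + u * (upper - lower)

# Hypothetical bounds: BiLSTM learning rate and number of hidden nodes
population = latin_hypercube_init(10, lower=[1e-4, 8], upper=[1e-1, 128])
```

Compared with plain uniform sampling, every interval of every parameter is covered by exactly one individual, which spreads the initial crocodile population more evenly over the search space.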
9. The integrated deep learning-based sewage treatment process soft measurement modeling method according to any one of claims 1 to 8, wherein the error correction using the ELM in step S6 comprises the following steps:
S6.1: subtracting the initial predicted values obtained by the integrated model from the original observed values to construct an error sequence;
S6.2: predicting the error sequence with an ELM network;
S6.3: linearly adding the initial prediction sequence and the error prediction sequence to obtain the final prediction result.
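The three error-correction steps can be written out as a short sketch. The error model is passed in as a `fit_predict` callable; the least-squares regressor used here is a hypothetical stand-in for the ELM network, and all names are introduced for illustration.

```python
import numpy as np

def correct_errors(y_obs_train, y_init_train, X_train, X_new, y_init_new, fit_predict):
    """Error correction: final prediction = initial prediction + predicted error."""
    # S6.1: error sequence = observed value minus initial prediction
    errors = y_obs_train - y_init_train
    # S6.2: predict the error for the new inputs with the error model
    err_pred = fit_predict(X_train, errors, X_new)
    # S6.3: add the predicted error to the initial prediction
    return y_init_new + err_pred

def least_squares_fit_predict(X_tr, y_tr, X_te):
    # Stand-in error model (assumption): linear least squares with an intercept
    A = np.c_[X_tr, np.ones(len(X_tr))]
    w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.c_[X_te, np.ones(len(X_te))] @ w
```

If the initial model is systematically biased, for example always one unit too low, the error model learns that offset and the corrected prediction recovers it.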
CN202310053332.5A 2023-02-02 2023-02-02 Sewage treatment process soft measurement modeling method based on integrated deep learning Active CN115952685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310053332.5A CN115952685B (en) 2023-02-02 2023-02-02 Sewage treatment process soft measurement modeling method based on integrated deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310053332.5A CN115952685B (en) 2023-02-02 2023-02-02 Sewage treatment process soft measurement modeling method based on integrated deep learning

Publications (2)

Publication Number Publication Date
CN115952685A CN115952685A (en) 2023-04-11
CN115952685B true CN115952685B (en) 2023-09-29

Family

ID=87296827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310053332.5A Active CN115952685B (en) 2023-02-02 2023-02-02 Sewage treatment process soft measurement modeling method based on integrated deep learning

Country Status (1)

Country Link
CN (1) CN115952685B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562331B (en) * 2023-05-19 2023-11-21 石家庄铁道大学 Method for optimizing SVM by improving reptile search algorithm and application thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132333A (en) * 2020-09-16 2020-12-25 安徽泽众安全科技有限公司 Short-term water quality and water quantity prediction method and system based on deep learning
CN113448245A (en) * 2021-04-14 2021-09-28 华南师范大学 Deep learning-based dissolved oxygen control method and system in sewage treatment process
CN113449464A (en) * 2021-06-11 2021-09-28 淮阴工学院 Wind power prediction method based on improved depth extreme learning machine
CN113837364A (en) * 2021-09-17 2021-12-24 华南师范大学 Sewage treatment soft measurement method and system based on residual error network and attention mechanism
CN114861528A (en) * 2022-04-18 2022-08-05 湖北工业大学 Wireless power transmission system parameter optimization method based on improved wolf algorithm
CN114944203A (en) * 2022-05-11 2022-08-26 华南师范大学 Wastewater treatment monitoring method and system based on automatic optimization algorithm and deep learning
CN115470702A (en) * 2022-09-14 2022-12-13 中山大学 Sewage treatment water quality prediction method and system based on machine learning
CN115561282A (en) * 2022-09-27 2023-01-03 上海化工研究院有限公司 Detection method and system for heavy metal in underground water

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
盛晓晨 ; 史旭东 ; 熊伟丽 ; .改进粒子群优化的极限学习机软测量建模方法.计算机应用研究.2020,(第06期),89-93. *

Also Published As

Publication number Publication date
CN115952685A (en) 2023-04-11

Similar Documents

Publication Publication Date Title
US11346831B2 (en) Intelligent detection method for biochemical oxygen demand based on a self-organizing recurrent RBF neural network
CN110807554B (en) Generation method and system based on wind power/photovoltaic classical scene set
CN112989711B (en) Aureomycin fermentation process soft measurement modeling method based on semi-supervised ensemble learning
CN110689179A (en) Water bloom prediction method based on space-time sequence mixed model
CN108710974B (en) Water ammonia nitrogen prediction method and device based on deep belief network
CN112464567B (en) Intelligent data assimilation method based on variational and assimilative framework
CN115952685B (en) Sewage treatment process soft measurement modeling method based on integrated deep learning
CN110824915A (en) GA-DBN network-based intelligent monitoring method and system for wastewater treatment
CN114429077B (en) Time sequence multi-scale analysis method based on quantum migration
CN112668606B (en) Step type landslide displacement prediction method based on gradient elevator and quadratic programming
CN110045606A (en) A kind of increment space-time learning method for distributed parameter system line modeling
CN114707395A (en) Groundwater organic pollution source inversion method based on heuristic homotopy algorithm
CN117078114A (en) Water quality evaluation method and system for water-bearing lakes under influence of diversion engineering
CN114819407A (en) Dynamic prediction method and device for lake blue algae bloom
CN110909492B (en) Sewage treatment process soft measurement method based on extreme gradient lifting algorithm
CN114417740A (en) Deep sea breeding situation sensing method
CN117235510A (en) Joint roughness prediction method and training method of joint roughness prediction model
CN113076587A (en) Short-term prediction method for micro-strain of large-span steel structure building
CN115018137B (en) Water environment model parameter calibration method based on reinforcement learning
CN116502539A (en) VOCs gas concentration prediction method and system
CN113222324B (en) Sewage quality monitoring method based on PLS-PSO-RBF neural network model
CN112651168B (en) Construction land area prediction method based on improved neural network algorithm
CN113377075A (en) Method and device for optimizing rare earth extraction process in real time and computer readable storage medium
CN116028757B (en) Optimal soft measurement model generation method and system based on multi-source information fusion
CN118114592B (en) Substitution modeling method for predicting salt water invasion dynamics under uncertainty condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240219

Address after: 230000 Room 203, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Hefei Jiuzhou Longteng scientific and technological achievement transformation Co.,Ltd.

Country or region after: China

Address before: 223005 Jiangsu Huaian economic and Technological Development Zone, 1 East Road.

Patentee before: HUAIYIN INSTITUTE OF TECHNOLOGY

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20240403

Address after: No. 6, West Second Lane, Xingkuang Road, Bin County, Xianyang City, Shaanxi Province, 713500

Patentee after: Cao Liang

Country or region after: China

Address before: 230000 Room 203, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee before: Hefei Jiuzhou Longteng scientific and technological achievement transformation Co.,Ltd.

Country or region before: China