CN112884056A

CN112884056A - Optimized LSTM neural network-based sewage quality prediction method

Info

Publication number: CN112884056A
Application number: CN202110239984.9A
Authority: CN
Inventors: 刘心; 时启明; 李文竹
Original assignee: Hebei University of Engineering
Current assignee: Hebei University of Engineering
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2021-06-01

Abstract

The invention discloses a sewage quality prediction method based on a long short-term memory (LSTM) neural network optimized with an attention mechanism. The method uses principal component analysis to reduce the dimensionality of the input data and simplify the raw data, and introduces a particle swarm optimization (PSO) algorithm and an attention mechanism to optimize the LSTM. As a result, the LSTM neural network is trained with a better network structure and is better able to recognize the importance of local feature sequences in the data. This improves the prediction accuracy of the LSTM neural network, helps it avoid falling into local optima, and allows the effluent quality of sewage to be predicted more effectively.

Description

Optimized LSTM neural network-based sewage quality prediction method

Technical Field

The invention relates to a sewage quality prediction method based on a Long Short-Term Memory (LSTM) neural network optimized with an attention mechanism, and belongs to the technical field of water quality prediction.

Background

At present, sewage prevention and control in China remains a serious challenge, and the focus of water pollution treatment has shifted from simply improving water environment quality to integrating water quality improvement with water resource protection and aquatic ecosystem protection. To prevent further deterioration and pollution of China's water resources, the most effective approach is to strengthen sewage treatment and water quality detection and monitoring capabilities.

The operating conditions of the sewage treatment process are harsh and random interference is severe. The process is multi-input, multi-output, nonlinear, strongly coupled and time-lagged, which makes it extremely complex and difficult to describe with a mathematical model, so accurate monitoring of effluent quality is a difficult problem. Traditional methods for measuring sewage quality, such as the dichromate method and the permanganate index, are complicated to operate, time-consuming and prone to causing secondary pollution. A neural network, with its strong fitting capability and adaptability, is therefore better suited to sewage quality prediction. At the same time, sewage quality data are strongly disturbed, periodic, small in sample size and high in dimensionality, and have pronounced time-series characteristics. Most existing sewage quality prediction methods ignore these time-series characteristics and pay no attention to how strongly changes in influent quality in different time periods affect the effluent quality prediction.

Through analysis, the prior art is found to have the following defects:

(1) In the prior art, auxiliary variables (input data) are usually selected by correlation analysis alone. Although this identifies all of the auxiliary variables, some of them contribute little to the prediction result, which causes data redundancy and slows model training.

(2) In the prior art, when a Long Short-Term Memory (LSTM) neural network is used to predict effluent quality, the hyper-parameters (such as the number of neurons in the hidden layer and the learning rate) are usually set according to human experience, yet different hyper-parameter settings strongly affect the prediction result of the neural network. Setting the hyper-parameters more reasonably is therefore important for improving the prediction result.

(3) Most existing sewage quality prediction models ignore the time-series characteristics of sewage data, and pay little or no attention to how important changes in local water quality features in different time periods are to the effluent quality prediction.

Based on the above, the invention uses Principal Component Analysis (PCA) to reduce the dimensionality of the input data and simplify the raw data, uses Particle Swarm Optimization (PSO) to determine the optimal initial values of the LSTM neural network hyper-parameters, uses an attention mechanism (att) to improve the ability of the LSTM neural network to identify important features in local water quality parameter sequences, and finally establishes an attention-based PSO-LSTM neural network (PSO-LSTM-att) for predicting the effluent quality of sewage.

Disclosure of Invention

Taking into account the time-series characteristics of sewage quality data and the differing importance of local water quality features in different time periods, the invention provides a sewage quality prediction method based on an LSTM neural network, and introduces a PSO algorithm and an attention mechanism to optimize the LSTM. The LSTM neural network is thereby trained with a better network structure and is better able to identify the importance of local feature sequences in the data, which improves its prediction accuracy, helps it avoid local optima, and allows sewage quality to be predicted more effectively and accurately.

The invention adopts the following technical scheme:

A sewage quality prediction method based on a long short-term memory neural network optimized with an attention mechanism comprises the following steps: determining the leading variable; selecting the auxiliary variables; establishing a water quality prediction model based on a PSO-LSTM-att neural network; and predicting the effluent quality of sewage treatment. The auxiliary variables are selected by principal component analysis. Before the auxiliary variables are selected, the collected historical sewage data are preprocessed, which includes removing abnormal values, filling in missing values and normalizing the data.

The structure of the PSO-LSTM-att neural network comprises three parts: a PSO algorithm layer, an LSTM neural network layer and a fully connected layer. The hyper-parameters of the LSTM neural network are optimized by the PSO algorithm, and a fully connected layer is added after the hidden layer of the LSTM neural network to introduce the attention mechanism.

Further, the training set samples are input through the PSO layer into the LSTM-att neural network for training. After training is complete, the optimal hyper-parameters of the LSTM-att neural network are output and fed into the PSO-LSTM-att neural network, which serves as the water quality prediction model.

The method for optimizing the hyper-parameters of the LSTM neural network by adopting the PSO algorithm comprises the following steps:

(1) inputting a training set;

(2) setting a hyper-parameter of a PSO algorithm;

(3) randomly initializing a population, and selecting a training set Mean Square Error (MSE) as a fitness function;

(4) inputting the population parameters into an LSTM-att neural network prediction model for training, calculating the fitness of individuals and the population, and continuously updating a PSO operator;

(5) judging whether the training is finished according to whether a termination condition is reached;

(6) if the training is finished, outputting the LSTM-att optimal hyper-parameter; otherwise, returning to the step (4) to continue training.

The termination condition is that the maximum iteration number is reached or the mean square error is smaller than the set training error. The PSO operator comprises an individual optimal value, a population optimal value, the speed and displacement of particles, inertia weight and a learning factor.

The attention mechanism is to perform weighted summation calculation on the output of the hidden layer, allocate additional weight to each feature, and finally obtain a prediction result of the next time period. The formula for the attention mechanism is as follows:

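$$\alpha_i = a(h_i) = \sigma(W h_i + b)$$

$$\beta_i = \frac{\exp(\alpha_i)}{\sum_{j=1}^{m} \exp(\alpha_j)}$$

$$c = \sum_{i=1}^{m} \beta_i h_i$$

where the vector $c$ is the extracted key features and $m$ is the total number of time steps input into the LSTM neural network; $\beta_i$ is the weight of the vector $h_i$; $h_i$ is the feature vector output by the LSTM neural network; $W$ is the weight matrix of the fully connected layer; $b$ is the bias matrix of the fully connected layer; and $\sigma$ is the sigmoid function.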

Drawings

FIG. 1 is a flow chart of a wastewater quality prediction method of the present invention;

FIG. 2 is a flow chart of a PSO algorithm optimizing neural network hyper-parameters; and

FIG. 3 is a diagram of a PSO-LSTM-att neural network architecture.

Detailed Description

A sewage quality prediction method based on a long-short term memory neural network optimized by an attention mechanism comprises the following specific steps:

(1) selection of leading variable and auxiliary variable of key water quality parameter for sewage treatment

The data used in this method are sewage indicator data from a sewage treatment plant in Rizhao City, Shandong Province, collected daily from January 1, 2019 to July 31, 2020. Effluent Chemical Oxygen Demand (COD) is selected as the dominant variable, and the collected historical sewage data are preprocessed, which includes removing abnormal values, filling in missing values and normalizing the data.

The sewage quality data are screened by Principal Component Analysis (PCA) to determine which auxiliary variables should be selected when effluent COD is the dominant variable, and the auxiliary variables obtained by this screening are used as the input variables of the sewage quality prediction model.

(2) Establishment of water quality prediction model based on PSO-LSTM-att neural network and prediction of sewage treatment water quality

A sewage quality prediction model is established using an LSTM neural network, which is suited to processing time-series data. In building the LSTM neural network, the data set is first preprocessed and constructed, the PSO algorithm is then used to determine initial values such as the number of neurons in the hidden layer and the learning rate, and an attention mechanism is used to learn the importance of local effluent quality features, finally yielding the sewage quality prediction result.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

1. Selection of dominant variable and auxiliary variable of sewage treatment effluent quality parameter prediction model

1.1 selection of leading variables of effluent quality parameter prediction model for sewage treatment

When evaluating the quality of treated sewage, the effluent COD, BOD, […] and the like are usually considered. These indicators are not easy to measure directly, are important water quality parameters, and are generally used as the leading variables of a soft-measurement model, that is, as the output variables of the model. COD is a parameter reflecting the content of degradable organic matter in water and is one of the key indicators for water quality evaluation; accurate monitoring of the effluent COD is of great significance for the optimal control of the subsequent sewage treatment system. Therefore, based on the importance of the effluent quality parameters and how easily they can be measured, the effluent COD, which indicates whether the effluent quality meets the standard, is selected as the leading variable to be predicted in order to verify the feasibility and effectiveness of the invention.

1.2 Primary selection of auxiliary variables of effluent quality parameter prediction model for wastewater treatment

The selection of the auxiliary variables generally follows the following points:

(1) A model built on the selected auxiliary variables should have good generalization capability.

(2) The auxiliary variables must be relatively easy to measure, for example water temperature, SS and TP.

(3) The auxiliary variables must have a certain correlation with the leading variable.

In the sewage treatment process, the dimensionality of the auxiliary variables is a key issue. If the dimensionality is too high, redundant data are generated, the computational load becomes excessive and model efficiency suffers; if it is too low, some key data may be lost, which reduces model accuracy and makes the prediction inaccurate. Screening the auxiliary variables is therefore a key step in establishing a sewage quality prediction model. Since COD measures the amount of reducing substances in water and thereby reflects its organic matter content, the initial auxiliary variables are taken to be: influent COD, influent […], influent TP, influent TN, influent SS, water temperature and pH.

1.3 data preprocessing

(1) Clearing outliers

To eliminate noisy data, abnormal values are usually removed. According to the 3σ rule (Pauta criterion), assuming that the sample data contain only random errors, the standard deviation is calculated by the formula below, an interval is then determined from the corresponding probability, and data falling outside this interval are regarded as abnormal values and eliminated.

Let the $n$ sample data be $x_1, x_2, x_3, \ldots, x_n$, with mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and deviations

$$v_i = x_i - \bar{x}, \quad i = 1, 2, \ldots, n.$$

The standard deviation formula is as follows:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^{2}}$$

If the deviation $v_i$ $(i = 1, 2, \ldots, n)$ of a sample $x_i$ satisfies $|v_i| > 3\sigma$, then $x_i$ is deleted as an abnormal value.
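The 3σ screening described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `remove_outliers_3sigma` and the sample readings are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption.

```python
import math

def remove_outliers_3sigma(values):
    """Keep only samples whose deviation from the mean is within 3 sigma."""
    n = len(values)
    if n < 2:
        return list(values)
    mean = sum(values) / n
    # Sample standard deviation with divisor n - 1 (an assumption of this sketch).
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    # A sample is treated as abnormal when |v_i| > 3 * sigma.
    return [x for x in values if abs(x - mean) <= 3 * sigma]

# Hypothetical daily COD readings with one obviously abnormal value.
readings = [50.0] * 19 + [500.0]
cleaned = remove_outliers_3sigma(readings)
print(len(readings), len(cleaned))
```

Note that with very small samples a single extreme value inflates σ so much that it can never exceed 3σ, so the rule is only meaningful for reasonably long series.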

(2) Complementing missing values

Because data may encounter various problems during the collection process, partial data values of the data set may be missing, and the missing data may possibly contain important information, which may make the data set unable to provide sufficient data characteristics for the prediction model, and the result of the prediction model may be unstable or unreliable. Therefore, the missing value needs to be filled.

According to the period and time sequence characteristics of the sewage quality data, the method preferably adopts a Lagrange interpolation method to complement the missing data.
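As a rough illustration of Lagrange interpolation for filling gaps, the sketch below fits a Lagrange polynomial through a few known neighbours of each missing point. The source does not say how many neighbours are used; the `window` parameter, the function names and the sample series are assumptions made for this example.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fill_missing(series, window=2):
    """Fill None entries using up to `window` known neighbours on each side."""
    filled = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        left = [i for i in range(t - 1, -1, -1) if series[i] is not None][:window]
        right = [i for i in range(t + 1, len(series)) if series[i] is not None][:window]
        idx = sorted(left + right)
        if idx:  # leave the gap untouched if no known neighbours exist
            filled[t] = lagrange_interpolate(idx, [series[i] for i in idx], t)
    return filled

# Hypothetical series with one missing day (index 2).
print(fill_missing([1.0, 4.0, None, 16.0, 25.0]))
```

Keeping the window small is a deliberate choice here, since high-degree Lagrange polynomials tend to oscillate between nodes.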

(3) Data normalization

Because the parameters of the sewage data have different dimensions and units, parameters on different scales can distort one another's influence on the model. To eliminate the effect of differing units, the sewage data are normalized so that every parameter is of the same order of magnitude. The formula is as follows:

$$x^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x^{*}$ is the normalized value of the input data $x_i$ in each dimension, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the data in that dimension. After normalization, the raw data are mapped to the interval [0, 1]. Once model training is complete and results are obtained, the data must be denormalized to facilitate analysis of the results.
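A minimal sketch of min-max normalization and its inverse is shown below. The function names and sample values are hypothetical, and the handling of a constant column (all values equal) is an assumption, since the source does not address that case.

```python
def min_max_normalize(values):
    """Map values to [0, 1]; also return x_min and x_max for later denormalization."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Constant column: map everything to 0.0 (an assumption of this sketch).
        return [0.0 for _ in values], x_min, x_max
    return [(x - x_min) / span for x in values], x_min, x_max

def inverse_normalize(scaled, x_min, x_max):
    """Map normalized values back to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]

# Hypothetical influent TP values.
scaled, lo, hi = min_max_normalize([20.0, 35.0, 50.0])
print(scaled, inverse_normalize(scaled, lo, hi))
```

In practice the minimum and maximum would normally be taken from the training data only and reused for the test data, so that no information from the test set leaks into training.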

1.4 screening auxiliary variables by PCA

PCA addresses the problem of screening auxiliary variables. Its principle is to convert a set of correlated vectors, by an orthogonal transformation, into a set of linearly uncorrelated vectors that retains the main features of the original vectors while having a lower dimension. The calculation steps are as follows:

Suppose there are $k$ sample points in an $n$-dimensional space, $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in}\}^{T}$ $(i = 1, 2, \ldots, k)$, forming the matrix $X = \{X_1, X_2, X_3, \ldots, X_k\}$. The mean of the matrix is:

$$\bar{X} = \frac{1}{k}\sum_{i=1}^{k} X_i$$

Centering the matrix $X$ gives the matrix $Y$:

$$Y = \{X_1 - \bar{X},\ X_2 - \bar{X},\ \ldots,\ X_k - \bar{X}\}$$

The covariance matrix $C$ of the matrix $X$ is:

$$C = \frac{1}{k} Y Y^{T}$$

The eigenvalues $\lambda_i$ $(i = 1, 2, \ldots, n)$ of the covariance matrix $C$ are computed and sorted in descending order, giving the matrix of eigenvectors corresponding to the eigenvalues:

$$V = \{v_1, v_2, v_3, \ldots, v_n\}$$

The original features are projected onto the selected eigenvectors, giving the projection $y_i$ of $Y$ in the direction of the $i$-th principal component:

$$y_i = v_i^{T} Y$$

Projecting onto the first $p$ $(p < n)$ directions gives the cumulative contribution rate $\gamma$:

$$\gamma = \frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$$

If $\gamma$ is close to 1, the first $p$ directions retain most of the information in the original data, and the dimensionality reduction of the data is achieved.

The dimensionality of the auxiliary variables can be reduced through the above steps. Influent COD, influent […], influent TP, influent TN, influent SS, water temperature and pH are used as the input variables of the PCA algorithm. The dimensionality of the data is reduced by principal component analysis: by calculating the cumulative contribution rate of the auxiliary variables, noise and redundancy in the data are eliminated and the raw data are simplified.

Taking effluent COD as a leading variable, carrying out dimensionality reduction treatment on an auxiliary variable according to the steps of a PCA algorithm, wherein the contribution rate after treatment is shown in the following table:

TABLE 1 auxiliary variable contribution ratio

Variables whose cumulative contribution rate exceeds 85% are selected as auxiliary variables, reducing the data from the original seven dimensions to four. Finally, with effluent COD as the leading variable, influent COD, influent […], influent TP and influent TN are selected as the auxiliary variables.
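The 85% cumulative-contribution rule can be sketched as follows. The eigenvalues in the example are made up for illustration and are not the values from Table 1; the function name `select_components` is likewise hypothetical.

```python
def select_components(eigenvalues, threshold=0.85):
    """Return (p, gamma): the smallest p whose cumulative contribution exceeds threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for p, lam in enumerate(ordered, start=1):
        cumulative += lam
        gamma = cumulative / total
        if gamma > threshold:
            return p, gamma
    return len(ordered), 1.0

# Seven hypothetical eigenvalues, one per candidate auxiliary variable.
p, gamma = select_components([2.8, 1.6, 1.0, 0.8, 0.4, 0.3, 0.1])
print(p, round(gamma, 4))
```

With these illustrative eigenvalues the first four directions are needed before the cumulative contribution passes 85%, which mirrors the seven-to-four reduction reported in the text.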

1.5 Partitioning the training set and test set

To train the model better and improve its generalization ability, the data set is divided into a training set and a test set. This embodiment uses 10-fold cross-validation: the 500 groups of historical sewage data are randomly divided into 10 equal parts, keeping the data distribution consistent across the parts. In each round of training, 9 parts are used as the training set of the model and the remaining part is used as the test set. Training is repeated 10 times, and the evaluation indexes of the 10 rounds are averaged.
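The 10-fold split can be sketched with the standard library alone. This sketch only shuffles and partitions sample indices; it does not attempt to keep the data distribution consistent across folds, and the function name and fixed seed are assumptions.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random division of the samples
    folds = [indices[i::k] for i in range(k)]     # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 500 groups of data, as in the embodiment: 450 for training, 50 for testing per round.
for round_no, (train, test) in enumerate(k_fold_splits(500), start=1):
    print(round_no, len(train), len(test))
```

Each evaluation index would then be computed once per round and averaged over the 10 rounds.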

2. PSO-LSTM-att neural network

2.1 PSO Algorithm

The PSO algorithm is a random search algorithm based on group cooperation, developed by simulating the foraging behavior of a flock of birds, and is generally regarded as a form of swarm intelligence. In PSO, each candidate solution to the optimization problem is a bird in the search space, called a "particle". Every particle has a fitness value determined by the function being optimized, and a velocity that determines the direction and distance of its movement; the particles then search the solution space by following the current best particle.

The flow of the PSO algorithm for optimizing the neural network hyper-parameters is shown in fig. 2, and includes the following steps:

(1) Input the training set.

(2) Set the hyper-parameters of the PSO algorithm.

(3) Randomly initialize the population, and select the Mean Square Error (MSE) on the training set as the fitness function.

(4) Input the population parameters into the LSTM-att neural network prediction model for training, calculate the fitness of each individual and of the population, and continuously update the PSO operators, namely the individual optimal value, the population optimal value, the velocity and displacement of the particles, the inertia weight and the learning factors.

(5) Judge whether training is finished according to whether the termination condition has been reached.

(6) If training is finished, output the optimal hyper-parameters of the LSTM-att model. Otherwise, return to step (4) and continue training.

Preferably, the termination condition is that the maximum number of iterations is reached or that the mean square error falls below a set training error. The mean square error is calculated from the model's predicted values (effluent COD) and the actual values, and an upper limit is set for it. The mean square error reflects changes in both the fitness function of the PSO algorithm and the loss function of the LSTM neural network.
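The loop in steps (1) to (6) can be sketched as a generic PSO minimizer. To keep the example self-contained, the fitness function below is a hypothetical stand-in with a known minimum rather than the training MSE of an LSTM-att model. The particle count, inertia weight `w`, learning factors `c1` and `c2`, search bounds and error threshold are all assumed values, and unlike the description above, the inertia weight and learning factors are held constant here.

```python
import random

def pso_minimize(fitness, bounds, n_particles=20, max_iter=100,
                 target_error=1e-6, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step (3): randomly initialize the population.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iter):                      # termination: max iterations
        if gbest_fit < target_error:               # termination: error small enough
            break
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fit = fitness(pos[i])                  # step (4): evaluate the particle
            if fit < pbest_fit[i]:                 # update individual optimum
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:                # update population optimum
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def stand_in_fitness(params):
    """Hypothetical surrogate for the training MSE; minimum at (64, 0.01)."""
    hidden_units, learning_rate = params
    return ((hidden_units - 64) / 100) ** 2 + ((learning_rate - 0.01) * 10) ** 2

best, best_fit = pso_minimize(stand_in_fitness, [(8, 256), (0.0001, 0.1)])
print(best, best_fit)
```

In a real setting each fitness evaluation would train an LSTM-att model with the particle's hyper-parameters and return its MSE, and the hidden-unit coordinate would be rounded to an integer before use.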

2.2 attention mechanism

Although the LSTM neural network can store historical information, when faced with a large, multidimensional, multivariable data set it may ignore some important timing information at the current time step. An attention mechanism is therefore introduced to optimize the LSTM neural network and learn the importance of local feature sequences in the data. Specifically, the information at different time steps is processed, and extra weights are assigned to the parameters input to the neural network (the auxiliary variables and the output of the previous time step), so that during training the network focuses on the parameters that most influence the prediction result. This reduces interference from parameters that have little influence at the current time step and improves prediction accuracy.

In essence, the attention mechanism imitates the way the human brain concentrates on certain parts of an object when observing it, parts that strongly guide the recognition of similar objects. It can be understood simply as a weighted summation, generally applied to the hidden layer of the LSTM neural network. Implementing the attention mechanism requires adding a fully connected layer after the LSTM hidden layer and setting the softmax function as the activation function. The formula for the attention mechanism is as follows:

$$\alpha_i = a(h_i) = \sigma(W h_i + b)$$

$$\beta_i = \frac{\exp(\alpha_i)}{\sum_{j=1}^{m} \exp(\alpha_j)}$$

$$c = \sum_{i=1}^{m} \beta_i h_i$$

where $W$ is the weight matrix of the fully connected layer, $h_i$ is the feature vector output by the LSTM neural network, $b$ is the bias matrix of the fully connected layer, and $\sigma$ is the sigmoid function; the vector $c$ is the extracted key features, $m$ is the total number of time steps input into the LSTM neural network, and $\beta_i$ is the weight of the vector $h_i$.
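The attention step can be illustrated with a small pure-Python sketch. It assumes that each hidden vector is scored by a sigmoid of a linear map, that the scores are normalized with softmax to give the weights β_i, and that the key features c are the β-weighted sum of the hidden vectors. It further assumes that the scoring layer produces one scalar per time step, so `W` is a single weight vector and `b` a scalar; the function names and inputs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attention(hidden_states, W, b):
    """Return (betas, c) for a list of m hidden vectors h_i.

    W is one weight vector and b one scalar bias (assumptions of this sketch).
    """
    # alpha_i = sigmoid(W . h_i + b): one score per time step
    alphas = [sigmoid(sum(w * h for w, h in zip(W, h_i)) + b) for h_i in hidden_states]
    # beta_i = softmax(alpha)_i: weights that sum to 1 over the m time steps
    exp_a = [math.exp(a) for a in alphas]
    total = sum(exp_a)
    betas = [e / total for e in exp_a]
    # c = sum_i beta_i * h_i: weighted sum of the hidden vectors
    dim = len(hidden_states[0])
    c = [sum(beta * h_i[d] for beta, h_i in zip(betas, hidden_states)) for d in range(dim)]
    return betas, c

# Three hypothetical time steps with 2-dimensional hidden vectors.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
betas, c = attention(h, W=[0.5, -0.5], b=0.0)
print(betas, c)
```

With all-zero weights every time step receives the same weight, so c reduces to the plain average of the hidden vectors; training the layer is what lets some time steps count more than others.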

2.3 establishing a PSO-LSTM-att neural network water quality prediction model

The structure of the PSO-LSTM-att neural network is shown in fig. 3 and comprises three parts: a PSO algorithm layer, an LSTM neural network layer and a fully connected layer (attention mechanism). The attention mechanism is added to the hidden layer of the LSTM through the fully connected layer, the number of hidden-layer neurons of the LSTM layer and the learning rate are set as the population parameters of the PSO algorithm, and the training set samples are input through the PSO layer into the LSTM-att neural network for training. After training is complete, the optimal hyper-parameters of the LSTM-att neural network are output and fed into the PSO-LSTM-att neural network, and the test set is used to verify how well the PSO-LSTM-att neural network predicts the effluent quality.

The structure diagram also shows how the attention mechanism (Attention layer) works: a fully connected layer is set up, a weighted summation is performed on the output of the hidden layer, extra weight is assigned to each feature, and the prediction result for the next time period is finally obtained.

The beneficial effects of the invention are as follows:

(1) For the effluent COD to be predicted, Principal Component Analysis (PCA) is used: by calculating the cumulative contribution rate of the auxiliary variables, noise and redundancy in the data are eliminated through dimensionality reduction and the raw data are simplified, and the auxiliary variables for effluent COD as the dominant variable are finally determined.

(2) A PSO-LSTM-att neural network water quality prediction model is established. To address the difficulty of selecting hyper-parameters for the neural network, a PSO algorithm is added so that better parameters and a better network structure are obtained without relying on human experience. To improve the ability of the LSTM neural network to identify the importance of the data at the current time step, an attention mechanism is added, so that the model can accurately determine which auxiliary variable has the greater influence at the current time step, thereby improving the accuracy of effluent quality prediction.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A sewage quality prediction method based on a long-short term memory neural network optimized by an attention mechanism comprises the following steps:

(1) determining a leading variable;

(2) selecting an auxiliary variable;

(3) establishing a water quality prediction model based on a PSO-LSTM-att neural network;

(4) predicting the quality of the sewage treatment water.

2. The method of claim 1, wherein the auxiliary variables are selected using a principal component analysis method.

3. The method of claim 2, wherein the collected historical data of the wastewater is preprocessed, including removing outliers, supplementing missing values, and normalizing the data, prior to selecting the auxiliary variable using the principal component analysis method.

4. The method of claim 1, the structure of the PSO-LSTM-att neural network comprising three parts: a PSO algorithm layer, an LSTM neural network layer and a full connection layer; and optimizing the hyper-parameters of the LSTM neural network by adopting a PSO algorithm, adding a full connection layer behind the LSTM neural network hidden layer, and introducing an attention mechanism.

5. The method as claimed in claim 4, inputting the training set sample into the LSTM-att neural network through the PSO layer for training, outputting the optimal hyper-parameter of the LSTM-att neural network after the training is completed, and inputting the optimal hyper-parameter into the PSO-LSTM-att neural network as the water quality prediction model.

6. The method of claim 4, the optimizing the hyper-parameters of the LSTM neural network using the PSO algorithm comprising the steps of:

(1) inputting a training set;

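(2) setting a hyper-parameter of a PSO algorithm;

(3) randomly initializing a population, and selecting a training set Mean Square Error (MSE) as a fitness function;

(4) inputting the population parameters into an LSTM-att neural network prediction model for training, calculating the fitness of individuals and the population, and continuously updating a PSO operator;

(5) judging whether the training is finished according to whether a termination condition is reached;

(6) if the training is finished, outputting the LSTM-att optimal hyper-parameter; otherwise, returning to the step (4) to continue training.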

7. The method of claim 6, wherein the termination condition is that a maximum number of iterations is reached or a mean square error is less than a set training error.

8. The method of claim 6, the PSO operators being individual and population optima, velocity and displacement of particles, inertial weights, and learning factors.

9. The method of claim 4, wherein the attention mechanism is to perform a weighted summation calculation on the output of the hidden layer, and assign an additional weight to each feature, thereby obtaining a prediction result of the next time interval.

10. The method of claim 9, wherein the attention mechanism is calculated as follows:

α_i＝a(h_i)＝σ(Wh_i+b)