CN111027776A - Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network

Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network

Info

Publication number
CN111027776A
CN111027776A (application CN201911279211.2A)
Authority
CN
China
Prior art keywords
data
network
lstm
layer
neural network
Prior art date
Legal status
Pending
Application number
CN201911279211.2A
Other languages
Chinese (zh)
Inventor
刘晔晖
冯骁
夏文泽
曲晓川
王喆
钱志明
Current Assignee
Beijing Huazhan Huiyuan Information Technology Co Ltd
Original Assignee
Beijing Huazhan Huiyuan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Huazhan Huiyuan Information Technology Co Ltd filed Critical Beijing Huazhan Huiyuan Information Technology Co Ltd
Priority to CN201911279211.2A
Publication of CN111027776A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00Water conservation; Efficient water supply; Efficient water use
    • Y02A20/152Water filtration

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Evolutionary Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a sewage treatment water quality prediction method based on an improved long-short term memory (LSTM) neural network, which comprises the following steps: 1) data processing; 2) establishing and training an improved long-short term memory (LSTM) neural network prediction model; 3) importing the data to be predicted into the overall improved long-short term memory (LSTM) neural network prediction model and outputting a predicted water quality value. The invention first cleans and organizes the data source effectively, removing interference factors and noise points and clarifying the relations among the data, which lays the groundwork for the subsequent analysis. During data analysis, the improved LSTM network handles large-time-lag, strongly coupled data well and accurately maps numerical relationships over long time spans, which greatly facilitates sewage water quality prediction.

Description

Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network
Technical Field
The invention belongs to the field of sewage treatment, and particularly relates to a sewage treatment water quality prediction method based on an improved long-short term memory (LSTM) neural network.
Background
A recurrent neural network (RNN) is a neural network for processing sequence data. Compared with a general neural network, it can handle data that vary over a sequence; for example, because the meaning of a word depends on the preceding context, an RNN can handle such problems well. The RNN operates not only over the feature dimension but also over the time dimension. Its structural characteristic is a feedback connection added to the hidden layer, so that the input at each step contains not only the data from the input layer but also the data fed back from the hidden layer at the previous moment; this is the main source of its temporal modelling ability. However, this structure has a fatal problem, namely abnormal gradients: the self-feedback loop accelerates the vanishing-gradient phenomenon. To avoid this phenomenon, the long short-term memory network (LSTM) was developed.
The LSTM network effectively solves the vanishing-gradient problem caused by the structural defect of the RNN, namely its inability to retain content over long periods, by adding a forget gate together with improved structures such as an input gate, an update gate and an output gate, which allow content to be remembered for a long time. Even so, it cannot be applied directly to sewage treatment analysis, because the sewage treatment process is a large-scale continuous-flow industrial process characterized by large time variation, strong coupling, large time lags and severe interference; moreover, the biochemical reactions in the sewage treatment process are complex and involve many operating variables, so data analysis must memorize data over a very large time span. Under these conditions a plain LSTM cannot meet the application requirements, so the LSTM neural network needs to be improved and adjusted to meet the needs of data analysis in the sewage treatment process.
Disclosure of Invention
Aiming at the characteristics of the sewage treatment process and the defects of existing sewage treatment water quality analysis methods, the invention provides a high-precision analysis method based on neural network analysis, namely a sewage treatment water quality prediction method based on an improved long-short term memory (LSTM) neural network.
In order to achieve the purpose, the invention adopts the technical scheme that:
a sewage treatment water quality prediction method based on an improved long-short term memory (LSTM) neural network comprises the following steps:
1) data processing:
preprocessing the sewage treatment reported data, performing PCA (principal component analysis) dimensionality reduction on the preprocessed data to obtain result data, selecting a main parameter corresponding relation of the result data by utilizing a decision tree and pruning operation, and establishing a parameter mapping relation table;
2) establishing and training an improved long-short term memory (LSTM) neural network prediction model:
determining input and output parameters of the improved long-short term memory LSTM neural network according to the parameter mapping relation table, establishing an improved long-short term memory LSTM neural network prediction model, and training to obtain an overall improved long-short term memory LSTM neural network prediction model;
the improved long-short term memory LSTM neural network comprises an input layer, a front-end network parallel layer, an LSTM network layer, a rear-end network serial layer and an output layer;
3) and importing the data to be predicted into an overall improved long-short term memory (LSTM) neural network prediction model, and outputting to obtain a predicted water quality value.
Preferably, the preprocessing removes data outside reasonable limit values from the sewage treatment reported data, then applies three-times-standard-deviation (3σ) processing to the reported data, removing data whose maximum or minimum differs from the mean by more than three standard deviations; an overall dimensionality reduction is then performed on the sorted data to remove redundant information and obtain the result data.
Preferably, the input layer in the improved LSTM neural network model is the interface through which the data parameters enter the network, and the data set X is divided into n parts according to the length of time it covers, that is:
X=[x(1),x(2),...,x(n)]
That is, one long-time-span data series is divided into n shorter-time-span segments; note that the longer the time span of the data set, the larger the number of parts n into which the network divides it.
Preferably, the front-end network parallel layer comprises multiple groups of fully-connected neural networks and dropout layers arranged in parallel. In the fully-connected neural networks, the weight between network node i and network node j is w_ij, the threshold of node j is b_j, and the output of each node is x_j; the output of each node is determined by the activation function f from the outputs of all nodes in the previous layer together with the weights and biases between those nodes and the current node:
S_j = Σ_i w_ij·x_i + b_j
x_j = f(S_j)
where f is a ReLU-type activation function:
f(x) = max(0, x)
After the output of the fully-connected network is finished, the result enters the dropout network structure for processing, specifically:
r^(l) ~ Bernoulli(p)
x̃^(l) = r^(l) ⊙ x^(l)
The Bernoulli function above randomly generates a vector of 0s and 1s in which each element equals 1 with probability p.
Preferably, the LSTM network layer is the main data processing layer in the whole improved LSTM neural network and consists of several LSTM networks connected in parallel, wherein the current input x_t and the cell state c_{t-1} and hidden state h_{t-1} passed down from the previous step jointly determine:
z_f = σ(W_f·[h_{t-1}, x_t])
z_i = σ(W_i·[h_{t-1}, x_t])
z_o = σ(W_o·[h_{t-1}, x_t])
z = tanh(W·[h_{t-1}, x_t])
wherein z_f, z_i and z_o are obtained by multiplying the spliced vector [h_{t-1}, x_t] by the corresponding weight matrices and converting the results into values between 0 and 1 through a sigmoid activation function, so that they serve as gating states, and z is the value between -1 and 1 obtained by converting the result through a tanh activation function:
c_t = z_f ⊙ c_{t-1} ⊕ z_i ⊙ z
h_t = z_o ⊙ tanh(c_t)
y_t = δ(W'·h_t)
y_seq = y_t
where ⊙ denotes element-wise multiplication of the corresponding matrix elements and ⊕ denotes matrix addition.
Preferably, the back-end network serial layer includes multiple groups of fully-connected neural networks and dropout layers. After receiving the LSTM network layer results, it first accumulates and sums the parallel results to obtain a single combined result, which is then sent to the subsequent fully-connected and dropout network structure layers for processing:
Σ_{i=1}^{n} y_seq^(i)
the method firstly carries out numerical value limit processing on the reported data of the sewage treatment, and removes the data outside the unreasonable limit. It should be noted that, since the sewage data are reported in a time logic sequence, the data under the same timestamp represent the water quality condition at that time, that is, when an index in the water quality is removed due to a failure, other data at that time should also be removed, so as to ensure the accuracy of the time logic of the data. After the limit value processing, 3 times of standard deviation processing is needed to be carried out on the data, because the distribution condition of the reported data of the sewage treatment is just too distributed, the probability of existence of the numerical value exceeding 3 times of standard deviation is about 0.3%, and when the difference between the maximum value (or the minimum value) and the mean value of the reported data exceeds 3 times of standard deviation, the extreme values with problems in the maximum value (or the minimum value) and the mean value are likely to have problems, so that the extreme values with problems in the maximum value (or the minimum value) and the mean value are removed to.
The sewage treatment process is characterized by large time variation, strong coupling, large time lags, severe interference and complex biochemical reactions, so the quality of the reported data is poor and water quality prediction is difficult; the method overcomes these problems well. It first cleans and organizes the data source effectively, removing interference factors and noise points and clarifying the relations among the data, which lays the groundwork for the analysis. During data analysis, the improved LSTM network handles large-time-lag, strongly coupled data well and accurately maps numerical relationships over long time spans, which greatly facilitates sewage water quality prediction.
The invention has the beneficial effects that:
1) the improved long-short term memory LSTM network can well analyze large-time-lag and strong-coupling data, can accurately map the numerical relationship of a long time span, and provides great convenience for sewage quality prediction;
2) the invention has high analysis and prediction precision.
Drawings
FIG. 1 is a diagram of a sewage prediction method architecture;
FIG. 2 is a decision tree relationship analysis diagram, comprising the pruned tree, the overall decision tree and the corresponding pruning-error plot;
FIG. 3 is a diagram of an improved Long Short Term Memory (LSTM) neural network;
FIG. 4 is a prediction effect graph for effluent NH4 (ammonia nitrogen) and COD.
Detailed Description
In order to further illustrate the technical effects of the present invention, the present invention is specifically described below by way of examples.
Example 1
The invention discloses a sewage treatment water quality prediction method based on an improved long-short term memory (LSTM) neural network, which operates as shown in FIG. 1 and comprises two major aspects and five parts in total. Taking the CAST sewage treatment process as an example, a large amount of data is reported, such as influent flow, influent pH, influent COD, influent ammonia nitrogen, influent temperature, CAST tank SS, CAST tank DO, CAST tank total air volume, CAST tank total blower current, effluent ammonia nitrogen, effluent SS, effluent COD, effluent total phosphorus, effluent total nitrogen, and so on. The data reported by these sewage treatment systems are preprocessed as a whole; the specific operation steps are as follows.
Step 1: first, numerical limit-value processing is applied to the sewage treatment reported data, and data outside reasonable limits are removed. It should be noted that, since the sewage data are reported in time sequence, the data under the same timestamp represent the water quality condition at that time; that is, when one water quality indicator is removed because of a fault, the other data at that moment should also be removed, so as to preserve the time logic of the data.
Step 2: after the limit-value processing, three-times-standard-deviation (3σ) processing is applied to the data. Because the reported sewage treatment data are approximately normally distributed, the probability of a value deviating from the mean by more than three standard deviations is only about 0.3%; therefore, when the difference between the maximum (or minimum) of the reported data and the mean exceeds three standard deviations, that extreme value is very likely to be erroneous and is removed.
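As an illustration only, Steps 1 and 2 could be sketched as follows in Python with pandas; the library choice, indicator names, limit values and sample data are assumptions made for the example, not taken from the patent:

```python
import pandas as pd

def remove_out_of_limit(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Step 1: if any indicator at a timestamp lies outside its plausible range,
    drop the whole row so the time logic of the remaining data stays intact."""
    mask = pd.Series(True, index=df.index)
    for col, (low, high) in limits.items():
        mask &= df[col].between(low, high)
    return df[mask]

def remove_3sigma_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Step 2: drop rows in which any indicator deviates from its column mean
    by more than 3 standard deviations (about 0.3% of values under normality)."""
    z = (df - df.mean()) / df.std()
    return df[(z.abs() <= 3).all(axis=1)]

# Hypothetical indicators and limits, purely for illustration.
limits = {"influent_COD": (0.0, 2000.0), "influent_NH3N": (0.0, 200.0)}
raw = pd.DataFrame(
    {"influent_COD": [350.0, 420.0, 9999.0, 380.0],
     "influent_NH3N": [30.0, 28.0, 35.0, 31.0]},
    index=pd.date_range("2019-01-01", periods=4, freq="h"),
)
cleaned = remove_3sigma_outliers(remove_out_of_limit(raw, limits))
```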
Step 3: PCA (principal component analysis) dimensionality reduction is performed on the processed data to remove redundant information. The dimensionality reduction first establishes a data matrix X:
X = (x_ij)_{n×p}, where n is the number of samples and p the number of indicators.
The X matrix is first standardized:
x*_ij = (x_ij − x̄_j) / s_j
where
x̄_j = (1/n)·Σ_{i=1}^{n} x_ij,  s_j = sqrt( (1/(n−1))·Σ_{i=1}^{n} (x_ij − x̄_j)² )
Then the correlation coefficient matrix is calculated:
R = (r_jk)_{p×p},  r_jk = (1/(n−1))·Σ_{i=1}^{n} x*_ij·x*_ik
The eigenvalues (λ_1, λ_2, ..., λ_p) of the correlation coefficient matrix R and the corresponding eigenvectors are then computed:
a_i = (a_i1, a_i2, ..., a_ip), i = 1, 2, ..., p
The important principal components are then selected: the contribution of each component is the ratio of its eigenvalue to the sum of all eigenvalues, and the first k principal components are chosen so that their cumulative contribution reaches 95% of the total information, discarding the remaining small-probability (about 5%) components. The matrix after PCA dimensionality reduction is obtained from the retained k principal components.
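A minimal NumPy sketch of this PCA step, following the formulas above; the 95% cumulative-contribution threshold matches the text, while the data shape and random stand-in values are illustrative:

```python
import numpy as np

def pca_reduce(X: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """PCA on the correlation matrix: standardise, take eigenvalues/eigenvectors,
    keep the first k components whose cumulative contribution reaches the threshold."""
    n = X.shape[0]
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised matrix x*_ij
    R = (X_std.T @ X_std) / (n - 1)                        # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                   # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]                      # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(contribution, threshold) + 1)  # smallest k reaching 95%
    return X_std @ eigvecs[:, :k]                          # reduced-dimension data

# Example: 100 samples of 14 reported indicators (random stand-in data).
reduced = pca_reduce(np.random.rand(100, 14))
```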
Step 4: after the data processing is finished, the causal relationships among the data must also be sorted out, because in the sewage process some resulting water quality indicators are themselves factors influencing other result parameters; the causal relationships in the data therefore need to be made explicit. The main parameter correspondences of the result data are selected using a decision tree with pruning, and a parameter mapping relation table is established.
Before the decision tree is built, the result parameters used as classification targets are clustered; that is, the result parameters are grouped into 5 classes according to the distribution of their values.
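The patent does not name a particular clustering algorithm for this step; as one possible illustration, k-means from scikit-learn could group a result parameter's values into the 5 classes (the indicator and value range below are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative: cluster one effluent indicator (e.g. effluent COD) into 5 classes
# by value, so the decision tree has discrete classes to predict.
values = np.random.rand(500, 1) * 60          # stand-in effluent readings
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(values)
```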
The data are then classified using the information gain ratio, which measures how well a split covers the result classes; the information gain ratio is computed from the information gain and the splitting information, as follows:
GainRatio(S, A) = G(S, A) / SplitInfo(S, A)
where the information gain G(S, A) is defined as:
G(S, A) = E(S) − Σ_{v∈Values(A)} (|S_v| / |S|)·E(S_v)
E(S) = −Σ_i p_i·log2(p_i)
E(S) is the entropy of the data set S, p_i is the proportion of samples with the i-th attribute value in the subset, and S_v is the subset of samples in S for which feature A takes its v-th value.
The splitting information is defined as:
SplitInfo(S, A) = −Σ_{i=1}^{c} (|S_i| / |S|)·log2(|S_i| / |S|)
where S_i is the subset of samples in S that belong to the i-th class.
Finally, post-pruning of the decision tree is used to delete node branches and replace them with leaf nodes, each leaf being labelled with the most probable category in its subset. In this way the main decision factors of each category are found, and the parameter mapping relation table is compiled; the resulting form is shown in FIG. 2.
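A small NumPy sketch of the information gain ratio defined above, used to score how well a discretised input feature explains the result classes; the tree construction and post-pruning themselves are not reproduced here, and the stand-in data are illustrative:

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """E(S) = -sum_i p_i * log2(p_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(feature: np.ndarray, labels: np.ndarray) -> float:
    """GainRatio(S, A) = G(S, A) / SplitInfo(S, A) for a discrete feature A."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond_entropy = sum(w * entropy(labels[feature == v])
                       for v, w in zip(values, weights))
    gain = entropy(labels) - cond_entropy                 # information gain G(S, A)
    split_info = -(weights * np.log2(weights)).sum()      # splitting information
    return gain / split_info if split_info > 0 else 0.0

# Illustrative: how well a discretised influent indicator explains the effluent class.
feature = np.random.randint(0, 3, size=200)   # stand-in discretised influent feature
labels = np.random.randint(0, 5, size=200)    # the 5 effluent classes from clustering
print(gain_ratio(feature, labels))
```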
Step 5: the input and output parameters of the improved LSTM neural network are determined from the established parameter mapping relation table and fed into the network. The improved LSTM neural network consists of several network levels, namely an input layer, a front-end network parallel layer, an LSTM network layer, a back-end network serial layer and an output layer, where the input layer X represents the input data parameter set and Y represents the output result parameter; the specific form is shown in FIG. 3.
1) The input layer is the interface through which the data parameters enter the network. The data set X is first divided into n parts according to the length of time it covers, namely:
X=[x(1),x(2),...,x(n)]
One long-time-span data series is thus divided into n shorter-time-span segments; note that the longer the time span of the data set, the larger the number of parts n should be. In actual operation, given the size of the reported data set and the water quality analysis, the water plant's reported data cover one year, and the change of the four seasons within a year causes large jumps in the water quality parameters, so the overall input is divided into 4 parts, i.e. the value of n is set to 4.
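The division of the one-year input series into n = 4 segments can be sketched with NumPy as below; the array shape is an assumption made for illustration:

```python
import numpy as np

# One year of reported inputs: a stand-in array with one row per timestamp and
# one column per (PCA-reduced) input parameter; the shape is purely illustrative.
year_of_data = np.random.rand(365 * 24, 8)

# X = [x(1), x(2), ..., x(n)] with n = 4: four roughly season-length segments.
n = 4
segments = np.array_split(year_of_data, n)   # list of 4 arrays split along the time axis
```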
2) The front-end network parallel layer mainly consists of multiple groups of fully-connected neural networks and dropout layers arranged in parallel. The weight between network node i and network node j is w_ij, the threshold of node j is b_j, and the output of each node is x_j; the output of each node is determined from the outputs of all nodes in the previous layer and the weights and biases between those nodes and the current node by the activation function f (a ReLU-type activation function):
S_j = Σ_i w_ij·x_i + b_j
x_j = f(S_j)
where the activation function ReLU is:
f(x) = max(0, x)
After the output of the fully-connected network is complete, the result also enters the dropout network structure for processing, specifically:
r_j^(l) ~ Bernoulli(p)
x̃^(l) = r^(l) ⊙ x^(l)
S_j^(l+1) = w_j^(l+1)·x̃^(l) + b_j^(l+1)
x_j^(l+1) = f(S_j^(l+1))
The Bernoulli function above randomly generates a vector of 0s and 1s in which each element equals 1 with probability p.
3) The LSTM network layer is the main data processing layer in the whole network and consists of several LSTM networks connected in parallel. The current input x_t and the cell state c_{t-1} and hidden state h_{t-1} passed down from the previous step jointly determine:
z_f = σ(W_f·[h_{t-1}, x_t])
z_i = σ(W_i·[h_{t-1}, x_t])
z_o = σ(W_o·[h_{t-1}, x_t])
z = tanh(W·[h_{t-1}, x_t])
where z_f, z_i and z_o are obtained by multiplying the spliced vector [h_{t-1}, x_t] by the corresponding weight matrices and converting the results into values between 0 and 1 through a sigmoid activation function, so that they serve as gating states; z is the value between -1 and 1 obtained by converting the result through a tanh activation function (tanh is used here because z serves as input data, not as a gating signal):
c_t = z_f ⊙ c_{t-1} ⊕ z_i ⊙ z
h_t = z_o ⊙ tanh(c_t)
y_t = δ(W'·h_t)
y_seq = y_t
where ⊙ denotes element-wise multiplication of the corresponding matrix elements and ⊕ denotes matrix addition.
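The gate equations above can be written out directly; the NumPy sketch below performs a single LSTM time step with the same symbols (biases are omitted for brevity, δ is taken to be a sigmoid, and the weight shapes and sizes are illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c, W_out):
    """One LSTM time step following the gate equations above (biases omitted)."""
    concat = np.concatenate([h_prev, x_t])      # the spliced vector [h_{t-1}, x_t]
    z_f = sigmoid(W_f @ concat)                 # forget gate, values in (0, 1)
    z_i = sigmoid(W_i @ concat)                 # input gate, values in (0, 1)
    z_o = sigmoid(W_o @ concat)                 # output gate, values in (0, 1)
    z = np.tanh(W_c @ concat)                   # candidate content, values in (-1, 1)
    c_t = z_f * c_prev + z_i * z                # element-wise: forget old, write new
    h_t = z_o * np.tanh(c_t)                    # new hidden state
    y_t = sigmoid(W_out @ h_t)                  # y_t = delta(W' h_t), delta ~ sigmoid
    return h_t, c_t, y_t

# Illustrative sizes: 8 input features, 16 hidden units.
d_in, d_h = 8, 16
rng = np.random.default_rng(0)
W_f, W_i, W_o, W_c = [rng.standard_normal((d_h, d_h + d_in)) for _ in range(4)]
W_out = rng.standard_normal((d_h, d_h))
h, c, y = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h),
                    W_f, W_i, W_o, W_c, W_out)
```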
4) The back-end network serial layer is structurally very similar to the front-end network parallel layer, except that after receiving the LSTM network layer results it accumulates and sums the parallel results to obtain a single combined result, which is then sent to the fully-connected and dropout network structure layers for processing:
Σ_{i=1}^{n} y_seq^(i)
5) Some hyperparameter choices in the network should also be noted: in the network structure, the initial learning rate is set to 0.01, the mini-batch size to 400, the optimizer is Adam, the gradient threshold is 1, and the front-end and back-end fully-connected stacks each comprise 3 layers.
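Assembling the layers with the stated hyperparameters might look like the PyTorch sketch below; the framework itself, the layer widths, the dropout rate and the mean-squared-error loss are assumptions, while the learning rate 0.01, mini-batch size 400, Adam optimizer, gradient threshold 1 and the three-layer front and back fully-connected stacks follow the text:

```python
import torch
import torch.nn as nn

def fc_stack(d_in: int, d_out: int, p: float = 0.5) -> nn.Sequential:
    """Three fully-connected layers, each followed by ReLU and dropout."""
    layers, d = [], d_in
    for _ in range(3):
        layers += [nn.Linear(d, d_out), nn.ReLU(), nn.Dropout(p)]
        d = d_out
    return nn.Sequential(*layers)

class ImprovedLSTM(nn.Module):
    """Input -> n parallel FC+dropout branches -> parallel LSTMs ->
    summation -> serial FC+dropout stack -> output (widths illustrative)."""
    def __init__(self, n: int = 4, in_dim: int = 8, hidden: int = 64, out_dim: int = 1):
        super().__init__()
        self.front = nn.ModuleList([fc_stack(in_dim, hidden) for _ in range(n)])
        self.lstms = nn.ModuleList([nn.LSTM(hidden, hidden, batch_first=True)
                                    for _ in range(n)])
        self.back = fc_stack(hidden, hidden)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, segments):               # list of n tensors (batch, T_i, in_dim)
        outs = []
        for seg, fc, lstm in zip(segments, self.front, self.lstms):
            h, _ = lstm(fc(seg))                # one parallel branch
            outs.append(h[:, -1, :])            # last-step hidden output of the branch
        summed = torch.stack(outs).sum(dim=0)   # accumulate and sum the parallel results
        return self.head(self.back(summed))

model = ImprovedLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)   # initial learning rate 0.01
loss_fn = nn.MSELoss()

# One illustrative training step with mini-batch size 400 and gradient clipping at 1.
segments = [torch.randn(400, 24, 8) for _ in range(4)]      # 4 stand-in data segments
target = torch.randn(400, 1)
optimizer.zero_grad()
loss = loss_fn(model(segments), target)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```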
6) Finally, an overall prediction network is obtained after training. To predict the water quality value at the next moment, the known data are copied and divided into n parts and fed into the network for prediction, and the predicted water quality value is obtained; the effect is shown in FIG. 4.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the technical solutions of the present invention are described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and such modifications should be covered by the protection scope of the present invention.

Claims (6)

1. A sewage treatment water quality prediction method based on an improved long-short term memory (LSTM) neural network is characterized by comprising the following steps:
1) data processing:
preprocessing the sewage treatment reported data, performing PCA (principal component analysis) dimensionality reduction on the preprocessed data to obtain result data, selecting a main parameter corresponding relation of the result data by utilizing a decision tree and pruning operation, and establishing a parameter mapping relation table;
2) establishing and training an improved long-short term memory (LSTM) neural network prediction model:
determining input and output parameters of the improved long-short term memory LSTM neural network according to the parameter mapping relation table, establishing an improved long-short term memory LSTM neural network prediction model, and training to obtain an overall improved long-short term memory LSTM neural network prediction model;
the improved long-short term memory LSTM neural network comprises an input layer, a front-end network parallel layer, an LSTM network layer, a rear-end network serial layer and an output layer;
3) and importing the data to be predicted into an overall improved long-short term memory (LSTM) neural network prediction model, and outputting to obtain a predicted water quality value.
2. The method for predicting the quality of the sewage treatment water according to claim 1, wherein the pretreatment is to remove the data outside the unreasonable limit value in the reported data of the sewage treatment, then perform 3 times standard deviation treatment on the reported data, and remove the data of which the difference between the maximum value or the minimum value of the reported data and the mean value exceeds 3 times standard deviation.
3. The method according to claim 1, wherein the input layer is the interface through which the data parameters enter the network, and the data set X is divided into n parts according to the time span it covers, that is, one long-time-span data series is divided into n shorter-time-span segments:
X=[x(1),x(2),...,x(n)]。
4. The wastewater treatment water quality prediction method according to claim 1, wherein the front-end network parallel layer comprises multiple groups of fully-connected neural networks and dropout layers connected in parallel; in the fully-connected neural networks the weight between network node i and network node j is w_ij, the threshold of node j is b_j, and the output of each node is x_j, the output of each node being determined by the activation function f from the outputs of all nodes in the previous layer together with the weights and biases between those nodes and the current node:
S_j = Σ_i w_ij·x_i + b_j
x_j = f(S_j)
After the output of the fully-connected network is finished, the result enters the dropout network structure for processing, specifically:
r_j^(l) ~ Bernoulli(p)
x̃^(l) = r^(l) ⊙ x^(l)
S_j^(l+1) = w_j^(l+1)·x̃^(l) + b_j^(l+1)
x_j^(l+1) = f(S_j^(l+1))
The Bernoulli function above randomly generates a vector of 0s and 1s in which each element equals 1 with probability p.
5. The wastewater treatment water quality prediction method of claim 1, wherein the LSTM network layer is the main data processing layer in the whole improved LSTM neural network and consists of several LSTM networks connected in parallel, wherein the current input x_t and the cell state c_{t-1} and hidden state h_{t-1} passed down from the previous step jointly determine:
z_f = σ(W_f·[h_{t-1}, x_t])
z_i = σ(W_i·[h_{t-1}, x_t])
z_o = σ(W_o·[h_{t-1}, x_t])
z = tanh(W·[h_{t-1}, x_t])
wherein z_f, z_i and z_o are obtained by multiplying the spliced vector [h_{t-1}, x_t] by the corresponding weight matrices and converting the results into values between 0 and 1 through a sigmoid activation function, so that they serve as gating states, and z is the value between -1 and 1 obtained by converting the result through a tanh activation function:
c_t = z_f ⊙ c_{t-1} ⊕ z_i ⊙ z
h_t = z_o ⊙ tanh(c_t)
y_t = δ(W'·h_t)
y_seq = y_t
where ⊙ denotes element-wise multiplication of the corresponding matrix elements and ⊕ denotes matrix addition.
6. The sewage treatment water quality prediction method according to claim 1, wherein the back-end network serial layer comprises multiple groups of fully-connected neural networks and dropout layers; after receiving the LSTM network layer results, it accumulates and sums the parallel results to obtain a single combined result, which is then sent to the subsequent fully-connected and dropout network structure layers for processing:
Σ_{i=1}^{n} y_seq^(i)
CN201911279211.2A 2019-12-13 2019-12-13 Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network Pending CN111027776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911279211.2A CN111027776A (en) 2019-12-13 2019-12-13 Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911279211.2A CN111027776A (en) 2019-12-13 2019-12-13 Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network

Publications (1)

Publication Number Publication Date
CN111027776A true CN111027776A (en) 2020-04-17

Family

ID=70206622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911279211.2A Pending CN111027776A (en) 2019-12-13 2019-12-13 Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network

Country Status (1)

Country Link
CN (1) CN111027776A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598151A (en) * 2020-11-16 2021-04-02 华南理工大学 High-voltage cable surface temperature prediction method in cable trench laying mode
CN113033917A (en) * 2021-04-19 2021-06-25 重庆工商大学 Sewage treatment plant prediction planning operation management method based on peripheral data
CN113554321A (en) * 2021-07-28 2021-10-26 陕西科技大学 Dairy product cold-chain logistics quality safety early warning method
CN113886118A (en) * 2021-09-16 2022-01-04 杭州安恒信息技术股份有限公司 Abnormal resource processing method, device, system, electronic device and storage medium
WO2023236601A1 (en) * 2022-06-08 2023-12-14 腾讯科技(深圳)有限公司 Parameter prediction method, prediction server, prediction system and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508811A (en) * 2018-09-30 2019-03-22 中冶华天工程技术有限公司 Parameter prediction method is discharged based on principal component analysis and the sewage treatment of shot and long term memory network
CN109598451A (en) * 2018-12-27 2019-04-09 东北大学 A kind of non-intrusion type load discrimination method based on PCA Yu LSTM neural network
CN109754113A (en) * 2018-11-29 2019-05-14 南京邮电大学 Load forecasting method based on dynamic time warping Yu length time memory
CN109859469A (en) * 2019-02-15 2019-06-07 重庆邮电大学 A kind of vehicle flowrate prediction technique based on integrated LSTM neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508811A (en) * 2018-09-30 2019-03-22 中冶华天工程技术有限公司 Parameter prediction method is discharged based on principal component analysis and the sewage treatment of shot and long term memory network
CN109754113A (en) * 2018-11-29 2019-05-14 南京邮电大学 Load forecasting method based on dynamic time warping Yu length time memory
CN109598451A (en) * 2018-12-27 2019-04-09 东北大学 A kind of non-intrusion type load discrimination method based on PCA Yu LSTM neural network
CN109859469A (en) * 2019-02-15 2019-06-07 重庆邮电大学 A kind of vehicle flowrate prediction technique based on integrated LSTM neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Liang et al.: "Prediction method of drug temperature and humidity based on improved LSTM", Journal of Electronic Measurement and Instrumentation *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598151A (en) * 2020-11-16 2021-04-02 华南理工大学 High-voltage cable surface temperature prediction method in cable trench laying mode
CN113033917A (en) * 2021-04-19 2021-06-25 重庆工商大学 Sewage treatment plant prediction planning operation management method based on peripheral data
CN113033917B (en) * 2021-04-19 2022-04-12 重庆工商大学 Sewage treatment plant prediction planning operation management method based on peripheral data
CN113554321A (en) * 2021-07-28 2021-10-26 陕西科技大学 Dairy product cold-chain logistics quality safety early warning method
CN113886118A (en) * 2021-09-16 2022-01-04 杭州安恒信息技术股份有限公司 Abnormal resource processing method, device, system, electronic device and storage medium
WO2023236601A1 (en) * 2022-06-08 2023-12-14 腾讯科技(深圳)有限公司 Parameter prediction method, prediction server, prediction system and electronic device

Similar Documents

Publication Publication Date Title
CN111027776A (en) Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network
CN106022954B (en) Multiple BP neural network load prediction method based on grey correlation degree
CN110782658B (en) Traffic prediction method based on LightGBM algorithm
CN111105332A (en) Highway intelligent pre-maintenance method and system based on artificial neural network
CN110188936B (en) Short-term traffic flow prediction method based on multi-factor space selection deep learning algorithm
CN109558893B (en) Rapid integrated sewage treatment fault diagnosis method based on resampling pool
CN111738520A (en) System load prediction method fusing isolated forest and long-short term memory network
KR102009284B1 (en) Training apparatus for training dynamic recurrent neural networks to predict performance time of last activity in business process
CN114757432A (en) Future execution activity and time prediction method and system based on flow log and multi-task learning
CN109214444B (en) Game anti-addiction determination system and method based on twin neural network and GMM
CN110956309A (en) Flow activity prediction method based on CRF and LSTM
CN110830291A (en) Node classification method of heterogeneous information network based on meta-path
CN112765894B (en) K-LSTM-based aluminum electrolysis cell state prediction method
CN113077271A (en) Enterprise credit rating method and device based on BP neural network
CN114091794A (en) Patent value evaluation model training method, evaluation method, device and equipment
CN116542701A (en) Carbon price prediction method and system based on CNN-LSTM combination model
CN111310974A (en) Short-term water demand prediction method based on GA-ELM
CN114943328A (en) SARIMA-GRU time sequence prediction model based on BP neural network nonlinear combination
CN114820074A (en) Target user group prediction model construction method based on machine learning
CN113111588A NOx emission concentration prediction method and device for a gas turbine
Jiang et al. A new test for common breaks in heterogeneous panel data models
CN112650770B (en) MySQL parameter recommendation method based on query work load analysis
CN114943304B (en) Bayesian-based 3C electronic product assembly error transfer method
Önay Koçoğlu et al. Machine Learning Approach and Model Performance Evaluation for Tele-Marketing Success Classification
Ali Qasimi Examining and Evaluating Classification Algorithms Based on Decision Trees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200417