CN117035155A

CN117035155A - Water quality prediction method

Info

Publication number: CN117035155A
Application number: CN202310763846.XA
Authority: CN
Inventors: 苟瀚文; 左鸿宇; 刘富銘; 苟先太
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2023-06-26
Filing date: 2023-06-26
Publication date: 2023-11-10

Abstract

The invention discloses a water quality prediction method comprising the following steps: preprocessing historical water quality index data to obtain an original water quality index sequence; decomposing the original water quality index sequence by complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN); inputting the decomposed water quality index data separately into a long short-term memory (LSTM) neural network and an autoregressive integrated moving average (ARIMA) model for training, and obtaining the predicted values of each model; comparing the predicted values of the two models to obtain an optimal configuration model, and using it to obtain the predicted value of each modal component IMF; and superposing the predicted values of all modal components IMF to obtain the predicted original water quality index data. The invention addresses time-series discontinuities in water quality index data and the insufficient accuracy of water quality prediction models as the time series is extended.

Description

Water quality prediction method

Technical Field

The invention belongs to the field of water quality monitoring, and particularly relates to a water quality prediction method.

Background

Water quality prediction builds a prediction model from historical water quality data sequences in order to forecast the future trend of water quality indexes. As basic work in protecting water resources, it is an important means of shifting water pollution management from after-the-fact treatment to advance prevention. In water pollution control, prediction results can serve as the basis for formulating control schemes, and the more accurate the prediction, the more useful it is; improving prediction accuracy is therefore particularly important for establishing scientific and effective water pollution control schemes.

Empirical mode decomposition (EMD) suffers from end effects, mode mixing and similar problems; complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is an improvement on ensemble empirical mode decomposition (EEMD). The autoregressive integrated moving average (ARIMA) model is one of the most widely used methods for forecasting univariate time series: it keeps the computational workload low on relatively simple data and predicts accurately when the series has a clear trend. The long short-term memory (LSTM) network is a special recurrent neural network (RNN) that overcomes the long-term dependency problem of conventional RNNs and converges quickly to a good solution in time series prediction.

Disclosure of Invention

The invention provides a water quality prediction method that addresses time-series discontinuities in water quality index data and the insufficient accuracy of water quality prediction models as the time series is extended.

To solve the above technical problems, the technical solution of the invention is as follows: a water quality prediction method comprising the following steps:

S1, acquiring historical water quality index data from a monitoring station, and preprocessing the data to obtain an original water quality index sequence;

S2, decomposing the original water quality index sequence by CEEMDAN to obtain decomposed water quality index data;

S3, inputting the decomposed water quality index data separately into a long short-term memory (LSTM) neural network and an autoregressive integrated moving average (ARIMA) model for training, and obtaining the predicted values of the LSTM network and of the ARIMA model;

S4, comparing the predicted values of the LSTM network and the ARIMA model to obtain an optimal configuration model, predicting the water quality index with the optimal configuration model, and outputting the predicted value of each modal component IMF;

S5, superposing the predicted values of all modal components IMF to obtain the final predicted water quality index data.

The beneficial effects of the invention are as follows: because a water quality data set is affected by a combination of factors such as industrial pollution, human social activity, river inflow and runoff into the sea, and climate change, the invention uses CEEMDAN to decompose it into different modal components IMF, obtains prediction results for each modal component IMF from prediction models trained on the LSTM network and the ARIMA model respectively, selects the better model for each component, and superposes the resulting predictions to obtain the final predicted water quality index data. This addresses time-series discontinuities in water quality index data and the insufficient accuracy of water quality prediction models as the time series is extended.

Further, step S1 specifically comprises the following steps:

S11, screening out abnormal values in the acquired historical water quality index data of the monitoring station using a KNN (k-nearest neighbours) algorithm, and deleting them from the data set;

S12, for each missing value in the data set, establishing a linear relationship with the two actual observations adjacent to it on the left and right, calculating the data increment from the slope of the assumed straight line to obtain the fill value, and filling in the missing value;

S13, selecting a historical data set over the chosen time interval range from the historical water quality index data after outlier and missing-value processing, to obtain the original water quality index sequence.
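Step S11 names a KNN algorithm for outlier screening but does not specify the score or the threshold. The following is a minimal sketch under stated assumptions: each sample is represented by its scaled time index and standardized value, its outlier score is the mean distance to its k nearest neighbours, and points scoring more than `n_sigma` standard deviations above the mean score are flagged. `knn_outlier_mask`, `k` and `n_sigma` are hypothetical names and choices, not taken from the patent.

```python
import numpy as np

def knn_outlier_mask(values, k=5, n_sigma=3.0):
    """Flag outliers by mean distance to the k nearest neighbours (assumed rule)."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    t = np.arange(n) / max(n - 1, 1)           # time index scaled to [0, 1]
    z = (v - v.mean()) / (v.std() or 1.0)      # standardized values
    pts = np.column_stack([t, z])
    # Pairwise Euclidean distances in (time, value) space
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)             # ignore each point's distance to itself
    knn_dist = np.sort(dist, axis=1)[:, :k]    # k smallest distances per point
    score = knn_dist.mean(axis=1)
    # Hypothetical threshold: mean score + n_sigma * std of scores
    return score > score.mean() + n_sigma * score.std()
```

Flagged points would then be deleted (for example set to NaN) so that step S12 can fill them.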

The beneficial effect of the above further scheme is: after the time-interval processing, the sampling interval of the original water quality index sequence is uniform, which ensures the continuity of the sequence data.

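Further, the formula for establishing the linear relationship in step S12 is:

$$\frac{y - y_0}{x - x_0} = \frac{y_1 - y_0}{x_1 - x_0}, \quad \text{i.e.} \quad y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$

where (x₀, y₀) and (x₁, y₁) denote the two observed data points adjacent to the missing value on the left and right, and (x, y) denotes the missing value to be determined.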
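The interpolation formula of step S12 can be sketched in Python as follows. The sketch assumes missing values (including outliers deleted in S11) are marked as NaN and that x is the integer time index; a gap at either end of the series has no neighbour on one side and is left as NaN, a case the patent does not address. `fill_linear` is a hypothetical helper name.

```python
import numpy as np

def fill_linear(values):
    """Fill NaNs with y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)."""
    y = np.asarray(values, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(y))       # indices of observed values
    for x in np.flatnonzero(np.isnan(y)):
        left, right = valid[valid < x], valid[valid > x]
        if len(left) == 0 or len(right) == 0:
            continue                           # no neighbour on one side: leave NaN
        x0, x1 = left[-1], right[0]            # nearest observed points left/right
        y0, y1 = y[x0], y[x1]
        y[x] = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return y
```

For example, `fill_linear([1.0, nan, 3.0, nan, nan, 6.0])` fills the gaps with 2, 4 and 5.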

Further, step S2 specifically comprises the following steps:

S21, adding zero-mean Gaussian white noise K' times to the original water quality index sequence x(t) to obtain K' sequences to be decomposed;

S22, performing empirical mode decomposition (EMD) on each of the K' sequences to be decomposed to obtain their first modal components, and averaging these K' first modal components to obtain the first modal component IMF₁(t) of the CEEMDAN decomposition and the first residual signal;

S23, adding the specific noise term to the (j-1)th residual signal and decomposing it to obtain the jth modal component and the jth residual signal, then judging whether the jth residual signal is monotonic; if so, the decomposed water quality index data are formed from the modal components obtained so far and the jth residual signal; otherwise, the next CEEMDAN stage is performed.
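Steps S21 to S23 can be sketched in NumPy as follows. This is a minimal, simplified sketch rather than a reference implementation: the inner EMD uses piecewise-linear envelopes through the local extrema (standard EMD uses cubic splines) and a fixed number of sifting iterations instead of a formal stopping criterion; a single constant weight `eps`, applied on the raw scale of x, stands in for ε and εⱼ₋₁; and a residual with no interior maximum or no interior minimum is treated as monotonic for the S23 test. Function and parameter names (`ceemdan`, `trials` for K', `eps`, `max_imfs`) are hypothetical.

```python
import numpy as np

def _extrema(x):
    """Indices of interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    return maxima, minima

def _oscillates(x):
    """False once x has no maximum or no minimum (treated as monotonic)."""
    maxima, minima = _extrema(x)
    return len(maxima) > 0 and len(minima) > 0

def _mean_envelope(h):
    """Mean of upper/lower envelopes (piecewise-linear, endpoints as knots)."""
    t = np.arange(len(h))
    maxima, minima = _extrema(h)
    last = len(h) - 1
    up = np.concatenate(([0], maxima, [last]))
    lo = np.concatenate(([0], minima, [last]))
    return 0.5 * (np.interp(t, up, h[up]) + np.interp(t, lo, h[lo]))

def emd(x, max_imfs, sift_iters=10):
    """Simplified EMD: up to max_imfs modes, fixed number of sifting passes."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and _oscillates(residue):
        h = residue.copy()
        for _ in range(sift_iters):
            if not _oscillates(h):
                break
            h = h - _mean_envelope(h)
        imfs.append(h)
        residue = residue - h
    return imfs

def ceemdan(x, trials=100, eps=0.2, max_imfs=8, seed=0):
    """Return (imfs, residual) such that imfs.sum(axis=0) + residual == x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((trials, len(x)))     # sigma_i(t), i = 1..K'
    noise_modes = [emd(w, max_imfs) for w in noise]   # E_k(sigma_i(t))

    def e1(y):                                         # E_1(.): first EMD mode
        modes = emd(y, 1)
        return modes[0] if modes else np.zeros_like(y)

    def e_k(modes, k):                                 # E_k(sigma_i), zero if absent
        return modes[k - 1] if len(modes) >= k else np.zeros_like(x)

    # S21-S22: IMF_1 = mean_i E_1(x + eps * sigma_i), r_1 = x - IMF_1
    imfs = [np.mean([e1(x + eps * w) for w in noise], axis=0)]
    r = x - imfs[0]
    # S23: IMF_j = mean_i E_1(r_{j-1} + eps * E_{j-1}(sigma_i)), r_j = r_{j-1} - IMF_j
    while len(imfs) < max_imfs and _oscillates(r):
        j = len(imfs) + 1
        imf = np.mean([e1(r + eps * e_k(m, j - 1)) for m in noise_modes], axis=0)
        imfs.append(imf)
        r = r - imf
    return np.array(imfs), r
```

Because each stage subtracts its IMF from the running residual, the components always add back to the input. For the embodiment's settings one might call `ceemdan(x, trials=120, eps=0.5 ** 0.5)`, reading "noise variance 0.5" as the variance of εσᵢ(t); that reading is an assumption.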

The beneficial effect of the above further scheme is: by using CEEMDAN, the invention avoids the end effects, mode mixing and similar problems that arise when EMD is used alone.

Further, the expression for the sequences to be decomposed in step S21 is:

$$x_i(t) = x(t) + \varepsilon\,\sigma_i(t)$$

where xᵢ(t) denotes the ith sequence to be decomposed, ε denotes the weight coefficient of the Gaussian white noise, σᵢ(t) denotes the Gaussian white noise added to the ith sequence to be decomposed, and x(t) denotes the original water quality index sequence.

Further, the expressions for the first modal component IMF₁(t) and the first residual signal obtained in step S22 are:

$$\mathrm{IMF}_1(t) = \frac{1}{K'}\sum_{i=1}^{K'} \mathrm{IMF}_1^{\,i}(t)$$

$$r_1(t) = x(t) - \mathrm{IMF}_1(t)$$

where IMF₁(t) denotes the first modal component after decomposition, K' denotes the number of sequences to be decomposed, IMF₁ⁱ(t) denotes the first modal component of the ith sequence to be decomposed, r₁(t) denotes the first residual signal after decomposition, and x(t) denotes the original water quality index sequence.

Further, the expressions for the jth modal component and the jth residual signal obtained in step S23 are:

$$\mathrm{IMF}_j(t) = \frac{1}{K'}\sum_{i=1}^{K'} E_1\big(r_{j-1}(t) + \varepsilon_{j-1}\,E_{j-1}(\sigma_i(t))\big)$$

$$r_j(t) = r_{j-1}(t) - \mathrm{IMF}_j(t)$$

where IMFⱼ(t) denotes the jth modal component after decomposition, K' denotes the number of sequences to be decomposed, E₁(·) denotes the first modal component obtained by applying EMD to its argument, σᵢ(t) denotes the Gaussian white noise added to the ith sequence to be decomposed, Eⱼ₋₁(σᵢ(t)) denotes the (j-1)th modal component obtained by applying EMD to σᵢ(t), rⱼ(t) denotes the jth residual signal, rⱼ₋₁(t) denotes the (j-1)th residual signal, and εⱼ₋₁ denotes the weight coefficient of the noise added to the (j-1)th residual signal.

Further, obtaining the predicted value of the LSTM network in step S3 specifically comprises the following steps:

A1, performing data preprocessing and normalization on the decomposed water quality index data;

A2, dividing the water quality index data processed in step A1 into a first training set and a first test set;

A3, creating and configuring the LSTM network, and selecting an optimizer;

A4, training the LSTM network on the first training set, using back-propagation with gradient descent to update its parameters;

A5, inputting the first test set into the trained LSTM network to obtain the predicted value of the LSTM network.
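Steps A1 and A2, together with the 80%/20% chronological split described in the embodiment, can be sketched with NumPy as below. The patent does not give the input window length or the normalization method; min-max scaling and a 12-step window (one year of monthly data) are assumptions, as is the helper name `make_lstm_datasets`. Building and training the LSTM itself (A3 to A5) would be done in a deep learning framework and is not shown.

```python
import numpy as np

def make_lstm_datasets(imf, window=12, train_ratio=0.8):
    """Min-max normalize one IMF, build sliding windows, split chronologically."""
    s = np.asarray(imf, dtype=float)
    lo, hi = s.min(), s.max()
    # A1: min-max normalization to [0, 1] (a constant series maps to zeros)
    scaled = (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)
    # Each sample: `window` past values -> the next value
    X = np.stack([scaled[i:i + window] for i in range(len(s) - window)])
    y = scaled[window:]
    X = X[..., np.newaxis]  # shape (samples, time steps, features=1) for an LSTM
    # A2: first 80% of samples for training, last 20% for testing (no shuffling)
    split = int(train_ratio * len(X))
    return (X[:split], y[:split]), (X[split:], y[split:]), (lo, hi)
```

Predictions in the scaled space map back with `pred * (hi - lo) + lo`. As in A1, the sketch normalizes before splitting; fitting `lo` and `hi` on the training part only would keep test-period values from influencing the scaling.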

Further, obtaining the predicted value of the ARIMA model in step S3 specifically comprises the following steps:

B1, dividing the decomposed water quality index data into a second training set and a second test set;

B2, performing a stationarity test on the data in the second training set to determine the differencing order d of the ARIMA model;

B3, determining the autoregressive order p and the moving-average order q of the ARIMA model from the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series;

B4, inputting the second test set into the ARIMA model with the determined parameters to obtain the predicted value of the ARIMA model.
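For step B3, the sample ACF and PACF can be computed directly; the sketch below does so with NumPy (PACF via the Durbin-Levinson recursion) and reads p and q off as the number of leading lags whose coefficients lie outside the approximate 95% band ±1.96/√n. That cutoff rule is a common rule of thumb, not a procedure stated in the patent, and `suggest_pq` is a hypothetical name. The stationarity test of B2 is often the augmented Dickey-Fuller test (for example `statsmodels.tsa.stattools.adfuller`), and the model of B4 could be fitted with `statsmodels.tsa.arima.model.ARIMA(series, order=(p, d, q))`; neither is shown here.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_0 .. r_nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    phi = np.zeros(0)  # coefficients phi_{k-1,1..k-1} of the previous order
    for k in range(1, nlags + 1):
        phi_kk = (r[k] - phi @ r[k - 1:0:-1]) / (1.0 - phi @ r[1:k])
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        out[k] = phi_kk
    return out

def suggest_pq(x, d=0, nlags=10):
    """Rule-of-thumb p, q from PACF/ACF cutoffs after differencing d times."""
    y = np.diff(np.asarray(x, dtype=float), n=d)
    bound = 1.96 / np.sqrt(len(y))

    def leading_significant(coeffs):
        count = 0
        for c in coeffs[1:]:
            if abs(c) <= bound:
                break
            count += 1
        return count

    p = leading_significant(pacf(y, nlags))  # PACF cutoff -> AR order
    q = leading_significant(acf(y, nlags))   # ACF cutoff -> MA order
    return p, q
```

For a pure AR process the ACF tails off rather than cutting off, so the suggested q can be large; in practice such suggestions are usually refined by comparing candidate orders with an information criterion.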

Further, step S4 specifically comprises the following steps:

S41, comparing the predicted values of each modal component IMF from the LSTM network and from the ARIMA model;

S42, according to the comparison result, selecting for each modal component IMF whichever of the LSTM network and the ARIMA model gives the higher precision, to obtain the optimal configuration model;

S43, inputting the decomposed water quality index data into the optimal configuration model, and outputting the predicted value of each modal component IMF.
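Steps S41 to S43 and the superposition of step S5 can be sketched as follows, assuming each model has already produced test-set predictions for every IMF. The patent compares the models using RMSE, MAE and MAPE but does not say how the metrics are combined; this sketch uses test RMSE alone as the selection criterion (an assumption) and breaks ties in favour of the LSTM arbitrarily. `select_and_superpose` is a hypothetical name.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def select_and_superpose(test_imfs, lstm_preds, arima_preds):
    """Pick the lower-RMSE model per IMF (S41-S42) and sum the chosen forecasts (S5)."""
    choices, chosen = [], []
    for true, p_lstm, p_arima in zip(test_imfs, lstm_preds, arima_preds):
        if rmse(true, p_lstm) <= rmse(true, p_arima):
            choices.append("LSTM")
            chosen.append(np.asarray(p_lstm, dtype=float))
        else:
            choices.append("ARIMA")
            chosen.append(np.asarray(p_arima, dtype=float))
    # S5: superpose the per-IMF predictions of the selected models
    return choices, np.sum(chosen, axis=0)
```

The returned list of choices is the optimal configuration (for the ammonia nitrogen embodiment, LSTM for IMF1 to IMF3 and ARIMA for IMF4 to IMF7), and the summed array is the combined forecast.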

The beneficial effect of the above further scheme is: the ARIMA model is one of the most widely used methods for forecasting univariate time series; it keeps the computational workload low on relatively simple data and predicts accurately when the series has a clear trend. The LSTM network, a special recurrent neural network (RNN), overcomes the long-term dependency problem of conventional RNNs and converges quickly to a good solution in time series prediction. The optimal configuration model built from the LSTM network and the ARIMA model combines the advantages of both and achieves better prediction performance than either single model.

Drawings

FIG. 1 is a flow chart of the water quality prediction method of the present invention.

FIG. 2 shows the result of the CEEMDAN decomposition.

FIG. 3 is a graph comparing the predicted and actual values of the ammonia nitrogen water quality index in the invention.

Detailed Description

Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the teachings of this disclosure, those of ordinary skill in the art can make various other specific modifications and combinations without departing from its spirit, and such modifications and combinations remain within the scope of protection of the invention.

Examples

As shown in FIG. 1, the invention provides a water quality prediction method, which comprises the following steps:

The step S1 specifically comprises the following steps:

In this embodiment, historical water quality index data from the VM1 monitoring station are selected; taking the ammonia nitrogen index as an example, monthly ammonia nitrogen data covering 29 years are obtained and the data set is preprocessed.

The formula for establishing the linear relationship in step S12 is:

$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$

The step S2 comprises the following specific steps:

The expression of the sequence to be decomposed in the step S21 is:

$$x_i(t) = x(t) + \varepsilon\,\sigma_i(t)$$

The expressions for the first modal component IMF₁(t) and the first residual signal obtained in step S22 are:

$$\mathrm{IMF}_1(t) = \frac{1}{K'}\sum_{i=1}^{K'} \mathrm{IMF}_1^{\,i}(t)$$

$$r_1(t) = x(t) - \mathrm{IMF}_1(t)$$

The expressions for the jth modal component and the jth residual signal obtained in step S23 are:

$$\mathrm{IMF}_j(t) = \frac{1}{K'}\sum_{i=1}^{K'} E_1\big(r_{j-1}(t) + \varepsilon_{j-1}\,E_{j-1}(\sigma_i(t))\big)$$

$$r_j(t) = r_{j-1}(t) - \mathrm{IMF}_j(t)$$

where IMFⱼ(t) denotes the jth modal component after decomposition, K' denotes the number of sequences to be decomposed, E₁(·) denotes the first modal component obtained by applying EMD to its argument, σᵢ(t) denotes the Gaussian white noise added to the ith sequence to be decomposed, Eⱼ₋₁(σᵢ(t)) denotes the (j-1)th modal component obtained by applying EMD to σᵢ(t), rⱼ(t) denotes the jth residual signal, rⱼ₋₁(t) denotes the (j-1)th residual signal, and εⱼ₋₁ denotes the weight coefficient of the noise added to the (j-1)th residual signal.

In this embodiment, the original water quality index data are decomposed by CEEMDAN. Repeated tests showed that, for original water quality index data with large fluctuations, the decomposition works best when the variance of the added noise is about 0.5 and the number of noise realizations is about 100. Accordingly, this embodiment decomposes all original water quality index sequences by CEEMDAN with a noise variance of 0.5 and 120 noise realizations. Taking the ammonia nitrogen data as an example, CEEMDAN decomposition of the original ammonia nitrogen sequence yields seven modal components IMF and one residual signal.

As shown in FIG. 2, in the CEEMDAN decomposition result the first three modal components IMF fluctuate strongly, while the fluctuation and non-stationarity of the last four modal components IMF are greatly reduced. The residual signal represents the error remaining after CEEMDAN decomposition of the original ammonia nitrogen sequence and is mainly used to check decomposition performance; the residual signal obtained in this embodiment is essentially zero, indicating a good decomposition.

The predicted value of the LSTM network in step S3 is obtained through the following specific steps:

The predicted value of the ARIMA model in step S3 is obtained through the following specific steps:

In this embodiment, the data of each modal component IMF obtained by CEEMDAN decomposition are split chronologically: the first 80% serve as training data for both models and the last 20% are held out. The held-out 20% do not participate in training either prediction model and are used as test data to evaluate the accuracy of each model.

The test data are input into the LSTM network and the ARIMA model respectively to obtain predicted values of each modal component IMF. The accuracy of each model on each modal component IMF is calculated, the better prediction model is selected for each modal component IMF, each component is predicted with its selected model, and the prediction results are superposed to obtain the final water quality index prediction.

Taking ammonia nitrogen as an example, seven modal components IMF are obtained after CEEMDAN decomposition; the first 80% of the data of these seven components form the training sample set, used as both the first training set and the second training set. The first and second training sets are input into the LSTM network and the ARIMA model respectively for training, and the remaining 20% of the data are used as the first and second test sets to test the prediction performance of the LSTM network and the ARIMA model respectively.

The step S4 specifically comprises the following steps:

In this embodiment, the predictions of the ARIMA model and of the LSTM network are compared with the original data, and the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are calculated for each. Based on this comparison, an optimal prediction model is selected for each modal component IMF. Table 1 lists the prediction errors for the seven modal components IMF of ammonia nitrogen.
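For reference, the standard definitions of the three error measures are given below, with yₜ the observed value, ŷₜ the predicted value and n the number of test samples; the patent itself does not state the formulas. MAPE is undefined whenever yₜ = 0, which can matter for IMF components that oscillate around zero.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|, \qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
```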

TABLE 1

The comparison in Table 1 shows that, for the seven modal components IMF obtained by CEEMDAN decomposition of the original ammonia nitrogen data in this embodiment, IMF1, IMF2 and IMF3 are best predicted by the LSTM network, and IMF4, IMF5, IMF6 and IMF7 by the ARIMA model; this assignment constitutes the optimal configuration model.

Finally, the seven modal components IMF are predicted with the optimal configuration model, with the ARIMA model alone and with the LSTM network alone, to obtain the final predicted ammonia nitrogen water quality index data, where RMSE denotes root mean square error and MAE denotes mean absolute error. Table 2 shows the comparison of prediction errors.

TABLE 2

Among the prediction models, predicting with the LSTM network alone or with the ARIMA model alone performs considerably worse than the optimal configuration model that combines the LSTM network and the ARIMA model. When predicting on the CEEMDAN-decomposed data, the RMSE of the combined LSTM-ARIMA model is at least 7% lower than that of either single model.

As shown in FIG. 3, the comparison between the predicted and true values of the ammonia nitrogen water quality index shows that the method achieves high prediction precision and addresses time-series discontinuities in water quality index data and the insufficient accuracy of water quality prediction models as the time series is extended.

Claims

1. A water quality prediction method, characterized by comprising the following steps:

2. The water quality prediction method according to claim 1, wherein the step S1 specifically comprises the steps of:

3. The water quality prediction method according to claim 2, wherein the formula for establishing the linear relationship in step S12 is:

$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$

4. The water quality prediction method according to claim 1, wherein the step S2 specifically comprises the steps of:

5. The method according to claim 4, wherein the expression of the sequence to be decomposed in the step S21 is:

$$x_i(t) = x(t) + \varepsilon\,\sigma_i(t)$$

6. The method according to claim 4, wherein the expressions for the first modal component IMF₁(t) and the first residual signal obtained in step S22 are:

$$\mathrm{IMF}_1(t) = \frac{1}{K'}\sum_{i=1}^{K'} \mathrm{IMF}_1^{\,i}(t)$$

$$r_1(t) = x(t) - \mathrm{IMF}_1(t)$$

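7. The method according to claim 4, wherein the expressions for the jth modal component and the jth residual signal obtained in step S23 are:

$$\mathrm{IMF}_j(t) = \frac{1}{K'}\sum_{i=1}^{K'} E_1\big(r_{j-1}(t) + \varepsilon_{j-1}\,E_{j-1}(\sigma_i(t))\big)$$

$$r_j(t) = r_{j-1}(t) - \mathrm{IMF}_j(t)$$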

8. The water quality prediction method according to claim 1, wherein obtaining the predicted value of the LSTM network in step S3 specifically comprises the following steps:

9. The water quality prediction method according to claim 1, wherein obtaining the predicted value of the ARIMA model in step S3 specifically comprises the following steps:

10. The water quality prediction method according to claim 1, wherein the step S4 specifically comprises the steps of: