CN117035155A - Water quality prediction method - Google Patents

Info

Publication number
CN117035155A
Authority
CN
China
Prior art keywords
water quality
quality index
decomposed
sequence
neural network
Legal status
Pending
Application number
CN202310763846.XA
Other languages
Chinese (zh)
Inventor
苟瀚文
左鸿宇
刘富銘
苟先太
Current Assignee
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Application filed by Southwest Jiaotong University
Priority to CN202310763846.XA
Publication of CN117035155A
Pending (current legal status)

Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F17/10 Complex mathematical operations
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q50/06 Electricity, gas or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)

Abstract

The invention discloses a water quality prediction method comprising the following steps: preprocessing historical water quality index data to obtain an original water quality index sequence; decomposing the original sequence by complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN); inputting the decomposed water quality index data into a long short-term memory (LSTM) neural network and an autoregressive integrated moving average (ARIMA) model for training, and obtaining the predicted values of each model; comparing the predicted values of the two models to obtain an optimal configuration model, and using it to obtain the predicted value of each modal component (intrinsic mode function, IMF); and superposing the predicted values of all IMFs to obtain the predicted water quality index data. The invention addresses interruptions in water quality index time series and the insufficient accuracy of water quality prediction models as the time series is extended.

Description

Water quality prediction method
Technical Field
The invention belongs to the field of water quality monitoring, and particularly relates to a water quality prediction method.
Background
Water quality prediction builds a model from historical water quality data sequences to predict the trend of future water quality indexes. As basic work in protecting water resources, it is an important means of shifting water pollution management from after-the-fact treatment to prevention. In water pollution control, prediction results can serve as a basis for formulating pollution control schemes, and the more accurate the prediction, the more useful it is. Improving prediction accuracy is therefore particularly important for establishing scientific and effective water pollution control schemes.
Empirical mode decomposition (EMD) suffers from end effects, mode mixing and similar problems; complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), an improvement on ensemble empirical mode decomposition (EEMD), addresses them. The autoregressive integrated moving average (ARIMA) model is one of the most widely used methods for forecasting univariate time series: it keeps the computational workload low on data that are not complex and predicts accurately when the series has a clear trend. The long short-term memory (LSTM) neural network is a special recurrent neural network (RNN) that solves the long-term dependency problem of RNNs and converges quickly to an optimal solution in time series prediction.
Disclosure of Invention
The invention provides a water quality prediction method that addresses interruptions in water quality index time series and the insufficient accuracy of water quality prediction models as the time series is extended.
To solve these technical problems, the technical solution of the invention is a water quality prediction method comprising the following steps:
S1, acquiring historical water quality index data from a monitoring station and preprocessing it to obtain an original water quality index sequence;
S2, decomposing the original water quality index sequence by complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to obtain decomposed water quality index data;
S3, inputting the decomposed water quality index data into a long short-term memory (LSTM) neural network and into an autoregressive integrated moving average (ARIMA) model for training, and obtaining the predicted values of each;
S4, comparing the predicted values of the LSTM network and the ARIMA model to obtain an optimal configuration model, predicting the water quality index with the optimal configuration model, and outputting the predicted value of each modal component IMF;
S5, superposing the predicted values of all modal components IMF to obtain the final predicted water quality index data.
The beneficial effects of the invention are as follows. A water quality data set is shaped by a combination of factors such as industrial pollution, human social activity, river inflow and runoff into the sea, and climate change. The invention therefore uses CEEMDAN to decompose the data into different modal components IMF. It then predicts each IMF with models trained on the LSTM network and on the ARIMA model, selects the better model for each IMF, and superposes the selected IMF predictions to obtain the final predicted water quality index data. This addresses interruptions in water quality index time series and the loss of prediction accuracy as the time series is extended.
Further, step S1 specifically comprises the following steps:
S11, using a KNN (k-nearest neighbours) algorithm to screen out abnormal values in the acquired historical water quality index data of the monitoring station, and deleting them from the data set;
S12, for each missing value in the data set, establishing a linear relationship between the two actual historical water quality index values adjacent to it on the left and right, calculating the data increment from the slope of this straight line, and filling the missing value with the result;
S13, from the historical water quality index data after outlier and missing-value processing, selecting a historical data set over the chosen time range at a uniform time interval to obtain the original water quality index sequence.
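Step S11 names a KNN algorithm but does not say how neighbours are measured or where the outlier threshold lies. The sketch below is one possible reading under stated assumptions. Each reading is scored by its mean distance to its k nearest neighbours in value space, and it is flagged when that score exceeds the mean score plus z standard deviations. The function names, k = 5 and z = 3 are hypothetical choices, not parameters from the patent. Flagged points are set to NaN so that step S12 can fill them.

```python
import numpy as np


def knn_outlier_mask(values, k=5, z=3.0):
    """Flag points whose mean distance to their k nearest neighbours is unusually large.

    Assumptions (not from the patent): neighbours are found in value space only,
    and a point is an outlier if its score exceeds mean + z * std of all scores.
    """
    v = np.asarray(values, dtype=float)
    dist = np.abs(v[:, None] - v[None, :])   # pairwise |v_i - v_j|
    np.fill_diagonal(dist, np.inf)            # ignore each point's distance to itself
    knn = np.sort(dist, axis=1)[:, :k]        # k smallest distances per point
    score = knn.mean(axis=1)
    return score > score.mean() + z * score.std()


def drop_outliers(values, k=5, z=3.0):
    """Replace flagged outliers with NaN so they can be filled later (step S12)."""
    v = np.asarray(values, dtype=float).copy()
    v[knn_outlier_mask(v, k, z)] = np.nan
    return v
```

The all-pairs distance matrix is O(n²) in memory. That is fine for a few hundred monthly readings, but a tree-based neighbour search would suit long high-frequency records better.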
Beneficial effect of this further scheme: after time-interval processing, the data points of each original water quality index sequence are evenly spaced, which ensures that the sequence is continuous.
Further, the formula for establishing the linear relationship in step S12 is:
$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$
where (x_0, y_0) and (x_1, y_1) are the two data points adjacent to the missing value on the left and right, and (x, y) is the missing value to be determined.
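The interpolation rule of step S12 can be sketched as follows. This assumes equally spaced samples, so the sample index serves as x. It also leaves gaps that lack a valid neighbour on one side unfilled, because the formula needs both (x_0, y_0) and (x_1, y_1). The function name `fill_linear` is hypothetical.

```python
import numpy as np


def fill_linear(values):
    """Fill NaN gaps with y = y0 + (y1 - y0) / (x1 - x0) * (x - x0).

    (x0, y0) and (x1, y1) are the nearest valid samples left and right of each gap;
    x is the sample index (assumes equally spaced samples).
    """
    y = np.asarray(values, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(y))  # indices of the original, non-missing samples
    for x in np.flatnonzero(np.isnan(y)):
        left = valid[valid < x]
        right = valid[valid > x]
        if len(left) == 0 or len(right) == 0:
            continue  # no neighbour on one side: leave the gap as-is
        x0, x1 = left[-1], right[0]
        y[x] = y[x0] + (y[x1] - y[x0]) / (x1 - x0) * (x - x0)
    return y
```

For interior gaps this gives the same values as `np.interp(idx, valid, y[valid])`. However, `np.interp` would also pad leading and trailing gaps with the nearest valid value, which the formula above does not define.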
Further, step S2 specifically comprises the following steps:
S21, adding zero-mean Gaussian white noise to the original water quality index sequence x(t) K' times to obtain K' sequences to be decomposed;
S22, performing empirical mode decomposition (EMD) on each of the K' sequences to obtain its first modal component, and averaging these first modal components to obtain the first modal component IMF_1(t) of the CEEMDAN decomposition and the first residual signal;
S23, after adding specific noise to the j-th residual signal and decomposing it, judging whether the j-th residual signal is monotonic: if so, the decomposed water quality index data are obtained from the modal components up to the j-th and the j-th residual signal; otherwise, proceeding to the next stage of the CEEMDAN decomposition.
Beneficial effect of this further scheme: CEEMDAN avoids the end effects, mode mixing and similar problems that arise when EMD is used alone.
Further, the expression of the sequences to be decomposed in step S21 is:
$$x_i(t) = x(t) + \varepsilon\,\sigma_i(t), \quad i = 1, \dots, K'$$
where x_i(t) is the i-th sequence to be decomposed, ε is the weight coefficient of the Gaussian white noise, σ_i(t) is the Gaussian white noise added to the i-th sequence, and x(t) is the original water quality index sequence.
Further, the first modal component IMF_1(t) and the first residual signal obtained in step S22 are:
$$\mathrm{IMF}_1(t) = \frac{1}{K'}\sum_{i=1}^{K'} \mathrm{IMF}_1^{\,i}(t)$$
$$r_1(t) = x(t) - \mathrm{IMF}_1(t)$$
where IMF_1(t) is the first modal component after decomposition, K' is the number of sequences to be decomposed, IMF_1^i(t) is the first modal component of the i-th sequence to be decomposed, r_1(t) is the first residual signal, and x(t) is the original water quality index sequence.
Further, the j-th modal component and the j-th residual signal obtained in step S23 are:
$$\mathrm{IMF}_j(t) = \frac{1}{K'}\sum_{i=1}^{K'} E_1\!\left(r_{j-1}(t) + \varepsilon_{j-1}\,E_{j-1}\big(\sigma_i(t)\big)\right)$$
$$r_j(t) = r_{j-1}(t) - \mathrm{IMF}_j(t)$$
where IMF_j(t) is the j-th modal component after decomposition, K' is the number of sequences to be decomposed, E_1(·) returns the first modal component of a sequence after EMD decomposition, σ_i(t) is the Gaussian white noise generated for the i-th sequence, E_{j-1}(σ_i(t)) is the (j-1)-th modal component obtained by EMD decomposition of σ_i(t), r_j(t) and r_{j-1}(t) are the j-th and (j-1)-th residual signals, and ε_{j-1} is the weight coefficient of the noise added to the (j-1)-th residual signal.
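The recursion in steps S21 to S23 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the patent's implementation. The inner EMD (hypothetical helpers `first_mode` and `nth_mode`) uses a fixed number of sifting iterations and piecewise-linear envelopes instead of the cubic-spline envelopes of standard EMD. A single constant weight, scaled by the standard deviation of x(t), stands in for ε and every ε_{j-1}, so `eps` here is not the patent's "noise variance". For real data, an established implementation such as the `CEEMDAN` class of the PyEMD package is preferable.

```python
import numpy as np


def _local_extrema(h):
    """Indices of interior local maxima and minima of a 1-D array."""
    mid = h[1:-1]
    maxima = np.where((mid > h[:-2]) & (mid >= h[2:]))[0] + 1
    minima = np.where((mid < h[:-2]) & (mid <= h[2:]))[0] + 1
    return maxima, minima


def first_mode(x, n_sift=10):
    """E_1(x): first mode of x by simplified sifting (linear envelopes, fixed iterations)."""
    h = np.asarray(x, dtype=float).copy()
    t = np.arange(len(h))
    for it in range(n_sift):
        maxima, minima = _local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            if it == 0:
                return np.zeros_like(h)  # no oscillation left, so no further mode
            break
        up_idx = np.r_[0, maxima, len(h) - 1]
        lo_idx = np.r_[0, minima, len(h) - 1]
        upper = np.interp(t, up_idx, h[up_idx])
        lower = np.interp(t, lo_idx, h[lo_idx])
        h = h - (upper + lower) / 2.0  # subtract the local mean of the envelopes
    return h


def nth_mode(x, k, n_sift=10):
    """E_k(x): k-th mode of x, obtained by extracting first modes k times (k >= 1)."""
    r = np.asarray(x, dtype=float)
    mode = np.zeros_like(r)
    for _ in range(k):
        mode = first_mode(r, n_sift)
        r = r - mode
    return mode


def is_monotonic(r):
    d = np.diff(r)
    return bool(np.all(d >= 0) or np.all(d <= 0))


def ceemdan(x, n_trials=100, eps=0.2, max_imfs=8, seed=0):
    """Return (imfs, residual) such that x(t) = sum of imfs + residual."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_trials, len(x)))  # sigma_i(t), i = 1..K'
    amp = eps * np.std(x)  # assumption: one constant weight for every stage
    # S21/S22: IMF_1 = mean over i of E_1(x(t) + amp * sigma_i(t))
    imfs = [np.mean([first_mode(x + amp * w) for w in noise], axis=0)]
    r = x - imfs[0]  # r_1(t)
    for j in range(2, max_imfs + 1):
        if is_monotonic(r):  # S23: stop once the residual is monotonic
            break
        # IMF_j = mean over i of E_1(r_{j-1} + amp * E_{j-1}(sigma_i))
        imf = np.mean([first_mode(r + amp * nth_mode(w, j - 1)) for w in noise], axis=0)
        imfs.append(imf)
        r = r - imf  # r_j(t) = r_{j-1}(t) - IMF_j(t)
    return np.array(imfs), r
```

By construction, the IMFs plus the final residual reproduce x(t) exactly. This is the property that makes the superposition in step S5 valid.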
Further, obtaining the predicted values of the LSTM neural network in step S3 specifically comprises the following steps:
A1, preprocessing and normalizing the decomposed water quality index data;
A2, dividing the data processed in step A1 into a first training set and a first test set;
A3, creating and fitting the LSTM neural network and selecting an optimizer;
A4, training the LSTM network on the first training set, using back propagation with gradient descent to update its parameters;
A5, inputting the first test set into the trained LSTM network to obtain its predicted values.
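Steps A1 and A2 amount to scaling each IMF and turning it into supervised input/target windows. The sketch below covers only that data preparation, not the LSTM itself. It assumes min-max scaling, a hypothetical look-back of 12 samples (one year of monthly data), one-step-ahead targets and the chronological 80/20 split used in the embodiment. As a deliberate design choice, the scaler is fitted on the training portion only so that test values do not leak into training. This departs slightly from the order given in A1.

```python
import numpy as np


def make_windows(series, lookback):
    """Turn a 1-D series into (samples, lookback) inputs and next-step targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y


def prepare_lstm_data(imf, lookback=12, train_frac=0.8):
    """Min-max scale one IMF and split its windows chronologically (assumes split > lookback)."""
    imf = np.asarray(imf, dtype=float)
    split = int(len(imf) * train_frac)
    lo, hi = imf[:split].min(), imf[:split].max()  # scaler fitted on training data only
    scale = hi - lo if hi > lo else 1.0
    scaled = (imf - lo) / scale
    X, y = make_windows(scaled, lookback)
    n_train = split - lookback  # windows whose target lies in the first 80 %
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:]), (lo, scale)
```

Most LSTM layers (for example in Keras or PyTorch) expect inputs shaped (samples, lookback, features), so `X[..., None]` adds the single feature axis. Predictions are mapped back with `pred * scale + lo`.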
Further, obtaining the predicted values of the ARIMA model in step S3 specifically comprises the following steps:
B1, dividing the decomposed water quality index data into a second training set and a second test set;
B2, testing the data in the second training set for stationarity and determining the differencing order d of the ARIMA model;
B3, determining the autoregressive order p and the moving-average order q of the ARIMA model from the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the time series;
B4, applying the ARIMA model with the determined parameters to the second test set to obtain its predicted values.
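Step B3 reads p and q from the ACF and PACF. The sketch below computes both with NumPy, using the Durbin-Levinson recursion for the PACF. It then applies a crude automatic cut-off rule: count the leading lags outside the approximate 95% band ±1.96/√n. The cut-off rule, the helper names and `max_order=5` are assumptions for illustration. In practice the ACF/PACF plots are inspected, or candidate orders are compared by an information criterion. The stationarity test of step B2 is not reproduced here; d is passed in.

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags (r_0 = 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


def pacf(x, nlags):
    """Partial autocorrelation via the Durbin-Levinson recursion on the sample ACF."""
    r = acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    phi[1, 1] = out[1] = r[1]
    for k in range(2, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        out[k] = phi[k, k]
    return out


def cutoff_order(values, n, max_order=5):
    """Count leading lags (from lag 1) outside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / np.sqrt(n)
    order = 0
    for v in values[1 : max_order + 1]:
        if abs(v) <= bound:
            break
        order += 1
    return order


def suggest_orders(series, d, nlags=20):
    """Difference d times, then read p from the PACF cut-off and q from the ACF cut-off."""
    z = np.diff(np.asarray(series, dtype=float), n=d)
    return cutoff_order(pacf(z, nlags), len(z)), cutoff_order(acf(z, nlags), len(z))
```

With statsmodels available, `statsmodels.tsa.stattools.adfuller` can supply the stationarity test for B2. The model for B4 can then be fitted with `statsmodels.tsa.arima.model.ARIMA(train, order=(p, d, q)).fit()` and forecast over the test horizon with `.forecast(steps=len(test))`.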
Further, step S4 specifically comprises the following steps:
S41, comparing, for each modal component IMF, the predicted values of the LSTM network and the ARIMA model;
S42, according to the comparison, selecting for each IMF whichever of the LSTM network and the ARIMA model is more accurate, to obtain the optimal configuration model;
S43, inputting the decomposed water quality index data into the optimal configuration model and outputting the predicted value of each IMF.
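Steps S41 to S43, followed by the superposition of step S5, can be sketched as below. This is a minimal illustration under stated assumptions. The inputs are hypothetical arrays holding the held-out values of each IMF and each model's predictions for them. RMSE alone decides the winner for each IMF, although the embodiment also reports MAE and MAPE. The chosen test-period predictions are summed directly, whereas a deployed system would re-run each chosen model to forecast. Only the IMFs are summed, as in the patent, on the assumption that the residual is negligible.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def select_and_combine(imf_true, lstm_pred, arima_pred):
    """Pick, per IMF, the model with the lower RMSE, then sum the chosen predictions.

    imf_true, lstm_pred, arima_pred: arrays of shape (n_imfs, horizon)
    (hypothetical inputs holding held-out IMF values and each model's predictions).
    """
    choices, chosen = [], []
    for true, p_lstm, p_arima in zip(imf_true, lstm_pred, arima_pred):
        if rmse(true, p_lstm) <= rmse(true, p_arima):
            choices.append("LSTM")
            chosen.append(p_lstm)
        else:
            choices.append("ARIMA")
            chosen.append(p_arima)
    # Step S5: superpose the per-IMF predictions to reconstruct the water quality index
    return choices, np.sum(chosen, axis=0)
```

The returned list of model names is the "optimal configuration" in the patent's sense: one model assignment per IMF.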
Beneficial effect of this further scheme: ARIMA is one of the most widely used methods for forecasting univariate time series; it keeps the computational workload low on data that are not complex and predicts accurately when the series has a clear trend. LSTM, a special RNN, solves the long-term dependency problem of RNNs and converges quickly to an optimal solution in time series prediction. The optimal configuration model built from the LSTM network and the ARIMA model combines the advantages of both and predicts better than either single model.
Drawings
FIG. 1 is a flow chart of the water quality prediction method of the present invention.
FIG. 2 shows the result of the CEEMDAN decomposition.
FIG. 3 compares the predicted and actual values of the ammonia nitrogen water quality index.
Detailed Description
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.
Examples
As shown in FIG. 1, the invention provides a water quality prediction method, which comprises the following steps:
S1, acquiring historical water quality index data from a monitoring station and preprocessing it to obtain an original water quality index sequence;
S2, decomposing the original water quality index sequence by CEEMDAN to obtain decomposed water quality index data;
S3, inputting the decomposed water quality index data into the LSTM neural network and into the ARIMA model for training, and obtaining the predicted values of each;
S4, comparing the predicted values of the LSTM network and the ARIMA model to obtain an optimal configuration model, predicting the water quality index with the optimal configuration model, and outputting the predicted value of each modal component IMF;
S5, superposing the predicted values of all modal components IMF to obtain the final predicted water quality index data.
Step S1 specifically comprises the following steps:
S11, using a KNN algorithm to screen out abnormal values in the acquired historical water quality index data of the monitoring station, and deleting them from the data set;
S12, for each missing value in the data set, establishing a linear relationship between the two actual historical water quality index values adjacent to it on the left and right, calculating the data increment from the slope of this straight line, and filling the missing value with the result;
S13, from the historical water quality index data after outlier and missing-value processing, selecting a historical data set over the chosen time range at a uniform time interval to obtain the original water quality index sequence.
In this embodiment, historical water quality index data from the VM1 monitoring station are used, taking the ammonia nitrogen index as an example: monthly ammonia nitrogen data covering 29 years are obtained, and the data set is preprocessed.
The formula for establishing the linear relationship in step S12 is:
$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$
where (x_0, y_0) and (x_1, y_1) are the two data points adjacent to the missing value on the left and right, and (x, y) is the missing value to be determined.
Step S2 comprises the following specific steps:
S21, adding zero-mean Gaussian white noise to the original water quality index sequence x(t) K' times to obtain K' sequences to be decomposed;
S22, performing EMD on each of the K' sequences to obtain its first modal component, and averaging these first modal components to obtain the first modal component IMF_1(t) of the CEEMDAN decomposition and the first residual signal;
S23, after adding specific noise to the j-th residual signal and decomposing it, judging whether the j-th residual signal is monotonic: if so, the decomposed water quality index data are obtained from the modal components up to the j-th and the j-th residual signal; otherwise, proceeding to the next stage of the CEEMDAN decomposition.
The expression of the sequences to be decomposed in step S21 is:
$$x_i(t) = x(t) + \varepsilon\,\sigma_i(t), \quad i = 1, \dots, K'$$
where x_i(t) is the i-th sequence to be decomposed, ε is the weight coefficient of the Gaussian white noise, σ_i(t) is the Gaussian white noise added to the i-th sequence, and x(t) is the original water quality index sequence.
The first modal component IMF_1(t) and the first residual signal obtained in step S22 are:
$$\mathrm{IMF}_1(t) = \frac{1}{K'}\sum_{i=1}^{K'} \mathrm{IMF}_1^{\,i}(t)$$
$$r_1(t) = x(t) - \mathrm{IMF}_1(t)$$
where IMF_1(t) is the first modal component after decomposition, K' is the number of sequences to be decomposed, IMF_1^i(t) is the first modal component of the i-th sequence to be decomposed, r_1(t) is the first residual signal, and x(t) is the original water quality index sequence.
The j-th modal component and the j-th residual signal obtained in step S23 are:
$$\mathrm{IMF}_j(t) = \frac{1}{K'}\sum_{i=1}^{K'} E_1\!\left(r_{j-1}(t) + \varepsilon_{j-1}\,E_{j-1}\big(\sigma_i(t)\big)\right)$$
$$r_j(t) = r_{j-1}(t) - \mathrm{IMF}_j(t)$$
where IMF_j(t) is the j-th modal component after decomposition, K' is the number of sequences to be decomposed, E_1(·) returns the first modal component of a sequence after EMD decomposition, σ_i(t) is the Gaussian white noise generated for the i-th sequence, E_{j-1}(σ_i(t)) is the (j-1)-th modal component obtained by EMD decomposition of σ_i(t), r_j(t) and r_{j-1}(t) are the j-th and (j-1)-th residual signals, and ε_{j-1} is the weight coefficient of the noise added to the (j-1)-th residual signal.
In this embodiment, the original water quality index data are decomposed by CEEMDAN. Repeated tests showed that, for strongly fluctuating original data, the decomposition works best with an added-noise variance of about 0.5 and about 100 noise additions. This embodiment therefore decomposes all original water quality index sequences by CEEMDAN with a noise variance of 0.5 and 120 noise additions. For the ammonia nitrogen data, CEEMDAN decomposition of the original sequence yields seven modal components IMF and one residual signal.
As shown in FIG. 2, the first three IMFs of the CEEMDAN decomposition fluctuate strongly, while the fluctuation and non-stationarity of the last four IMFs are greatly reduced. The residual signal represents the error remaining after CEEMDAN decomposition of the original ammonia nitrogen sequence and is mainly used to check decomposition performance. In this embodiment it is essentially zero, indicating a good decomposition.
Obtaining the predicted values of the LSTM network in step S3 comprises the following specific steps:
A1, preprocessing and normalizing the decomposed water quality index data;
A2, dividing the data processed in step A1 into a first training set and a first test set;
A3, creating and fitting the LSTM neural network and selecting an optimizer;
A4, training the LSTM network on the first training set, using back propagation with gradient descent to update its parameters;
A5, inputting the first test set into the trained LSTM network to obtain its predicted values.
Obtaining the predicted values of the ARIMA model in step S3 comprises the following specific steps:
B1, dividing the decomposed water quality index data into a second training set and a second test set;
B2, testing the data in the second training set for stationarity and determining the differencing order d of the ARIMA model;
B3, determining the autoregressive order p and the moving-average order q of the ARIMA model from the ACF and PACF of the time series;
B4, applying the ARIMA model with the determined parameters to the second test set to obtain its predicted values.
In this embodiment, the data of each IMF obtained by CEEMDAN decomposition are split chronologically. The first 80% are used as training data for both models, and the last 20% are held out. The held-out 20% do not take part in training either prediction model and serve as test data for evaluating the accuracy of each.
The test data are input into the LSTM network and the ARIMA model to obtain predicted values for each IMF. The prediction accuracy of each model is calculated for each IMF, and the better model is selected for that IMF. Each IMF is then predicted with its selected model, and the predictions are superposed to obtain the final water quality index prediction.
Taking ammonia nitrogen as an example, CEEMDAN decomposition yields seven IMFs, and the first 80% of the data of each IMF serve as the training sample set (the first and second training sets). The first and second training sets are input into the LSTM network and the ARIMA model, respectively, for training. The remaining 20% serve as the first and second test sets for evaluating the prediction performance of the two models.
Step S4 specifically comprises the following steps:
S41, comparing, for each modal component IMF, the predicted values of the LSTM network and the ARIMA model;
S42, according to the comparison, selecting for each IMF whichever of the LSTM network and the ARIMA model is more accurate, to obtain the optimal configuration model;
S43, inputting the decomposed water quality index data into the optimal configuration model and outputting the predicted value of each IMF.
In this embodiment, the predictions of the ARIMA model and of the LSTM network are compared with the original data, and the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are calculated for each. On this basis, an optimal prediction model is selected for each IMF. Table 1 shows the prediction errors for the seven ammonia nitrogen IMFs.
TABLE 1
Comparing the errors in Table 1 shows that, for the seven modal components IMFs obtained by decomposing the original ammonia nitrogen data with CEEMDAN in this embodiment, IMF1, IMF2 and IMF3 are best predicted by the long short-term memory neural network LSTM, and IMF4, IMF5, IMF6 and IMF7 by the autoregressive integrated moving average model ARIMA; this assignment forms the optimal configuration model.
Finally, the seven modal components IMFs are predicted with the optimal configuration model, with the ARIMA model alone and with the LSTM alone, and the results are superposed to obtain the final predicted ammonia nitrogen data. RMSE denotes the root mean square error and MAE the mean absolute error. Table 2 shows the comparison of prediction errors.
TABLE 2
Predicting with only the long short-term memory neural network LSTM or only the autoregressive integrated moving average model ARIMA gives considerably worse results than the optimal configuration model that combines the two. With prediction performed on the CEEMDAN-decomposed data, the root mean square error (RMSE) of the combined LSTM-ARIMA model is at least 7% lower than that of either single model.
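Read as a relative reduction (the text does not state the base of the percentage, so this is an interpretation), the 7% figure means that, for each single model,

$$\frac{\mathrm{RMSE}_{\text{single}} - \mathrm{RMSE}_{\text{LSTM-ARIMA}}}{\mathrm{RMSE}_{\text{single}}} \times 100\% \;\ge\; 7\%,$$

or equivalently $\mathrm{RMSE}_{\text{LSTM-ARIMA}} \le 0.93\,\mathrm{RMSE}_{\text{single}}$, where "single" stands for either the LSTM alone or the ARIMA model alone.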
As shown in FIG. 3, which compares the predicted and true values of the ammonia nitrogen water quality index, the method predicts with high accuracy and addresses both interruptions in the water quality index time series and the loss of prediction accuracy that occurs as the time series grows longer.

Claims (10)

1. A water quality prediction method, characterized by comprising the following steps:
S1, acquiring historical water quality index data of a monitoring station and preprocessing the data to obtain an original water quality index sequence;
S2, decomposing the original water quality index sequence by the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method to obtain decomposed water quality index data;
S3, inputting the decomposed water quality index data into a long short-term memory neural network LSTM and into an autoregressive integrated moving average model ARIMA respectively for training, and obtaining the predicted values of each;
S4, comparing the predicted values of the LSTM and the ARIMA model to obtain an optimal configuration model, predicting the water quality index with the optimal configuration model, and outputting the predicted value of each modal component IMF;
S5, superposing the predicted values of all modal components IMFs to obtain the final predicted water quality index data.
2. The water quality prediction method according to claim 1, wherein step S1 specifically comprises the following steps:
S11, screening out abnormal values in the acquired historical water quality index data of the monitoring station using a KNN (k-nearest neighbours) algorithm, and deleting them from the data set;
S12, for each missing value in the data set, establishing a linear relation through the two actual historical data points adjacent to it on the left and right, calculating the data increment from the slope of that straight line, and filling the missing value with the result;
S13, selecting, according to a time interval range, a historical data set from the water quality index data after abnormal-value and missing-value processing, to obtain the original water quality index sequence.
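The claim does not say how the KNN algorithm scores abnormal values. One common distance-based reading, assumed here: score each point by its mean distance to its k nearest neighbours in value space and flag scores more than z standard deviations above the average score. `k`, `z` and both function names are hypothetical; flagged points are marked `None` rather than removed, on the further assumption that the time axis should stay intact for the gap filling of step S12.

```python
import statistics


def knn_outliers(values, k=3, z=2.0):
    """Indices whose mean distance to their k nearest neighbours (by value)
    exceeds the average score by more than z population standard deviations."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    mean, sd = statistics.mean(scores), statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if sd > 0 and s > mean + z * sd]


def drop_outliers(values, outliers):
    """Mark flagged points as missing (None) so they can be refilled in place."""
    bad = set(outliers)
    return [None if i in bad else v for i, v in enumerate(values)]
```

On a series that hovers around 1.0 with a single spike at 10.0, only the spike's index is flagged.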
3. The water quality prediction method according to claim 2, wherein the formula for establishing the linear relationship in step S12 is:
$$y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}\,(x - x_0)$$
where $(x_0, y_0)$ and $(x_1, y_1)$ denote the two coordinates adjacent to the missing value, and $(x, y)$ denotes the missing value to be determined.
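Applied along a series, the formula fills each gap from the nearest known values on either side. A minimal sketch, assuming equally spaced samples so that the list index can serve as $x$, and missing values marked `None`; gaps at the very start or end have no neighbour on one side and are left as `None`. The function name is hypothetical.

```python
def fill_linear(values):
    """Fill interior None gaps by linear interpolation between the nearest known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for x0, x1 in zip(known, known[1:]):
        y0, y1 = values[x0], values[x1]
        slope = (y1 - y0) / (x1 - x0)  # slope of the line through (x0, y0) and (x1, y1)
        for x in range(x0 + 1, x1):
            filled[x] = y0 + slope * (x - x0)  # y = y0 + slope * (x - x0)
    return filled
```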
4. The water quality prediction method according to claim 1, wherein step S2 specifically comprises the following steps:
S21, adding zero-mean Gaussian white noise to the original water quality index sequence x(t) K' times to obtain K' sequences to be decomposed;
S22, performing empirical mode decomposition (EMD) on each of the K' sequences to be decomposed to obtain their first modal components, and averaging these to obtain the first modal component IMF1(t) of the CEEMDAN decomposition and the first residual signal;
S23, adding the specific noise to the residual signal and decomposing it to obtain the jth modal component and the jth residual signal, then judging whether the jth residual signal is monotone; if so, obtaining the decomposed water quality index data from the modal components and the jth residual signal; otherwise, proceeding to the next stage of CEEMDAN decomposition.
5. The method according to claim 4, wherein the expression of the sequences to be decomposed in step S21 is:
$$x_i(t) = x(t) + \varepsilon\,\sigma_i(t)$$
where $x_i(t)$ denotes the ith sequence to be decomposed, $\varepsilon$ the Gaussian white noise weight coefficient, $\sigma_i(t)$ the Gaussian white noise generated for the ith sequence to be decomposed, and $x(t)$ the original water quality index sequence.
6. The method according to claim 4, wherein the expressions for the first modal component $\mathrm{IMF}_1(t)$ and the first residual signal after decomposition in step S22 are:
$$\mathrm{IMF}_1(t) = \frac{1}{K'}\sum_{i=1}^{K'} \mathrm{IMF}_1^{i}(t)$$
$$r_1(t) = x(t) - \mathrm{IMF}_1(t)$$
where $\mathrm{IMF}_1(t)$ denotes the first modal component after decomposition, $K'$ the number of sequences to be decomposed, $\mathrm{IMF}_1^{i}(t)$ the first modal component of the ith sequence to be decomposed, $r_1(t)$ the first residual signal after decomposition, and $x(t)$ the original water quality index sequence.
7. The method according to claim 4, wherein the expressions for the jth modal component and the jth residual signal after decomposition in step S23 are:
$$\mathrm{IMF}_j(t) = \frac{1}{K'}\sum_{i=1}^{K'} E_1\!\left(r_{j-1}(t) + \varepsilon_{j-1}\,E_{j-1}\big(\sigma_i(t)\big)\right)$$
$$r_j(t) = r_{j-1}(t) - \mathrm{IMF}_j(t)$$
where $\mathrm{IMF}_j(t)$ denotes the jth modal component after decomposition, $K'$ the number of sequences to be decomposed, $E_1(\cdot)$ the first modal component of a sequence after EMD decomposition, $E_{j-1}(\sigma_i(t))$ the (j-1)th modal component of $\sigma_i(t)$ after EMD decomposition, $\sigma_i(t)$ the Gaussian white noise generated for the ith sequence to be decomposed, $r_j(t)$ the jth residual signal, $r_{j-1}(t)$ the (j-1)th residual signal, and $\varepsilon_{j-1}$ the weight coefficient of the noise added to the (j-1)th residual signal.
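The recursion of claims 4 to 7 can be traced in a simplified pure-Python sketch. Several choices here are assumptions rather than the source's: envelopes are piecewise-linear instead of the cubic splines usual in EMD, each mode uses a fixed number of sifting passes, the same weight `eps` is used at every stage, and the monotone-residual test of S23 is approximated by "fewer than two maxima or two minima". All names and defaults are hypothetical; for real data a maintained implementation (for example, the `CEEMDAN` class in the PyEMD package) is the safer choice.

```python
import random


def _extrema(x):
    """Indices of interior local maxima and minima of a sequence."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def _too_few_extrema(x):
    """Proxy for the 'residual is monotone' stopping test of S23."""
    maxima, minima = _extrema(x)
    return len(maxima) < 2 or len(minima) < 2


def _envelope(idx, x):
    """Piecewise-linear envelope through x at idx, held flat at both ends."""
    n = len(x)
    knots = [0] + idx + [n - 1]
    vals = [x[idx[0]]] + [x[i] for i in idx] + [x[idx[-1]]]
    env = [0.0] * n
    for k in range(len(knots) - 1):
        a, b = knots[k], knots[k + 1]
        for t in range(a, b + 1):
            env[t] = vals[k] + (vals[k + 1] - vals[k]) * (t - a) / (b - a)
    return env


def _first_mode(x, n_sift=8):
    """E_1(.): first EMD mode of x via a fixed number of sifting passes."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = _extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = _envelope(maxima, h), _envelope(minima, h)
        h = [v - (u + l) / 2.0 for v, u, l in zip(h, upper, lower)]
    return h


def ceemdan(x, trials=20, eps=0.2, max_imfs=8, seed=0):
    """Return (imfs, residual) such that sum(imfs) + residual == x."""
    rng = random.Random(seed)
    n = len(x)
    # S21: K' (= trials) zero-mean white-noise realisations sigma_i(t).
    noises = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(trials)]
    # S22: IMF_1 = mean of E_1(x + eps * sigma_i); r_1 = x - IMF_1.
    imf = [0.0] * n
    for w in noises:
        mode = _first_mode([a + eps * b for a, b in zip(x, w)])
        imf = [s + m / trials for s, m in zip(imf, mode)]
    imfs = [imf]
    residual = [a - b for a, b in zip(x, imf)]
    # What remains of each sigma_i after removing its EMD modes E_1 .. E_{j-2}.
    noise_res = [list(w) for w in noises]
    # S23: IMF_j = mean of E_1(r_{j-1} + eps * E_{j-1}(sigma_i)); r_j = r_{j-1} - IMF_j.
    while len(imfs) < max_imfs and not _too_few_extrema(residual):
        imf = [0.0] * n
        for i in range(trials):
            if _too_few_extrema(noise_res[i]):
                noise_mode = [0.0] * n
            else:
                noise_mode = _first_mode(noise_res[i])  # E_{j-1}(sigma_i)
                noise_res[i] = [a - b for a, b in zip(noise_res[i], noise_mode)]
            mode = _first_mode([a + eps * b for a, b in zip(residual, noise_mode)])
            imf = [s + m / trials for s, m in zip(imf, mode)]
        imfs.append(imf)
        residual = [a - b for a, b in zip(residual, imf)]
    return imfs, residual
```

Because each residual is defined as the previous residual minus the new IMF, the IMFs plus the final residual add back up to x(t) up to rounding, which is what makes the superposition of per-IMF predictions in step S5 meaningful.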
8. The water quality prediction method according to claim 1, wherein obtaining the predicted value of the long short-term memory neural network LSTM in step S3 specifically comprises the following steps:
A1, performing data preprocessing and normalization on the decomposed water quality index data;
A2, dividing the water quality index data processed in step A1 into a first training set and a first test set;
A3, creating and configuring the long short-term memory neural network LSTM and selecting an optimizer;
A4, training the LSTM with the first training set, using back-propagation with gradient descent to update its parameters;
A5, inputting the first test set into the trained LSTM to obtain its predicted values.
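Claim 8 leaves the preprocessing of step A1 unspecified. One common preparation for a one-step-ahead LSTM is sketched here under assumptions: min-max scaling fitted on the first training set only, and a sliding window of `lookback` past values as the network input. Both the scaling choice and `lookback` are hypothetical, as are the function names; the network itself (A3 to A5) is not sketched because the source gives no architecture, loss or optimizer settings.

```python
def fit_minmax(train):
    """Learn scaling bounds from the training portion only, so the test set does not leak in."""
    return min(train), max(train)


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = (hi - lo) or 1.0  # guard against a constant series
    return [(v - lo) / span for v in values]


def unscale(values, lo, hi):
    """Inverse of scale, used to map LSTM outputs back to original units."""
    span = (hi - lo) or 1.0
    return [v * span + lo for v in values]


def make_windows(series, lookback):
    """Turn a series into (input window, next value) pairs for one-step-ahead training."""
    inputs = [series[t:t + lookback] for t in range(len(series) - lookback)]
    targets = [series[t + lookback] for t in range(len(series) - lookback)]
    return inputs, targets
```

After the LSTM predicts in scaled units, `unscale` maps the forecasts back before the IMF predictions are superposed.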
9. The water quality prediction method according to claim 1, wherein obtaining the predicted value of the autoregressive integrated moving average model ARIMA in step S3 specifically comprises the following steps:
B1, dividing the decomposed water quality index data into a second training set and a second test set;
B2, performing a stationarity test on the data in the second training set to determine the differencing order d of the ARIMA model;
B3, determining the autoregressive order p and the moving-average order q of the ARIMA model from the autocorrelation function ACF and the partial autocorrelation function PACF of the time series;
B4, inputting the second test set into the ARIMA model with the determined parameters to obtain its predicted values.
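Steps B2 and B3 rest on differencing and on the sample ACF and PACF. A minimal pure-Python sketch of those three quantities, with the PACF computed by the Durbin-Levinson recursion. It assumes the usual rule of thumb that a PACF cutting off after lag p suggests AR order p and an ACF cutting off after lag q suggests MA order q. The stationarity test itself (commonly an augmented Dickey-Fuller test, e.g. `adfuller` in statsmodels) and the final fit (e.g. statsmodels' `ARIMA` with `order=(p, d, q)`) are left to a library. Function names are hypothetical.

```python
def difference(x, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


def acf(x, nlags):
    """Sample autocorrelation function for lags 0..nlags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(dv * dv for dv in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0 for k in range(nlags + 1)]


def pacf_from_acf(rho):
    """Partial autocorrelations for lags 0..len(rho)-1 via Durbin-Levinson."""
    result, phi = [1.0], []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j} for j < k, then phi_{k,k}
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


def pacf(x, nlags):
    """Sample partial autocorrelation function for lags 0..nlags."""
    return pacf_from_acf(acf(x, nlags))
```

As a check, the theoretical ACF of an AR(1) process with coefficient 0.5, `[1, 0.5, 0.25, 0.125]`, yields a PACF of 0.5 at lag 1 and zero beyond, pointing to p = 1.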
10. The water quality prediction method according to claim 1, wherein step S4 specifically comprises the following steps:
S41, comparing the predicted values of each modal component IMF produced by the long short-term memory neural network LSTM and by the autoregressive integrated moving average model ARIMA;
S42, selecting for each modal component IMF, according to the comparison result, whichever of the LSTM and the ARIMA model gives the higher accuracy, to obtain the optimal configuration model;
S43, inputting the decomposed water quality index data into the optimal configuration model and outputting the predicted value of each modal component IMF.
CN202310763846.XA 2023-06-26 2023-06-26 Water quality prediction method Pending CN117035155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310763846.XA CN117035155A (en) 2023-06-26 2023-06-26 Water quality prediction method

Publications (1)

Publication Number Publication Date
CN117035155A true CN117035155A (en) 2023-11-10

Family

ID=88621502

Country Status (1)

Country Link
CN (1) CN117035155A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117670147A (en) * 2024-02-01 2024-03-08 江西省科学院微生物研究所(江西省流域生态研究所) Lake water quality prediction method and system
CN117670147B (en) * 2024-02-01 2024-04-19 江西省科学院微生物研究所(江西省流域生态研究所) Lake water quality prediction method and system

Similar Documents

Publication Publication Date Title
CN111160651B (en) STL-LSTM-based subway passenger flow prediction method
CN107895100B (en) Drainage basin water quality comprehensive evaluation method and system
CN113065703A (en) Time series prediction method combining multiple models
CN117035155A (en) Water quality prediction method
CN115587666A (en) Load prediction method and system based on seasonal trend decomposition and hybrid neural network
CN113298288A (en) Power supply station operation and maintenance cost prediction method integrating time sequence and neural network
CN103454390A (en) Method and device for measuring concentration of dissolved oxygen
CN110633859A (en) Hydrological sequence prediction method for two-stage decomposition integration
CN112434890A (en) Prediction method of tunnel settlement time sequence based on CEEMDAN-BilSTM
CN115169702A (en) EEMD-LSTNet-based water quality parameter prediction method and system
CN113554213A (en) Natural gas demand prediction method, system, storage medium and equipment
CN112966435B (en) Bridge deformation real-time prediction method
CN114548161A (en) Dense medium separation clean coal ash content prediction method and device, electronic equipment and medium
CN113887119A (en) River water quality prediction method based on SARIMA-LSTM
CN117543544A (en) Load prediction method, device, equipment and storage medium
CN112257958A (en) Power saturation load prediction method and device
Pawlak et al. Nonparametric sequential signal change detection under dependent noise
CN110648023A (en) Method for establishing data prediction model based on quadratic exponential smoothing improved GM (1,1)
WO2022222230A1 (en) Indicator prediction method and apparatus based on machine learning, and device and storage medium
CN116127833A (en) Wind power prediction method, system, device and medium based on VMD and LSTM fusion model
CN115687322A (en) Water quality time series missing data completion method based on encoder-decoder and autoregressive generated countermeasure network
CN114970813A (en) Dissolved oxygen concentration data restoration and prediction method
CN112700050B (en) Method and system for predicting ultra-short-term 1 st point power of photovoltaic power station
CN113762795A (en) Industrial chain diagnosis method and system based on hierarchical analysis
CN113435653B (en) Method and system for predicting saturated power consumption based on logistic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination