CN103942457A - Water quality parameter time series prediction method based on relevance vector machine regression

Info

Publication number
CN103942457A
Authority
CN
China
Prior art keywords
water quality
quality parameter
time series
prediction
value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410196457.4A
Other languages
Chinese (zh)
Other versions
CN103942457B (en)
Inventor
汪晓东
笪英云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Application filed by Zhejiang Normal University CJNU
Priority to CN201410196457.4A
Publication of CN103942457A
Application granted
Publication of CN103942457B
Legal status: Active

Abstract

The invention provides a water quality parameter time series prediction method based on relevance vector machine (RVM) regression. The method comprises the following steps: (1) acquiring historical water quality parameter data from an automatic water quality monitoring station and pre-processing the data; (2) using the first 2/3 of the pre-processed historical data as the training sample set and the remaining 1/3 as the test sample set; (3) training an RVM on the training sample set and testing the trained RVM on the test sample set to obtain a water quality parameter time series prediction model based on RVM regression; (4) using this model to predict new water quality parameter values. The method performs time series prediction with a wide prediction range, high accuracy and good stability. It also provides probabilistic output, giving a confidence interval together with each prediction, shortens the prediction time, and allows changes in water quality parameters to be observed in time.

Description

Water quality parameter time series prediction method based on relevance vector machine regression
Technical field
The present invention relates to the field of water quality monitoring, and in particular to a water quality parameter time series prediction method based on relevance vector machine regression.
Background art
A water quality parameter time series is an ordered sequence of monitoring data that reflects how a water quality parameter varies over time, for example the pH values monitored at a river basin section from week 1 to week 50 of a given year. A water quality parameter time series prediction method takes the historical time series already acquired, analyzes the inherent statistical properties and development patterns of the historical data, builds a time series prediction model, and uses the model to produce predicted data that show the trend of future data. Water quality parameter time series prediction is basic work for water environment management and pollution control. At present, because early-stage information and technical support are lacking, water pollution accidents in China are mostly tallied after the fact; changes in water quality cannot be predicted and pollution accidents cannot be avoided. Building reliable water quality parameter time series prediction models has therefore become one of the research hotspots in water environment science in recent years.
The water quality parameter time series prediction models commonly used in China and abroad are mainly artificial neural networks and support vector machine (SVM) regression models. Artificial neural network algorithms are prone to over-learning or under-learning and to local minima, their network structure is hard to determine, and their generalization ability is poor. The SVM regression model is a supervised learning method built on statistical learning theory and structural risk minimization. It maps the original input through a kernel function into a linearly separable high-dimensional feature space, generalizes well, and is not prone to over-fitting, so it handles small samples, nonlinearity, high dimensionality and local minima comparatively well. Compared with artificial neural network models, SVM-based time series prediction models perform better. However, in an SVM model the kernel function must satisfy the Mercer condition, the number of support vectors grows linearly with the number of training samples, and only a deterministic prediction is given: there is no probabilistic output, so the uncertainty of the prediction cannot be estimated. Probabilistic predictions provide important information in practice and help determine how much confidence to place in a water quality parameter prediction.
The relevance vector machine (RVM) is a newer machine learning algorithm proposed by Tipping in 2001 within a Bayesian framework. Its kernel function need not satisfy the Mercer condition, its solution is far sparser than that of the SVM, it provides probabilistic information about the prediction, and it generalizes well. The RVM has been applied to many practical pattern recognition and regression estimation problems with good results.
Chinese patent application No. 20131013190.7 provides a sewage water quality monitoring method and device. Its prediction model is a soft-sensing method based on the RVM; compared with models built with neural networks or SVM, it has better applicability and higher prediction accuracy. The method nevertheless has the following defects. First, because it analyzes related parameters to obtain the effluent total nitrogen or total phosphorus content, uncertainty in the related data can strongly affect the final output; although the output improves greatly on neural network and SVM models, the instability of the final output remains. Second, it merely analyzes the effluent total nitrogen or total phosphorus content at the current moment; it cannot perform time series prediction, its prediction range is small, and its accuracy is low.
Summary of the invention
The technical problem to be solved by the present invention is to provide a water quality parameter time series prediction method based on relevance vector machine regression that performs time series prediction with a wide prediction range, high accuracy and good stability, and that provides probabilistic output, giving a confidence interval together with each prediction, shortening the prediction time, and allowing changes in water quality parameters to be observed in time.
To solve the above technical problem, the present invention adopts the following scheme: a water quality parameter time series prediction method based on relevance vector machine regression, comprising the following steps:
Step 1: collect historical water quality parameter data from an automatic water quality monitoring system and pre-process the data by filling in the missing values. The missing values are first filled with 0; the historical data are then pre-processed in the time domain according to the time series and filtered in the frequency domain; finally, a least-squares best fit is performed. On the resulting fitted curve, the fitted values at the observation points whose value is 0 are taken as the fill-in values for the missing data and substituted into the historical data;
Step 2: use the first 2/3 of the pre-processed historical data as the training sample set and the remaining 1/3 as the test sample set;
Step 3: train the RVM using the water quality parameter values of several consecutive preceding unit time intervals in the training sample set as input and the value of the next unit time interval as output. Then test the trained RVM with the test sample set: feed the values of several consecutive preceding unit time intervals from the test sample set into the trained RVM and observe the predicted value at its output. If the error between the predicted value and the test-set value for the next unit time interval meets the requirement, the test is passed and the water quality parameter time series prediction model based on RVM regression is obtained;
Step 4: use the RVM regression time series prediction model to predict new water quality parameter values: feed the values of the most recent several unit time intervals into the model, which outputs the predicted value for the next unit time interval.
Preferably, the water quality parameter is pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content.
Preferably, the water quality parameter time series prediction model based on RVM regression in step 3 is as follows. For a given input $x_*$ of the water quality parameter (pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content), the predicted mean $y_*$ and variance $\sigma_*^2$ of the corresponding output are
$$y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)$$
The predicted output $t_*$ follows a Gaussian distribution with mean $y_*$ and variance $\sigma_*^2$, where $\mu$ is the posterior mean of the weights ($\mu^T$ its transpose), $\Sigma$ is the posterior covariance of the weights and $\sigma_{MP}^2$ is the noise variance. The predicted output can thus be written as
$$t_* = \mu^T \phi(x_*) + \varepsilon_*, \qquad \varepsilon_* \sim N\left(0,\ \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)\right)$$
With the confidence level set to $1-\theta$, the two-sided confidence interval of $t_*$ follows from
$$P\{y_* - \sigma_* z_{\theta/2} < t_* < y_* + \sigma_* z_{\theta/2}\} = 1 - \theta$$
so the confidence interval of $t_*$ at confidence level $1-\theta$ is $[y_* - \sigma_* z_{\theta/2},\ y_* + \sigma_* z_{\theta/2}]$. The upper quantile $z_{\theta/2}$ is looked up in the standard normal distribution table, and the 95% confidence interval is computed.
Preferably, when the error is checked in step 3, the mean squared error (MSE) and the correlation coefficient R are used as the indices for evaluating the performance of the prediction model. They are computed as
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{ai} - y_{pi})^2$$
$$R = \frac{\sum_{i=1}^{n} (y_{ai} - \bar{y}_a)(y_{pi} - \bar{y}_p)}{\sqrt{\sum_{i=1}^{n} (y_{ai} - \bar{y}_a)^2 \cdot \sum_{i=1}^{n} (y_{pi} - \bar{y}_p)^2}}$$
A smaller mean squared error indicates better predictive performance, and an absolute value of the correlation coefficient closer to 1 indicates a more accurate prediction. Here $y_{ai}$ and $y_{pi}$ are the actual value and the predicted value of the $i$-th sample of the water quality parameter, and $\bar{y}_a$ and $\bar{y}_p$ are the means of the $n$ actual values and of the $n$ predicted values of that parameter.
Preferably, the error requirement in step 3 differs by water quality parameter: the mean squared error of the model is below 0.004 for pH value prediction, below 0.08 for dissolved oxygen content prediction, below 0.02 for permanganate index prediction and below 0.002 for ammonia nitrogen content prediction, and the correlation coefficient of each of these models is not less than 0.95.
Beneficial effects:
By adopting the above technical scheme, the present invention provides a water quality parameter time series prediction method based on relevance vector machine regression that performs time series prediction with a wide prediction range, high accuracy and good stability. It also provides probabilistic output, giving a confidence interval together with each prediction, shortens the prediction time, and allows changes in water quality parameters to be observed in time.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the present invention;
Fig. 2 shows the time series prediction results for pH value obtained by the prediction model of the present invention with a linear kernel function;
Fig. 3 shows the time series prediction results for dissolved oxygen content obtained by the prediction model of the present invention with a linear kernel function;
Fig. 4 shows the time series prediction results for permanganate index obtained by the prediction model of the present invention with a linear kernel function;
Fig. 5 shows the time series prediction results for ammonia nitrogen content obtained by the prediction model of the present invention with a linear kernel function;
Fig. 6 shows the time series prediction results for pH value obtained by the prediction model of the present invention with a Gaussian kernel function;
Fig. 7 shows the time series prediction results for dissolved oxygen content obtained by the prediction model of the present invention with a Gaussian kernel function;
Fig. 8 shows the time series prediction results for permanganate index obtained by the prediction model of the present invention with a Gaussian kernel function;
Fig. 9 shows the time series prediction results for ammonia nitrogen content obtained by the prediction model of the present invention with a Gaussian kernel function;
Fig. 10 shows the relative error curves for pH value obtained by the prediction model of the present invention with a linear or Gaussian kernel function and by the support vector machine with a linear or Gaussian kernel function;
Fig. 11 shows the relative error curves for dissolved oxygen content obtained by the prediction model of the present invention with a linear or Gaussian kernel function and by the support vector machine with a linear or Gaussian kernel function;
Fig. 12 shows the relative error curves for permanganate index obtained by the prediction model of the present invention with a linear or Gaussian kernel function and by the support vector machine with a linear or Gaussian kernel function;
Fig. 13 shows the relative error curves for ammonia nitrogen content obtained by the prediction model of the present invention with a linear or Gaussian kernel function and by the support vector machine with a linear or Gaussian kernel function.
Embodiment
As shown in Fig. 1, the water quality parameter time series prediction method based on relevance vector machine regression comprises the following steps:
Step 1: collect historical water quality parameter data from an automatic water quality monitoring system and pre-process the data by filling in the missing values. The missing values are first filled with 0; the historical data are then pre-processed in the time domain according to the time series and filtered in the frequency domain; finally, a least-squares best fit is performed. On the resulting fitted curve, the fitted values at the observation points whose value is 0 are taken as the fill-in values for the missing data and substituted into the historical data;
Step 2: use the first 2/3 of the pre-processed historical data as the training sample set and the remaining 1/3 as the test sample set;
Step 3: train the RVM using the water quality parameter values of several consecutive preceding unit time intervals in the training sample set as input and the value of the next unit time interval as output. Then test the trained RVM with the test sample set: feed the values of several consecutive preceding unit time intervals from the test sample set into the trained RVM and observe the predicted value at its output. If the error between the predicted value and the test-set value for the next unit time interval meets the requirement, the test is passed and the water quality parameter time series prediction model based on RVM regression is obtained;
Step 4: use the RVM regression time series prediction model to predict new water quality parameter values: feed the values of the most recent several unit time intervals into the model, which outputs the predicted value for the next unit time interval.
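The data flow of steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the window length of 2 in the usage lines and the sample readings are hypothetical, and `window_mean` is a trivial stand-in predictor, not the RVM of the invention.

```python
def chronological_split(series, numerator=2, denominator=3):
    """Step 2: the leading 2/3 of the ordered data trains, the trailing 1/3 tests."""
    cut = len(series) * numerator // denominator
    return series[:cut], series[cut:]


def one_step_forecasts(predict, history, window=4):
    """Steps 3-4: feed each run of `window` consecutive values to `predict`
    and collect its forecast for the value that follows the run."""
    return [predict(history[i - window:i]) for i in range(window, len(history))]


def window_mean(values):
    """Stand-in predictor used only to make the sketch runnable (not an RVM)."""
    return sum(values) / len(values)


# Hypothetical weekly readings, invented for illustration only.
readings = [7.9, 8.0, 8.1, 8.0, 7.8, 7.9, 8.2, 8.1, 8.0, 7.9, 8.0, 8.1]
train, test = chronological_split(readings)
forecasts = one_step_forecasts(window_mean, test, window=2)
```

The split is chronological rather than random, so the model is always tested on data later than anything it was trained on.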
The water quality parameter is pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content. The water quality parameter time series prediction model based on RVM regression in step 3 is as follows. For a given input $x_*$ of the water quality parameter (pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content), the predicted mean $y_*$ and variance $\sigma_*^2$ of the corresponding output are
$$y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)$$
The predicted output $t_*$ follows a Gaussian distribution with mean $y_*$ and variance $\sigma_*^2$, where $\mu$ is the posterior mean of the weights ($\mu^T$ its transpose), $\Sigma$ is the posterior covariance of the weights and $\sigma_{MP}^2$ is the noise variance. The predicted output can thus be written as
$$t_* = \mu^T \phi(x_*) + \varepsilon_*, \qquad \varepsilon_* \sim N\left(0,\ \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)\right)$$
With the confidence level set to $1-\theta$, the two-sided confidence interval of $t_*$ follows from
$$P\{y_* - \sigma_* z_{\theta/2} < t_* < y_* + \sigma_* z_{\theta/2}\} = 1 - \theta$$
so the confidence interval of $t_*$ at confidence level $1-\theta$ is $[y_* - \sigma_* z_{\theta/2},\ y_* + \sigma_* z_{\theta/2}]$. The upper quantile $z_{\theta/2}$ is looked up in the standard normal distribution table, and the 95% confidence interval is computed.
When the error is checked in step 3, the mean squared error (MSE) and the correlation coefficient R are used as the indices for evaluating the performance of the prediction model. They are computed as
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{ai} - y_{pi})^2$$
$$R = \frac{\sum_{i=1}^{n} (y_{ai} - \bar{y}_a)(y_{pi} - \bar{y}_p)}{\sqrt{\sum_{i=1}^{n} (y_{ai} - \bar{y}_a)^2 \cdot \sum_{i=1}^{n} (y_{pi} - \bar{y}_p)^2}}$$
A smaller mean squared error indicates better predictive performance, and an absolute value of the correlation coefficient closer to 1 indicates a more accurate prediction. Here $y_{ai}$ and $y_{pi}$ are the actual value and the predicted value of the $i$-th sample of the water quality parameter, and $\bar{y}_a$ and $\bar{y}_p$ are the means of the $n$ actual values and of the $n$ predicted values of that parameter.
The error requirement in step 3 differs by water quality parameter: the mean squared error of the model is below 0.004 for pH value prediction, below 0.08 for dissolved oxygen content prediction, below 0.02 for permanganate index prediction and below 0.002 for ammonia nitrogen content prediction, and the correlation coefficient of each of these models is not less than 0.95.
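The acceptance check described above can be sketched as follows. The limits are the ones quoted in the text; the dictionary keys, the function names and the reading of the correlation formula as the standard Pearson coefficient are assumptions made for this illustration.

```python
import math

# Mean squared error limits quoted in the text; the key names are hypothetical.
MSE_LIMITS = {
    "pH": 0.004,
    "dissolved_oxygen": 0.08,
    "permanganate_index": 0.02,
    "ammonia_nitrogen": 0.002,
}
MIN_CORRELATION = 0.95


def mean_squared_error(actual, predicted):
    """MSE = (1/n) * sum of squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_a * ss_p)


def passes_check(parameter, actual, predicted):
    """True when both the MSE limit and the correlation floor are met."""
    mse_ok = mean_squared_error(actual, predicted) < MSE_LIMITS[parameter]
    r_ok = correlation(actual, predicted) >= MIN_CORRELATION
    return mse_ok and r_ok
```

A model is accepted only when both conditions hold, so a low error with a poorly correlated trend is still rejected.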
The time series prediction model of a water quality parameter is expressed as follows.
Let the time series be $\{y_n\}_{n=1}^{N}$, where $N$ is the sequence length and $y_n$ is the monitored value of the water quality parameter at time $n$. Let $x_n = [y_{n-d\tau}, y_{n-(d-1)\tau}, \ldots, y_{n-\tau}]$ be the vector of the $d$ preceding monitored values, where $d$ is the embedding dimension and $\tau$ is the time delay. There is then a mapping
$$y_n = F(x_n), \quad n = 1, 2, \ldots, N$$
The key to water quality parameter prediction is an accurate approximation of $F(\cdot)$. To this end, a training sample set $\{x_n, t_n\}_{n=1}^{N}$ is built, where $x_n = [y_{n-d\tau}, y_{n-(d-1)\tau}, \ldots, y_{n-\tau}]^T$ is the input sample and $t_n = y_n$ is the output sample. The relevance vector machine is trained on this sample set to establish the water quality parameter time series prediction model. Here $d = 4$ and the time delay $\tau$ is 1 week, so the data of the preceding 4 weeks are used to predict the data of the next week.
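A minimal sketch of how the pairs $(x_n, t_n)$ could be assembled from a list of weekly values is shown below. The function name `delay_embed` is hypothetical; the defaults $d = 4$ and $\tau = 1$ follow the text.

```python
def delay_embed(series, d=4, tau=1):
    """Build x_n = [y[n - d*tau], ..., y[n - tau]] and t_n = y[n] for every
    index n that has d delayed values available."""
    inputs, targets = [], []
    for n in range(d * tau, len(series)):
        inputs.append([series[n - k * tau] for k in range(d, 0, -1)])
        targets.append(series[n])
    return inputs, targets


weekly = list(range(10))            # placeholder values 0..9
x, t = delay_embed(weekly)          # x[0] is [0, 1, 2, 3] and t[0] is 4
```

Each sample consumes $d\tau$ earlier values, so a series of length $L$ yields $L - d\tau$ input/output pairs.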
The water quality parameter time series prediction model based on RVM regression is derived as follows:
Step 1: given the training sample set $\{x_n, t_n\}_{n=1}^{N}$ of the water quality parameter (pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content), where $x_n$ is a 4-dimensional input vector and $t_n$ is the output, assume the samples are independently distributed and related by $t_n = y(x_n; w) + \varepsilon_n$, where the $\varepsilon_n$ are independent, identically distributed Gaussian noise terms with $\varepsilon_n \sim N(0, \sigma^2)$; that is, $t_n$ follows a Gaussian distribution with mean $y(x_n; w)$ and variance $\sigma^2$;
Step 2: the output of the prediction model can be expressed as
$$y(x; w) = \sum_{i=1}^{N} \omega_i K(x, x_i) + \omega_0$$
where $K(x, x_i)$ is the kernel function, chosen as either a linear kernel or a Gaussian kernel, and $w = [\omega_0, \omega_1, \ldots, \omega_N]^T$ is the weight vector of the model. Since $\varepsilon_n$ is Gaussian and the target values $t_n$ are independent, the likelihood function of the whole training sample set is
$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \lVert t - \Phi w \rVert^2\right\}$$
where $t = [t_1, t_2, \ldots, t_N]^T$, $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T$ is an $N \times (N+1)$ matrix, and $\phi(x_n) = [1, K(x_n, x_1), K(x_n, x_2), \ldots, K(x_n, x_N)]^T$;
Step 3: to give the model generalization ability, a Bayesian framework is used and a prior probability distribution is introduced:
$$p(w \mid \alpha) = \prod_{i=0}^{N} N(\omega_i \mid 0, \alpha_i^{-1})$$
where $\alpha$ is the hyperparameter vector made up of $N+1$ hyperparameters. The posterior distribution over the training sample set is obtained from Bayes' formula:
$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)}$$
The posterior distribution of the weight vector $w$ is
$$p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)} = (2\pi)^{-(N+1)/2} \lvert\Sigma\rvert^{-1/2} \exp\left\{-\frac{1}{2} (w - \mu)^T \Sigma^{-1} (w - \mu)\right\}$$
with posterior covariance and mean
$$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}, \qquad \mu = \sigma^{-2} \Sigma \Phi^T t$$
where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$;
Step 4: integrating out the weights gives $p(t \mid \alpha, \sigma^2) = \int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw$, which yields the marginal likelihood function of the hyperparameters: $p(t \mid \alpha, \sigma^2) = N(0, C)$, where $C = \sigma^2 I + \Phi A^{-1} \Phi^T$. The hyperparameters $\alpha$ and $\sigma^2$ directly affect the posterior distribution of $w$ and must be optimized to obtain the maximum a posteriori distribution of $w$. By introducing a delta-function approximation, this becomes the problem of maximizing the hyperparameter posterior $p(\alpha, \sigma^2 \mid t)$ with respect to $\alpha$ and $\sigma^2$; with uniform hyperpriors, only the marginal likelihood function needs to be maximized;
Step 5: following MacKay's method, the updates are
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\lVert t - \Phi\mu \rVert^2}{N - \sum_i \gamma_i}$$
where $\mu_i$ is the $i$-th element of the mean vector $\mu$, $\gamma_i = 1 - \alpha_i \Sigma_{ii}$ is defined as in MacKay's method, and $\Sigma_{ii}$ is the $i$-th diagonal element of the covariance $\Sigma$. $\alpha^{new}$ and $(\sigma^2)^{new}$ are updated iteratively until all parameters converge, that is, until the gradient of the output falls below $10^{-3}$, or until the maximum number of training iterations, 1000, is reached. Maximizing the likelihood in this way yields the hyperparameters $\alpha_{MP}$ and the noise variance $\sigma_{MP}^2$;
Step 6: for a given water quality parameter input $x_*$, the probability distribution of the corresponding output is
$$p(t_* \mid t, \alpha_{MP}, \sigma_{MP}^2) = \int p(t_* \mid w, \sigma_{MP}^2)\, p(w \mid t, \alpha_{MP}, \sigma_{MP}^2)\, dw$$
which is Gaussian, $p(t_* \mid t, \alpha_{MP}, \sigma_{MP}^2) = N(t_* \mid y_*, \sigma_*^2)$, with predicted mean and variance
$$y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)$$
Since $t_*$ follows a Gaussian distribution with mean $y_*$ and variance $\sigma_*^2$, it follows that $\dfrac{t_* - y_*}{\sigma_*} \sim N(0, 1)$;
Step 7: with the confidence level set to $1-\theta$, the two-sided confidence interval of $t_*$ follows from $P\{y_* - \sigma_* z_{\theta/2} < t_* < y_* + \sigma_* z_{\theta/2}\} = 1 - \theta$, so the confidence interval of $t_*$ at confidence level $1-\theta$ is $[y_* - \sigma_* z_{\theta/2},\ y_* + \sigma_* z_{\theta/2}]$. The upper quantile $z_{\theta/2}$ can be looked up in the standard normal distribution table, and the 95% confidence interval is computed.
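Steps 6 and 7 reduce to a few arithmetic operations once $\mu$, $\Sigma$ and $\sigma_{MP}^2$ are known. The sketch below assumes these are supplied as plain Python lists and a float; the function name and the numbers in the usage lines are hypothetical. Instead of a printed normal table it uses `statistics.NormalDist` from the standard library to obtain $z_{\theta/2}$.

```python
import math
from statistics import NormalDist


def predictive_interval(phi, mu, cov, noise_var, theta=0.05):
    """Return (y_star, var_star, (lower, upper)) for one basis vector phi(x_*)."""
    # Predicted mean: y_* = mu^T phi(x_*)
    y_star = sum(m * p for m, p in zip(mu, phi))
    # Predicted variance: sigma_*^2 = sigma_MP^2 + phi^T Sigma phi
    size = len(phi)
    quad = sum(phi[i] * cov[i][j] * phi[j] for i in range(size) for j in range(size))
    var_star = noise_var + quad
    # Upper theta/2 quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1.0 - theta / 2.0)
    half_width = z * math.sqrt(var_star)
    return y_star, var_star, (y_star - half_width, y_star + half_width)


# Invented numbers for illustration only.
y, var, (low, high) = predictive_interval(
    phi=[1.0, 0.5], mu=[2.0, 4.0],
    cov=[[0.01, 0.0], [0.0, 0.04]], noise_var=0.02,
)
# y is 4.0, var is 0.04, and the 95% interval is roughly (3.61, 4.39)
```

The interval widens both with the noise variance and with the uncertainty in the weights, which is what lets the model report lower confidence for inputs unlike those seen in training.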
Based on the above derivation, the concrete operating steps of the time series prediction model based on RVM regression are as follows:
(1) determine the training sample set of the water quality parameter (pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content);
(2) select the kernel function and determine the kernel width $\gamma$ and the noise variance $\sigma^2$;
(3) initialize $\alpha$ and $\sigma^2$;
(4) compute the posterior covariance $\Sigma$ and mean $\mu$ of the weight vector $w$;
(5) update $\alpha^{new}$ and $(\sigma^2)^{new}$;
(6) repeat steps (4) and (5) until the maximum number of iterations, 1000, is reached or the gradient of the output falls below $10^{-3}$;
(7) delete the weight coefficients and basis functions whose hyperparameter $\alpha$ is greater than or equal to $\alpha_{max}$ (taken as $e^{9}$), making the model sparse;
(8) for the test set of the water quality parameter, make predictions using the hyperparameters $\alpha_{MP}$ and the noise variance $\sigma_{MP}^2$ obtained in training.
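The operating steps above can be sketched with NumPy as follows. This is a simplified illustration, not the patented implementation, and several details are assumptions: the Gaussian kernel is parameterized as $\exp(-\lVert a-b\rVert^2 / \text{width}^2)$, the starting values of $\alpha$ and $\sigma^2$ are arbitrary, convergence is judged by the largest change in $\log\alpha$ rather than by a gradient, $\alpha$ is clipped to stay numerically safe, and $\alpha_{max}$ is read as $e^9$. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, width):
    """Kernel matrix exp(-||a - b||^2 / width^2) (assumed parameterization)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / width ** 2)


def design_matrix(x_new, x_train, width):
    """Rows are phi(x) = [1, K(x, x_1), ..., K(x, x_N)]."""
    return np.hstack([np.ones((len(x_new), 1)), gaussian_kernel(x_new, x_train, width)])


def posterior(phi, t, alpha, noise_var):
    """Step (4): posterior covariance and mean of the weights."""
    cov = np.linalg.inv(phi.T @ phi / noise_var + np.diag(alpha))
    mu = cov @ phi.T @ t / noise_var
    return cov, mu


def train_rvm(x, t, width=1.0, max_iter=1000, tol=1e-3, alpha_max=np.exp(9.0)):
    n = len(t)
    phi = design_matrix(x, x, width)              # steps (1)-(2)
    alpha = np.ones(n + 1)                        # step (3), assumed start values
    noise_var = 0.1 * np.var(t)
    for _ in range(max_iter):                     # step (6)
        cov, mu = posterior(phi, t, alpha, noise_var)
        gamma = np.clip(1.0 - alpha * np.diag(cov), 0.0, 1.0)
        # Step (5): MacKay updates, clipped to keep the matrices well conditioned.
        new_alpha = np.clip(gamma / np.maximum(mu ** 2, 1e-12), 1e-8, alpha_max)
        residual = t - phi @ mu
        noise_var = max(residual @ residual / max(n - gamma.sum(), 1e-6), 1e-6)
        converged = np.max(np.abs(np.log(new_alpha) - np.log(alpha))) < tol
        alpha = new_alpha
        if converged:
            break
    keep = np.flatnonzero(alpha < alpha_max)      # step (7): prune large alphas
    cov, mu = posterior(phi[:, keep], t, alpha[keep], noise_var)
    return {"x": x, "width": width, "keep": keep, "mu": mu, "cov": cov,
            "noise_var": noise_var}


def predict_rvm(model, x_new):
    """Step (8): predictive mean and variance for each row of x_new."""
    phi = design_matrix(x_new, model["x"], model["width"])[:, model["keep"]]
    mean = phi @ model["mu"]
    var = model["noise_var"] + ((phi @ model["cov"]) * phi).sum(axis=1)
    return mean, var
```

The entries of `keep` that survive pruning correspond to the bias term and the relevance vectors, which is where the sparsity of the model comes from.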
Two sets of experimental results are analyzed in detail below. The experiments use the weekly automatic water quality monitoring reports for key sections of the major national river basins published by the Ministry of Environmental Protection of the People's Republic of China (http://www.mep.gov.cn/), specifically the data for the Sichuan Longdong ("dragon cave") section from week 1 of 2004 to week 53 of 2012. The data from week 1 of 2004 to week 52 of 2009 serve as the training data set, and the data from week 1 of 2010 to week 53 of 2012 serve as the test data set. With the data of the preceding 4 weeks used to predict the data of the next week, the training data set contains 308 groups and the test data set contains 153 groups.
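The group counts can be reproduced with simple arithmetic, sketched below. The assumption, not stated in the text, is that every year from 2004 to 2011 contributes 52 weekly reports and 2012 contributes 53, as the quoted week ranges suggest; each group needs 4 earlier weeks as input, so the first 4 weeks of each data set produce no group.

```python
WINDOW = 4  # preceding weeks used as input for each prediction

# Assumed number of weekly reports in each period (see the note above).
train_weeks = 6 * 52          # week 1 of 2004 to week 52 of 2009
test_weeks = 52 + 52 + 53     # week 1 of 2010 to week 53 of 2012

train_groups = train_weeks - WINDOW   # 308
test_groups = test_weeks - WINDOW     # 153
```

Under this assumption the counts match the 308 and 153 groups reported, and the training share, 308 of 461 groups, is close to the 2/3 split described in step 2.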
Experiment 1: because the choice of kernel function affects the modeling result to some degree, both a linear kernel function and a Gaussian kernel function are used for RVM regression modeling in the present invention, so that the most suitable kernel function for each water quality parameter can be selected from the experimental results.
Figs. 2 to 9 show the time series prediction results for each water quality parameter obtained by the RVM regression modeling algorithm with the linear kernel function and with the Gaussian kernel function, giving a qualitative picture of the results with the different kernel functions.
Figs. 2 to 5 show that the linear-kernel RVM regression time series prediction model predicts pH value well; its predictions of dissolved oxygen, permanganate index and ammonia nitrogen are somewhat weaker but still acceptable. Figs. 6 to 9 show that, although the predicted values of the Gaussian-kernel RVM regression time series prediction model agree well with the original values for all four water quality parameters, its predictions of dissolved oxygen and permanganate index are clearly better than its predictions of pH value and ammonia nitrogen.
Comparing the predictions of the Gaussian-kernel and linear-kernel RVM regression time series prediction models for the four water quality parameters shows that the Gaussian-kernel model predicts pH value less well than the linear-kernel model, whereas the linear-kernel model predicts dissolved oxygen, permanganate index and ammonia nitrogen less well than the Gaussian-kernel model.
Because the RVM regression model yields a confidence interval together with each predicted value, the credibility of the prediction is available, which gives water quality monitoring and management agencies more reference information. In addition to the predicted and original values of the water quality parameters, Figs. 2 to 9 therefore also show the confidence interval at the 95% confidence level, the level most commonly used in statistics. Because actual water quality parameter values are all greater than zero, the part of the original 95% confidence interval that lies below zero is removed. Figs. 2 to 9 show that the RVM regression time series prediction model achieves good predictions for all four water quality parameters and that the original values all fall within the confidence interval. Furthermore, if an actual monitored value (the original value) lies far from the predicted value and outside the confidence interval, a water pollution accident may have occurred; if necessary, an early warning can be issued to prompt the regulator to investigate the cause of the change in water quality.
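The two uses of the confidence interval described above, removing the part below zero and flagging monitored values that fall outside it, can be sketched as follows. The function names and the sample numbers are hypothetical.

```python
def clip_interval(lower, upper):
    """Drop the part of the interval below zero, since the parameters are positive."""
    return max(lower, 0.0), upper


def needs_warning(observed, lower, upper):
    """True when the monitored value lies outside the clipped confidence interval."""
    low, high = clip_interval(lower, upper)
    return not (low <= observed <= high)


# Invented ammonia nitrogen example: interval (-0.2, 0.5) becomes (0.0, 0.5).
interval = clip_interval(-0.2, 0.5)
alarm = needs_warning(0.9, -0.2, 0.5)   # 0.9 lies above the interval
```

Whether a flagged value actually triggers an early warning is left to the operator, in line with the "if necessary" wording of the text.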
Experiment 2: in order to further illustrate problem, the time series modeling algorithm that the time series modeling algorithm below RVM in the present invention being returned and common SVM return is made comparisons.Specifically counting nRV or support vector from coefficient R, square error MSE, predicted time and interconnection vector counts aspect these four of nSV (among SVM corresponding with interconnection vector be support vector (Support Vector)) and compares.
Table 1~4 have provided difference RVM and SVM returns the time series forecasting comparison to each water quality parameter.
The time series forecasting result comparison of table 1PH value
The time series forecasting result comparison of table 2 dissolved oxygen DO
The time series forecasting result comparison of table 3 permanganate index
The time series forecasting result comparison of table 4 ammonia nitrogen
Comparing the RVM and SVM regression results in Table 1 shows that, for pH value and with the same kernel function, the correlation coefficient of the RVM time series prediction model is clearly larger than that of the SVM model, while its mean squared error, prediction time and number of relevance vectors (compared with the number of support vectors) are all clearly smaller. Comparing the linear and Gaussian kernel functions shows that the linear kernel predicts better than the Gaussian kernel. Tables 2 and 3 show that, for dissolved oxygen and permanganate index, the SVM regression model is better than the RVM model on mean squared error and the correlation coefficients differ little, but the number of support vectors is tens or even hundreds of times the number of relevance vectors of the RVM regression model. Overall, the two time series prediction models differ little in their prediction performance for dissolved oxygen and permanganate index. Table 4 shows that, for ammonia nitrogen with the Gaussian kernel function, the RVM regression model is better than SVM regression in mean squared error, correlation coefficient, running time and number of relevance vectors; with the linear kernel function, the RVM regression model is better than SVM on every index except mean squared error.
Taken together, the results in Tables 1 to 4 show that RVM, like SVM, generalizes well, and that both time series prediction models give good predictions. The correlation coefficient of the RVM model is in general larger than that of SVM, its number of relevance vectors is far smaller than the number of SVM support vectors, and its prediction time is shorter than that of SVM.
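The two accuracy indices compared in Tables 1 to 4 can be computed as in the following minimal sketch. It follows the MSE definition of claim 4 and assumes R is the standard Pearson correlation coefficient; the function and variable names are illustrative and do not come from the patent.

```python
import math


def mean_squared_error(actual, predicted):
    """MSE = (1/n) * sum over i of (y_ai - y_pi)^2."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values (assumed form of R)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Numerator: co-variation of actual and predicted values around their means.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    # Denominator: square root of the product of the two sums of squared deviations.
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_a * ss_p)
```

A smaller MSE and an absolute value of R closer to 1 both indicate a better model, which is how the tables are read in the text above.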
To compare the performance of the RVM and SVM time series prediction models more fully, the relative error curves of the four water quality parameter time series predictions are plotted in Figures 10 to 13. (To keep the comparison plots legible, only the relative error curves of each prediction model on the first 50 groups of data in the test set are drawn.)
As can be seen from Figure 10, for pH value prediction the linear-kernel RVM regression model has the smallest relative error; the Gaussian-kernel RVM and SVM regression models have larger relative errors at individual points and do not perform as well as the linear-kernel SVM regression model. Figure 11 shows that, for dissolved oxygen prediction, the Gaussian-kernel RVM regression model has the smallest relative error and the Gaussian-kernel SVM regression model shows large relative errors at many points; the relative errors of the two RVM kernel models differ little, but on the whole the RVM models are more stable than the SVM models and have no points of large error. Figure 12 shows that the RVM and SVM regression models differ little in relative error for permanganate index prediction, although the prediction performance is not as good as for pH value and dissolved oxygen. Figure 13 clearly shows that, for ammonia nitrogen prediction, the Gaussian-kernel and linear-kernel RVM regression models have smaller relative errors than the Gaussian-kernel and linear-kernel SVM regression models respectively, and that the Gaussian-kernel SVM model shows large relative errors at many points, which seriously degrades its prediction performance. Taken together, Figures 10 to 13 show that the RVM time series prediction model is better than the SVM time series prediction model: its relative error is smaller, its performance is more stable, and it can provide probabilistic information about the prediction.
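The curves in Figures 10 to 13 can be reproduced in outline with the sketch below. The text does not state the relative error formula, so the signed form (predicted − actual) / actual is an assumption, as are the function name and the `limit` parameter; the default of 50 mirrors the first 50 groups of test data mentioned above.

```python
def relative_errors(actual, predicted, limit=50):
    """Signed relative error of each prediction for the first `limit` test samples.

    Assumed definition: (predicted - actual) / actual. Actual values must be
    nonzero, which holds for typical pH, dissolved oxygen, permanganate index
    and ammonia nitrogen readings.
    """
    pairs = list(zip(actual, predicted))[:limit]
    return [(p - a) / a for a, p in pairs]


def max_abs_relative_error(actual, predicted, limit=50):
    """Largest relative error magnitude, a simple indicator of outlying error points."""
    return max(abs(e) for e in relative_errors(actual, predicted, limit))
```

Plotting the list returned by `relative_errors` for each model on the same axes gives a comparison of the kind discussed for the four parameters.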
To address the problems of the SVM water quality parameter time series prediction model, namely its large number of support vectors, long prediction time and lack of probabilistic output, a method is proposed here that builds the water quality parameter time series prediction model with RVM regression. RVM regression models with a linear kernel function and with a Gaussian kernel function are each used for prediction, and the prediction results show that the original values all lie within the 95% confidence interval. Comparison with SVM regression water quality parameter time series prediction models using the corresponding kernel functions shows that the prediction accuracy of the RVM model is on the whole not lower than that of the SVM model. Because RVM gives a probability distribution for the prediction, it can supply a confidence interval together with the predicted value, and thus provides more reference information for water quality monitoring and management agencies. In addition, the RVM regression model is highly sparse, with advantages such as few relevance vectors, short prediction time and strong generalization ability.
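How a confidence interval follows from the predictive distribution can be sketched as below, using the mean and variance expressions given in claim 3. It assumes training has already produced the posterior weight mean, the matrix Σ and the noise variance; the function name, argument names and plain-list representation are illustrative only.

```python
from statistics import NormalDist


def rvm_predictive_interval(mu, sigma_matrix, phi, noise_var, confidence=0.95):
    """Return (mean, variance, (lower, upper)) for one input's basis vector `phi`.

    mu           : posterior weight mean, list of length m
    sigma_matrix : the m x m matrix Sigma from claim 3, as a list of rows
    phi          : basis function values phi(x*) for the new input, length m
    noise_var    : estimated noise variance sigma_MP^2
    """
    # Predictive mean: y* = mu^T phi(x*)
    y_star = sum(m * f for m, f in zip(mu, phi))
    # Predictive variance: sigma*^2 = sigma_MP^2 + phi(x*)^T Sigma phi(x*)
    sigma_phi = [sum(row[j] * phi[j] for j in range(len(phi))) for row in sigma_matrix]
    var_star = noise_var + sum(f * s for f, s in zip(phi, sigma_phi))
    # Upper quantile z_{theta/2} of the standard normal, with theta = 1 - confidence.
    theta = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - theta / 2.0)
    half_width = z * var_star ** 0.5
    return y_star, var_star, (y_star - half_width, y_star + half_width)
```

With `confidence=0.95` the quantile is roughly 1.96, so the interval is approximately the predictive mean plus or minus 1.96 predictive standard deviations.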
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (5)

1. A water quality parameter time series prediction method based on relevance vector machine regression, characterized in that it comprises the following steps:
Step 1: collect water quality parameter historical data from an automatic water quality monitoring system and preprocess the data, completing the missing data in the historical data as follows: first fill the missing data with 0, then preprocess the historical data in the time domain according to the time series, then apply frequency filtering, and finally use the least squares method to carry out a best-fit comparison; in the finally fitted curve, find the fitted values at the observation points whose value is 0, which are the completion values for the actual missing data, and substitute these completion values to complete the missing data in the historical data;
Step 2: take the first 2/3 of the preprocessed water quality parameter historical data as the training sample set and the last 1/3 as the test sample set;
Step 3: train the RVM using the water quality parameter values of several preceding consecutive unit times in the training sample set as input and the water quality parameter value of the next unit time as output; test the trained RVM with the test sample set by feeding the water quality parameter values of several preceding consecutive unit times in the test sample set to the input of the trained RVM and observing the predicted value at its output; if the error between the predicted value at the output and the water quality parameter value of the next unit time in the test sample set meets the requirement, the test is passed and the water quality parameter time series prediction model based on RVM regression is obtained;
Step 4: use the time series prediction model based on RVM regression to predict new water quality parameters: feed the water quality parameter values of the most recent several unit times to the input of the prediction model, and obtain the predicted water quality parameter value of the next unit time at its output.
2. The water quality parameter time series prediction method based on relevance vector machine regression according to claim 1, characterized in that the water quality parameter is pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content.
3. The water quality parameter time series prediction method based on relevance vector machine regression according to claim 2, characterized in that the water quality parameter time series prediction model based on RVM regression in step 3 is as follows: for a given input $x_*$ of the water quality parameter pH value, dissolved oxygen content, permanganate index or ammonia nitrogen content, the predictive mean $y_*$ and variance $\sigma_*^2$ of the corresponding output are respectively

$$y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)$$

The prediction output $t_*$ follows a Gaussian distribution with mean $y_*$ and variance $\sigma_*^2$, where $\mu$ is the posterior weight mean and $\sigma_{MP}^2$ is the noise variance; the prediction output is

$$t_* = \mu^T \phi(x_*) + N\!\left(0,\ \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*)\right)$$

With the confidence level set to $1-\theta$, the two-sided confidence interval of $t_*$ is obtained from $P\{y_* - \sigma_* z_{\theta/2} < t_* < y_* + \sigma_* z_{\theta/2}\} = 1-\theta$, so that the confidence interval of $t_*$ at confidence level $1-\theta$ is $[y_* - \sigma_* z_{\theta/2},\ y_* + \sigma_* z_{\theta/2}]$; the upper quantile $z_{\theta/2}$ is looked up in the standard normal distribution table, and the 95% confidence interval is computed.
4. The water quality parameter time series prediction method based on relevance vector machine regression according to claim 1, characterized in that, when checking the error in step 3, the mean squared error (MSE) and the correlation coefficient R are used as the indices for evaluating the performance of the prediction model, computed respectively as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{ai} - y_{pi}\right)^2$$

$$R = \frac{\sum_{i=1}^{n}\left(y_{ai} - \bar{y}_a\right)\left(y_{pi} - \bar{y}_p\right)}{\sqrt{\sum_{i=1}^{n}\left(y_{ai} - \bar{y}_a\right)^2 \cdot \sum_{i=1}^{n}\left(y_{pi} - \bar{y}_p\right)^2}}$$

A smaller mean squared error indicates better prediction performance of the model, and the closer the absolute value of the correlation coefficient is to 1, the more accurate the prediction; here $y_{ai}$ and $y_{pi}$ are respectively the actual value and the predicted value of the i-th sample of the water quality parameter, and $\bar{y}_a$ and $\bar{y}_p$ are respectively the mean of the n actual values and the mean of the n predicted values of the corresponding water quality parameter.
5. The water quality parameter time series prediction method based on relevance vector machine regression according to claim 4, characterized in that the error requirement in step 3 differs for different water quality parameters: the mean squared error of the model predicting pH value is below 0.004, the mean squared error of the model predicting dissolved oxygen content is below 0.08, the mean squared error of the model predicting permanganate index is below 0.02, the mean squared error of the model predicting ammonia nitrogen content is below 0.002, and the correlation coefficient of each of the above prediction models is not less than 0.95.
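The data handling in steps 2 and 3 of claim 1 and the acceptance test of claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the window length is not fixed by the claims and is left as a parameter, and all function names and dictionary keys are hypothetical.

```python
def chronological_split(series, numerator=2, denominator=3):
    """Split a series in time order: first 2/3 for training, last 1/3 for testing."""
    cut = (numerator * len(series)) // denominator
    return series[:cut], series[cut:]


def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])  # values of the preceding unit times
        targets.append(series[i + window])   # value of the next unit time
    return inputs, targets


# MSE thresholds per parameter and the minimum correlation coefficient, from claim 5.
MSE_LIMITS = {
    "pH": 0.004,
    "dissolved_oxygen": 0.08,
    "permanganate_index": 0.02,
    "ammonia_nitrogen": 0.002,
}
MIN_CORRELATION = 0.95


def passes_check(parameter, mse, r):
    """True if the test-set MSE and R meet the requirement for this parameter."""
    return mse < MSE_LIMITS[parameter] and r >= MIN_CORRELATION
```

For example, `make_windows(train, 3)` on the training part of a series yields input vectors of three consecutive readings, each paired with the fourth reading as the regression target; the same windowing is applied to the test part before checking MSE and R against the thresholds.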
CN201410196457.4A 2014-05-09 2014-05-09 Water quality parameter time series prediction method based on relevance vector machine regression Active CN103942457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410196457.4A CN103942457B (en) 2014-05-09 2014-05-09 Water quality parameter time series prediction method based on relevance vector machine regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410196457.4A CN103942457B (en) 2014-05-09 2014-05-09 Water quality parameter time series prediction method based on relevance vector machine regression

Publications (2)

Publication Number Publication Date
CN103942457A true CN103942457A (en) 2014-07-23
CN103942457B CN103942457B (en) 2017-04-12

Family

ID=51190125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410196457.4A Active CN103942457B (en) 2014-05-09 2014-05-09 Water quality parameter time series prediction method based on relevance vector machine regression

Country Status (1)

Country Link
CN (1) CN103942457B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318325A (en) * 2014-10-14 2015-01-28 广东省环境监测中心 Multi-basin real-time intelligent water quality predication method and system
CN105676670A (en) * 2014-11-18 2016-06-15 北京翼虎能源科技有限公司 Method and system for processing energy data
CN106156260A (en) * 2015-04-28 2016-11-23 阿里巴巴集团控股有限公司 The method and apparatus that a kind of shortage of data is repaired
CN106872657A (en) * 2017-01-05 2017-06-20 河海大学 A kind of multivariable water quality parameter time series data accident detection method
CN107153874A (en) * 2017-04-11 2017-09-12 中国农业大学 Water quality prediction method and system
CN107392786A (en) * 2017-07-11 2017-11-24 中国矿业大学 Mine fiber grating monitoring system missing data compensation method based on SVMs
CN107480028A (en) * 2017-07-21 2017-12-15 东软集团股份有限公司 The acquisition methods and device of residual time length workable for disk
CN107688871A (en) * 2017-08-18 2018-02-13 中国农业大学 A kind of water quality prediction method and device
CN107977724A (en) * 2016-10-21 2018-05-01 复凌科技(上海)有限公司 A kind of water quality hard measurement Forecasting Methodology of permanganate index
CN108334977A (en) * 2017-12-28 2018-07-27 鲁东大学 Water quality prediction method based on deep learning and system
CN108595892A (en) * 2018-05-11 2018-09-28 南京林业大学 Soft-measuring modeling method based on time difference model
CN108710974A (en) * 2018-05-18 2018-10-26 中国农业大学 A kind of water body ammonia nitrogen prediction technique and device based on depth confidence network
CN108764520A (en) * 2018-04-11 2018-11-06 杭州电子科技大学 A kind of water quality parameter prediction technique based on multilayer circulation neural network and D-S evidence theory
CN108846423A (en) * 2018-05-29 2018-11-20 中国农业大学 Water quality prediction method and system
CN109165247A (en) * 2018-09-30 2019-01-08 中冶华天工程技术有限公司 Sewage measurement data intelligence preprocess method
CN109241607A (en) * 2017-09-27 2019-01-18 山东农业大学 Matching variable fertilising discrete element analysis parameter calibration method based on Method Using Relevance Vector Machine
CN109669017A (en) * 2017-10-17 2019-04-23 中国石油化工股份有限公司 Refinery's distillation tower top based on deep learning cuts water concentration prediction technique
CN109784528A (en) * 2018-12-05 2019-05-21 鲁东大学 Water quality prediction method and device based on time series and support vector regression
CN110182871A (en) * 2019-07-10 2019-08-30 银天远创(厦门)科技有限公司 A kind of method for treating water and terminal based on full-automatic medicine system
CN110245359A (en) * 2018-05-18 2019-09-17 谷歌有限责任公司 Parallel decoding is carried out using autoregression machine learning model
CN110245881A (en) * 2019-07-16 2019-09-17 重庆邮电大学 A kind of water quality prediction method and system of the sewage treatment based on machine learning
CN110838344A (en) * 2019-11-08 2020-02-25 北京理工大学 Water quality data analysis method
CN110889085A (en) * 2019-09-30 2020-03-17 华南师范大学 Intelligent wastewater monitoring method and system based on complex network multiple online regression
CN111080502A (en) * 2019-12-17 2020-04-28 清华苏州环境创新研究院 Big data identification method for abnormal behavior of regional enterprise data
CN111937012A (en) * 2018-03-30 2020-11-13 日本电气方案创新株式会社 Index calculation device, prediction system, progress prediction evaluation method, and program
CN112036082A (en) * 2020-08-27 2020-12-04 东北大学秦皇岛分校 Time series data prediction method based on attention mechanism
CN112182830A (en) * 2019-08-06 2021-01-05 长春工业大学 Water quality parameter prediction method
CN112489402A (en) * 2020-11-27 2021-03-12 罗普特科技集团股份有限公司 Early warning method, device and system for pipe gallery and storage medium
CN113281478A (en) * 2021-04-20 2021-08-20 广州珠水生态环境技术有限公司 Water quality acid-base nature of water resource environmental protection restores and uses monitoring system
CN113449789A (en) * 2021-06-24 2021-09-28 北京市生态环境监测中心 Quality control method for monitoring water quality by full-spectrum water quality monitoring equipment based on big data
CN114340384A (en) * 2019-08-20 2022-04-12 卡塞株式会社 Water quality management device and method for culture pond

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020642B (en) * 2012-10-08 2016-07-13 江苏省环境监测中心 Monitoring water environment Quality Control data analysing method
CN102968573A (en) * 2012-12-14 2013-03-13 哈尔滨工业大学 Online lithium ion battery residual life predicting method based on relevance vector regression
CN103235096A (en) * 2013-04-16 2013-08-07 广州铁路职业技术学院 Sewage water quality detection method and apparatus

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318325A (en) * 2014-10-14 2015-01-28 广东省环境监测中心 Multi-basin real-time intelligent water quality predication method and system
CN104318325B (en) * 2014-10-14 2017-11-07 广东省环境监测中心 Many basin real-time intelligent water quality prediction methods and system
CN105676670B (en) * 2014-11-18 2019-07-19 北京翼虎能源科技有限公司 For handling the method and system of multi-energy data
CN105676670A (en) * 2014-11-18 2016-06-15 北京翼虎能源科技有限公司 Method and system for processing energy data
CN106156260A (en) * 2015-04-28 2016-11-23 阿里巴巴集团控股有限公司 The method and apparatus that a kind of shortage of data is repaired
CN106156260B (en) * 2015-04-28 2020-01-21 阿里巴巴集团控股有限公司 Method and device for repairing missing data
CN107977724A (en) * 2016-10-21 2018-05-01 复凌科技(上海)有限公司 A kind of water quality hard measurement Forecasting Methodology of permanganate index
CN106872657A (en) * 2017-01-05 2017-06-20 河海大学 A kind of multivariable water quality parameter time series data accident detection method
CN107153874A (en) * 2017-04-11 2017-09-12 中国农业大学 Water quality prediction method and system
CN107153874B (en) * 2017-04-11 2019-12-20 中国农业大学 Water quality prediction method and system
CN107392786A (en) * 2017-07-11 2017-11-24 中国矿业大学 Mine fiber grating monitoring system missing data compensation method based on SVMs
CN107480028A (en) * 2017-07-21 2017-12-15 东软集团股份有限公司 The acquisition methods and device of residual time length workable for disk
CN107480028B (en) * 2017-07-21 2020-09-18 东软集团股份有限公司 Method and device for acquiring usable residual time of disk
CN107688871A (en) * 2017-08-18 2018-02-13 中国农业大学 A kind of water quality prediction method and device
CN107688871B (en) * 2017-08-18 2020-08-21 中国农业大学 Water quality prediction method and device
CN109241607A (en) * 2017-09-27 2019-01-18 山东农业大学 Matching variable fertilising discrete element analysis parameter calibration method based on Method Using Relevance Vector Machine
CN109669017B (en) * 2017-10-17 2021-04-27 中国石油化工股份有限公司 Refinery distillation tower top cut water ion concentration prediction method based on deep learning
CN109669017A (en) * 2017-10-17 2019-04-23 中国石油化工股份有限公司 Refinery's distillation tower top based on deep learning cuts water concentration prediction technique
CN108334977A (en) * 2017-12-28 2018-07-27 鲁东大学 Water quality prediction method based on deep learning and system
CN108334977B (en) * 2017-12-28 2020-06-30 鲁东大学 Deep learning-based water quality prediction method and system
CN111937012A (en) * 2018-03-30 2020-11-13 日本电气方案创新株式会社 Index calculation device, prediction system, progress prediction evaluation method, and program
CN108764520A (en) * 2018-04-11 2018-11-06 杭州电子科技大学 A kind of water quality parameter prediction technique based on multilayer circulation neural network and D-S evidence theory
CN108764520B (en) * 2018-04-11 2021-11-16 杭州电子科技大学 Water quality parameter prediction method based on multilayer cyclic neural network and D-S evidence theory
CN108595892A (en) * 2018-05-11 2018-09-28 南京林业大学 Soft-measuring modeling method based on time difference model
CN108710974A (en) * 2018-05-18 2018-10-26 中国农业大学 A kind of water body ammonia nitrogen prediction technique and device based on depth confidence network
CN110245359B (en) * 2018-05-18 2024-01-26 谷歌有限责任公司 Parallel decoding using autoregressive machine learning model
CN110245359A (en) * 2018-05-18 2019-09-17 谷歌有限责任公司 Parallel decoding is carried out using autoregression machine learning model
CN108710974B (en) * 2018-05-18 2020-09-11 中国农业大学 Water ammonia nitrogen prediction method and device based on deep belief network
CN108846423A (en) * 2018-05-29 2018-11-20 中国农业大学 Water quality prediction method and system
CN109165247A (en) * 2018-09-30 2019-01-08 中冶华天工程技术有限公司 Sewage measurement data intelligence preprocess method
CN109165247B (en) * 2018-09-30 2021-07-23 中冶华天工程技术有限公司 Intelligent pretreatment method for sewage measurement data
CN109784528A (en) * 2018-12-05 2019-05-21 鲁东大学 Water quality prediction method and device based on time series and support vector regression
CN110182871A (en) * 2019-07-10 2019-08-30 银天远创(厦门)科技有限公司 A kind of method for treating water and terminal based on full-automatic medicine system
CN110245881A (en) * 2019-07-16 2019-09-17 重庆邮电大学 A kind of water quality prediction method and system of the sewage treatment based on machine learning
CN112182830B (en) * 2019-08-06 2022-10-18 长春工业大学 Water quality parameter prediction method
CN112182830A (en) * 2019-08-06 2021-01-05 长春工业大学 Water quality parameter prediction method
CN114340384A (en) * 2019-08-20 2022-04-12 卡塞株式会社 Water quality management device and method for culture pond
CN114340384B (en) * 2019-08-20 2023-09-26 卡塞株式会社 Water quality management device and method for culture pond
CN110889085A (en) * 2019-09-30 2020-03-17 华南师范大学 Intelligent wastewater monitoring method and system based on complex network multiple online regression
CN110838344B (en) * 2019-11-08 2023-04-07 北京理工大学 Water quality data analysis method
CN110838344A (en) * 2019-11-08 2020-02-25 北京理工大学 Water quality data analysis method
CN111080502A (en) * 2019-12-17 2020-04-28 清华苏州环境创新研究院 Big data identification method for abnormal behavior of regional enterprise data
CN111080502B (en) * 2019-12-17 2023-09-08 清华苏州环境创新研究院 Big data identification method for regional enterprise data abnormal behaviors
CN112036082B (en) * 2020-08-27 2022-03-08 东北大学秦皇岛分校 Time series data prediction method based on attention mechanism
CN112036082A (en) * 2020-08-27 2020-12-04 东北大学秦皇岛分校 Time series data prediction method based on attention mechanism
CN112489402A (en) * 2020-11-27 2021-03-12 罗普特科技集团股份有限公司 Early warning method, device and system for pipe gallery and storage medium
CN113281478A (en) * 2021-04-20 2021-08-20 广州珠水生态环境技术有限公司 Water quality acid-base nature of water resource environmental protection restores and uses monitoring system
CN113449789A (en) * 2021-06-24 2021-09-28 北京市生态环境监测中心 Quality control method for monitoring water quality by full-spectrum water quality monitoring equipment based on big data

Also Published As

Publication number Publication date
CN103942457B (en) 2017-04-12

Similar Documents

Publication Publication Date Title
CN103942457A (en) Water quality parameter time series prediction method based on relevance vector machine regression
Sun et al. Using Bayesian deep learning to capture uncertainty for residential net load forecasting
US10290066B2 (en) Method and device for modeling a long-time-scale photovoltaic output time sequence
Liu et al. Coupling the k-nearest neighbor procedure with the Kalman filter for real-time updating of the hydraulic model in flood forecasting
CN105391083B (en) Wind power interval short term prediction method based on variation mode decomposition and Method Using Relevance Vector Machine
CN101587155B (en) Oil soaked transformer fault diagnosis method
CN102185735B (en) Network security situation prediction method
CN105376097A (en) Hybrid prediction method for network traffic
Heng et al. Probabilistic and deterministic wind speed forecasting based on non-parametric approaches and wind characteristics information
CN105335756A (en) Robust learning model and image classification system
CN103226595B (en) The clustering method of the high dimensional data of common factor analyzer is mixed based on Bayes
CN106203723A (en) Wind power short-term interval prediction method based on RT reconstruct EEMD RVM built-up pattern
Moeini et al. Fitting the three-parameter Weibull distribution with Cross Entropy
CN111625516A (en) Method and device for detecting data state, computer equipment and storage medium
CN103235096A (en) Sewage water quality detection method and apparatus
CN107798426A (en) Wind power interval Forecasting Methodology based on Atomic Decomposition and interactive fuzzy satisfying method
CN103617259A (en) Matrix decomposition recommendation method based on Bayesian probability with social relations and project content
Feng et al. Improved prediction model for flood-season rainfall based on a nonlinear dynamics-statistic combined method
Hu et al. Uncertainty assessment of estimation of hydrological design values
Kumar Singh et al. Estimation and prediction for Type-I hybrid censored data from generalized Lindley distribution
CN105808962A (en) Assessment method considering voltage probabilities of multiple electric power systems with wind power output randomness
CN104795063A (en) Acoustic model building method based on nonlinear manifold structure of acoustic space
CN111311026A (en) Runoff nonlinear prediction method considering data characteristics, model and correction
Irofti et al. Fault handling in large water networks with online dictionary learning
Williams et al. Importance nested sampling with normalising flows

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant