CN107463993A - Medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman networks - Google Patents


Publication number
CN107463993A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710662894.4A
Other languages
Chinese (zh)
Other versions
CN107463993B (en
Inventor
Not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
He Zhiyao
Original Assignee
He Zhiyao

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The invention discloses a medium- and long-term runoff ensemble forecasting method based on mutual information, kernel principal component analysis and Elman networks, comprising the following steps: collecting hydrometeorological data and establishing a one-to-one correspondence between the index time series and the runoff time series; selecting the indices whose normalized mutual information is both significant and high; extracting the principal components of the screened index data by kernel principal component analysis; building an Elman neural network model; after z-score standardization of the principal components, dividing them into training samples, used for supervised training of the network, and test samples, used to test the network, and calculating each evaluation index; and repeating the single forecast multiple times and taking the ensemble average of the repeated forecasts as the final forecast value. The invention can fully exploit the linear and nonlinear relationships between hydrometeorological data and runoff and establish a numerical model of those relationships, achieving more accurate and more stable forecasts of medium- and long-term runoff.

Description

Medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman networks
Technical field
The invention belongs to the field of information technology, and in particular relates to a medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman networks.
Background art
Accurate medium- and long-term runoff forecasting is an important prerequisite for guiding integrated water resources planning, scientific management and optimal scheduling.
At present, the conventional approach to medium- and long-term runoff forecasting is statistical: a forecast is produced by finding the statistical relationship between the forecast object and the predictors. Applying statistical methods to medium- and long-term runoff forecasting involves the following three key issues:
(1) Primary selection of predictors. The commonly used methods are linear correlation analyses (such as Pearson or Spearman correlation analysis): the correlation coefficient between each hydrometeorological series (including teleconnection factors, local factors, etc.) and the historical runoff series is calculated, and the factors with high correlation coefficients are selected as predictors;
(2) Noise reduction and redundancy removal for the selected factors. The commonly used method is principal component analysis (PCA). When factors are screened by correlation analysis, the highly correlated factors that pass the screen are usually numerous, the factor time series are often highly multicollinear, and each series also carries some noise of its own. The selected factors therefore need noise reduction and redundancy removal. PCA replaces the original, more numerous indices with a few composite indices that retain as much of the useful information in the original indices as possible and are mutually orthogonal;
(3) Establishing the optimal mathematical relationship between the forecast object and the predictors. Commonly used models include multiple regression, random forests, artificial neural networks and support vector machines.
Existing statistics-based medium- and long-term runoff forecasting methods have problems in the following three respects:
(1) Hydrological processes are complex, and in addition to linear relationships there are necessarily nonlinear relationships between the predictors and the forecast object. The linear correlation analysis used for primary selection of predictors can describe only the linear relationships between variables and cannot reflect the nonlinear ones;
(2) PCA, used for noise reduction and redundancy removal of the primary-selected factors, is essentially a linear mapping method, and the principal components it yields are generated by linear mapping. It ignores correlations of order higher than 2 in the data, so the extracted principal components are not optimal;
(3) Among the models used to establish the optimal mathematical relationship between the forecast object and the predictors, the conventional multiple regression is in effect also a linear fit and cannot reflect nonlinear relationships between the forecast object and the predictors. Compared with other models, artificial neural networks are robust and have strong nonlinear mapping and self-learning capabilities, so they have been widely applied in medium- and long-term runoff forecasting. However, the uncertainty of the network parameters affects forecast accuracy, and the results of individual forecasts can differ from one another by a certain margin.
Summary of the invention
The purpose of the invention is to address the problems of traditional statistical methods by providing a medium- and long-term runoff forecasting method that overcomes them, thereby improving the stability and accuracy of the forecast.
The medium- and long-term runoff forecasting method provided by the invention, based on normalized mutual information (NMI), kernel principal component analysis (KPCA) and the Elman neural network, comprises the following steps:
Step 1: Data preprocessing
1.1 Collect the historical runoff data of the region to be forecast and the hydrometeorological data that may serve as predictors. Commonly used hydrometeorological data include indices of atmospheric circulation characteristics, upper-air pressure fields and sea surface temperature. The 74 circulation characteristic indices, or the newer set of 130 atmospheric monitoring indices, provided by the National Climate Center essentially cover these commonly used indices, so preliminary predictors can be selected directly from them.
1.2 Because the influence of meteorological factors on runoff is lagged, establish a one-to-one correspondence between the index time series of the preceding year and the runoff time series of the region to be forecast. For example, suppose 130 indices have been selected, the forecast object is the annual runoff of 2017, and monthly historical runoff data and historical index data are available for 1960-2016. The correspondence is illustrated with one index; the other indices correspond to the runoff in the same way. The correspondence is as follows:
Table 1 Correspondence between the time series of one index and the annual runoff time series
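The lagged pairing described in step 1.2 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `align_lagged`, the array layout (one row per year, twelve monthly columns) and the synthetic values are all assumptions.

```python
import numpy as np

def align_lagged(index_monthly, index_years, runoff, runoff_years, lag=1):
    """Pair each year's runoff with the monthly index values from `lag` years earlier.

    index_monthly has shape (n_index_years, 12), one row per year (assumed layout).
    Returns (features, targets, target_years) for the years where both sides exist.
    """
    row_of_year = {int(y): i for i, y in enumerate(index_years)}
    feats, targs, years = [], [], []
    for y, q in zip(runoff_years, runoff):
        src = int(y) - lag  # the preceding year supplies the predictors
        if src in row_of_year:
            feats.append(index_monthly[row_of_year[src]])
            targs.append(q)
            years.append(int(y))
    return np.array(feats), np.array(targs), np.array(years)

# Synthetic example (made-up numbers): index years 1959-1962, runoff years 1960-1963.
index_years = np.arange(1959, 1963)
index_monthly = np.arange(48, dtype=float).reshape(4, 12)
runoff_years = np.arange(1960, 1964)
runoff = np.array([100.0, 110.0, 120.0, 130.0])
X, y, yrs = align_lagged(index_monthly, index_years, runoff, runoff_years)
```

Each row of `X` then holds the twelve monthly values of one index for the year before the runoff in the matching entry of `y`.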
Step 2: Primary selection of predictors based on normalized mutual information
2.1 Divide the index time series and the annual runoff time series into two parts, one serving as training samples for the neural network and the other as test samples for the trained network. For example, the data of the first 50 years serve as training samples and the data of the last 5 years as test samples.
2.2 Calculate the mutual information. For the training sample data, calculate the mutual information between each index time series and the corresponding runoff time series. Taking the data in Table 1 as an example, this means calculating the mutual information between column 1 of the table and each of the remaining columns. The mutual information MI is calculated as follows:
$$MI(X,Y)=\sum_{i}\sum_{j} p(x_i,y_j)\,\log_2\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)} \qquad (1)$$
where X is the runoff time series, X = (x1, x2, x3, ..., xn)^T; Y is the index time series, Y = (y1, y2, y3, ..., yn)^T; p(xi, yj) in the numerator is the joint probability distribution of X and Y; and p(xi) and p(yj) are the marginal probability distributions of X and Y, respectively.
2.3 Calculate the normalized mutual information, i.e. use the entropies as the denominator to map the MI value of step 2.2 to between 0 and 1. The normalized mutual information NMI is calculated as follows:
$$NMI(X,Y)=\frac{2\,MI(X,Y)}{H(X)+H(Y)} \qquad (2)$$
where H(X) and H(Y) are the entropies of X and Y, respectively, calculated as follows:
$$H(X)=-\sum_{i} p(x_i)\,\log_2 p(x_i) \qquad (3)$$
$$H(Y)=-\sum_{j} p(y_j)\,\log_2 p(y_j) \qquad (4)$$
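Steps 2.2 and 2.3 can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions the text does not settle: each distinct value of a series is treated as its own discrete symbol when estimating probabilities, logarithms are base 2, and NMI is normalized by the mean of the two entropies. The function names are hypothetical.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    _, counts = np.unique(np.asarray(values).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """MI from empirical joint and marginal frequencies (values treated as symbols)."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)          # contingency table of co-occurrences
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)    # marginal distribution of x
    py = joint.sum(axis=0, keepdims=True)    # marginal distribution of y
    nz = joint > 0                           # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def nmi(x, y):
    """Assumed normalization: NMI = 2 * MI / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(x, y) / (hx + hy)
```

For continuous series a binning step before these functions would be a common alternative; the source does not say which discretization is used.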
2.4 Significance test of the normalized mutual information. The normalized mutual information is tested by the bootstrap method, in the following steps:
2.4.1 Calculate the NMI of the original runoff time series and index time series;
2.4.2 Randomly shuffle the order of the two corresponding time series K times (typically 100 times), calculate the NMI after each shuffle, and sort the values in descending order;
2.4.3 Take the quantile of the sorted NMI values at a given probability as the NMI threshold for the corresponding significance level;
2.4.4 If the NMI of the original time series exceeds the NMI value at the chosen probability threshold (typically 95%), the two series are considered significantly related.
2.5 Select the indices that pass the significance test and whose NMI exceeds a given threshold (typically 0.9, although the value may vary with the length of the time series and can be adjusted) as the primary-selected predictors.
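The shuffle-based significance test of step 2.4 can be sketched as below. This is a hedged illustration: the NMI normalization (2·MI divided by the sum of the entropies), the independent shuffling of both series, and the names `nmi` and `nmi_significance` are assumptions, not details confirmed by the source.

```python
import numpy as np
from collections import Counter

def _entropy(symbols):
    """Shannon entropy (base 2) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * np.log2(c / n) for c in Counter(symbols).values())

def nmi(x, y):
    """Assumed form: NMI = 2*MI / (H(X) + H(Y)), with MI = H(X) + H(Y) - H(X, Y)."""
    x, y = list(x), list(y)
    hx, hy, hxy = _entropy(x), _entropy(y), _entropy(list(zip(x, y)))
    return 0.0 if hx + hy == 0 else 2.0 * (hx + hy - hxy) / (hx + hy)

def nmi_significance(x, y, n_shuffles=100, prob=0.95, seed=0):
    """Compare the observed NMI with the `prob` quantile of NMI over shuffled series."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    observed = nmi(x, y)
    shuffled = np.array([
        nmi(rng.permutation(x), rng.permutation(y))  # shuffling breaks the pairing
        for _ in range(n_shuffles)
    ])
    threshold = float(np.quantile(shuffled, prob))
    return observed, threshold, bool(observed > threshold)

# Made-up discrete series in which y is fully determined by x.
x = np.tile([0, 1, 2, 3], 25)
y = x.copy()
observed, threshold, significant = nmi_significance(x, y)
```

An index would then be kept only if `significant` is true and `observed` also exceeds the NMI cut-off of step 2.5.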
Step 3: Perform kernel principal component analysis and extract the principal components
3.1 Standardize the primary-selected predictor data by z-score, as follows:
$$y^{*}=\frac{y-\mu}{\sigma} \qquad (5)$$
where y* is the z-score-standardized value, y is a value in the primary-selected predictor data, μ is the mean of the time series containing y, and σ is the standard deviation of that time series.
3.2 Calculate the kernel matrix K of the predictors primary-selected in step 2.5. K is an n × n matrix whose element in row i and column j, K_{i,j}, is calculated as follows:
$$K_{i,j}=k\left(y_i^{*},\,y_j^{*}\right) \qquad (6)$$
where $y_i^{*}$ and $y_j^{*}$ are column vectors representing the z-score-standardized time series of the different primary-selected predictors, and k is the kernel function. Commonly used kernel functions include the following:
1. linear kernel (Linear Kernel):
2. polynomial kernel (Polynomial Kernel):
3. Radial basis kernel function (Radial Basis Function):
4. Sigmoid cores (Sigmoid Kernel):
In formulas (8), (9) and (10), b, c, p, δ, υ and ξ are constants, namely the parameters of the respective kernel functions.
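The sketch below writes out widely used textbook forms of the four kernels listed above and builds the kernel matrix of step 3.2 from them. It is an illustration under assumptions, not the patent's definition: how the constants b, c, p, δ, υ and ξ enter each kernel is assumed, as are the function names and the layout with one row per sample.

```python
import numpy as np

def linear_kernel(x, y):
    return float(x @ y)

def polynomial_kernel(x, y, b=1.0, c=1.0, p=2):
    # Assumed form: (b * <x, y> + c) ** p
    return float((b * (x @ y) + c) ** p)

def rbf_kernel(x, y, delta=1.0):
    # Assumed form: exp(-||x - y||^2 / (2 * delta^2))
    d = x - y
    return float(np.exp(-(d @ d) / (2.0 * delta ** 2)))

def sigmoid_kernel(x, y, upsilon=1.0, xi=0.0):
    # Assumed form: tanh(upsilon * <x, y> + xi)
    return float(np.tanh(upsilon * (x @ y) + xi))

def kernel_matrix(samples, kernel, **params):
    """K[i, j] = kernel(samples[i], samples[j]) for an (n, d) array of samples."""
    n = samples.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(samples[i], samples[j], **params)
    return K

# Made-up data: 3 samples with 2 standardized predictors each.
samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = kernel_matrix(samples, rbf_kernel, delta=1.0)
```

With the radial basis kernel the diagonal of `K` is always 1, and `K` is symmetric for all four kernels.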
3.3 Calculate the centered kernel matrix. The centered kernel matrix is denoted K_c; K_c is an n × n matrix calculated as follows:
$$K_c = K - JK - KJ + JKJ \qquad (11)$$
In formula (11), J is an n × n matrix of the following form:
$$J=\frac{1}{n}\begin{pmatrix}1&\cdots&1\\ \vdots&\ddots&\vdots\\ 1&\cdots&1\end{pmatrix}_{n\times n}$$
3.4 Calculate the eigenvalues and eigenvectors of the centered kernel matrix K_c, sort the eigenvalues in descending order, and reorder the eigenvectors to match. The sorted eigenvalue matrix is Λ and the eigenvector matrix is U, expressed as follows:
$$\Lambda=\mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n),\ \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n;\qquad U=(u_1,u_2,\ldots,u_n) \qquad (12)$$
3.5 Calculate the normalized eigenvector matrix A, of the following form:
$$A=(a_1,a_2,\ldots,a_n) \qquad (13)$$
where
$$a_i=\frac{u_i}{\sqrt{\lambda_i}}$$
3.6 Extract the principal components. The principal component matrix is an n × n square matrix. Generally the first 2 to 3 principal components are extracted as predictors. The i-th principal component is calculated as follows:
$$KPC_i=\left(K_c\,a_i\right)^{T} \qquad (14)$$
where KPC_i = (kpc_i1, kpc_i2, ..., kpc_in) and K_c is the centered kernel matrix calculated in step 3.3.
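Steps 3.3 to 3.6 can be sketched as one function. This is a minimal sketch under assumptions: eigenvectors are scaled by the square root of their eigenvalue, near-zero eigenvalues are discarded with a hypothetical tolerance `tol`, and the variance contribution rate is taken as each eigenvalue's share of the positive eigenvalues. The name `kpca_components` is illustrative.

```python
import numpy as np

def kpca_components(K, n_components=2, tol=1e-10):
    """Center a kernel matrix and project onto its leading principal axes."""
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)                 # every entry equals 1/n
    Kc = K - J @ K - K @ J + J @ K @ J           # centered kernel matrix
    Kc = (Kc + Kc.T) / 2.0                       # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(Kc)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals[:n_components] > tol          # drop numerically zero axes
    lam = eigvals[:n_components][keep]
    U = eigvecs[:, :n_components][:, keep]
    A = U / np.sqrt(lam)                         # normalized eigenvectors
    scores = Kc @ A                              # column i holds principal component i
    ratio = lam / eigvals[eigvals > tol].sum()   # assumed variance contribution rate
    return scores, ratio

# Made-up data with a linear kernel, for which KPCA reduces to ordinary PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
scores, ratio = kpca_components(X @ X.T, n_components=2)
```

Using the linear kernel in the example gives a simple sanity check, because the resulting scores must match ordinary PCA scores up to sign.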
Step 4: Build the Elman neural network model
4.1 To build the Elman network model, first determine the network structure (i.e. the number of nodes in each layer). The structure of the Elman network is shown in Fig. 2 of the accompanying drawings. The number of nodes in each layer is determined as follows: the number of input-layer nodes equals the number of predictors; the number of output-layer nodes equals the number of forecast objects; the number of context-layer nodes equals the number of hidden-layer nodes. The number of hidden-layer nodes has a major influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it. A reasonable choice is trial and error: try different numbers of hidden-layer nodes, observe the forecasting performance of the network, and choose the number accordingly.
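The layer structure described in step 4.1 can be illustrated with a forward pass only. The sketch below is not the patent's network: the tanh hidden activation, the linear output layer, the random initialization and the class name `ElmanNetwork` are assumptions, and training is omitted.

```python
import numpy as np

class ElmanNetwork:
    """Minimal Elman network: the context layer stores the previous hidden state."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))       # input -> hidden
        self.W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context -> hidden
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))     # hidden -> output
        self.b_o = np.zeros(n_out)
        self.context = np.zeros(n_hidden)  # one context node per hidden node

    def reset(self):
        self.context = np.zeros_like(self.context)

    def step(self, x):
        hidden = np.tanh(self.W_in @ x + self.W_ctx @ self.context + self.b_h)
        self.context = hidden.copy()       # remembered for the next time step
        return self.W_out @ hidden + self.b_o

    def forward(self, sequence):
        self.reset()
        return np.array([self.step(x) for x in sequence])

# Two predictors, ten hidden (and context) nodes, one forecast object.
net = ElmanNetwork(n_in=2, n_hidden=10, n_out=1)
outputs = net.forward(np.ones((5, 2)))
```

Because the context layer feeds the previous hidden state back in, the same input can give different outputs at different time steps.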
4.2 Building the Elman network model also requires choosing a training algorithm. The invention uses the back-propagation algorithm together with gradient descent with a momentum term and an adaptive learning rate to update the network weights. The weight update formula is as follows:
where E is the cost function, for which the invention uses the mean squared error (MSE); ω is the weight matrix of the Elman neural network; Δω_k is the weight change at the k-th update; η is the learning rate; and α is the momentum constant, 0 ≤ α < 1, for which the invention takes α = 0.9. The learning rate is updated at each iteration as follows:
$$\eta(k)=\eta(k-1)\,(1+c\cos\theta) \qquad (16)$$
where c is a constant, taken as 0.2 in the invention, and θ is the angle between the steepest-descent direction (the negative gradient of E with respect to ω) and the previous weight change Δω_{k-1}.
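One update of gradient descent with momentum and the adaptive learning rate of formula (16) can be sketched as follows. The momentum form used here, Δω_k = −η ∂E/∂ω + α Δω_{k−1}, is the classical one and is an assumption, as are the function name `momentum_step` and the choice of cos θ = 0 when there is no previous step.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta_prev, alpha=0.9, c=0.2):
    """Return (new weights, weight change, new learning rate) for one update."""
    descent = -grad                                   # steepest-descent direction
    denom = np.linalg.norm(descent) * np.linalg.norm(prev_delta)
    if denom > 0:
        cos_theta = float(descent.ravel() @ prev_delta.ravel() / denom)
    else:
        cos_theta = 0.0                               # no previous step: keep eta (assumed)
    eta = eta_prev * (1.0 + c * cos_theta)            # formula (16)
    delta = eta * descent + alpha * prev_delta        # assumed classical momentum form
    return w + delta, delta, eta

# Toy cost E(w) = ||w||^2 with gradient 2w (made-up example).
w = np.array([1.0, -2.0])
w1, d1, eta1 = momentum_step(w, 2.0 * w, np.zeros(2), eta_prev=0.1)
w2, d2, eta2 = momentum_step(w1, 2.0 * w1, d1, eta_prev=eta1)
```

When successive steps point the same way the learning rate grows by up to a factor of 1 + c, and when they oppose each other it shrinks by up to 1 − c.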
Step 5: Single-model forecast of runoff
5.1 Normalize the principal-component factor series extracted in step 3.6 and the historical runoff series of the region to be forecast, then divide them into training samples and test samples as described in step 2.1. The normalization formula is as follows:
$$z=\frac{(z_{max}-z_{min})\,(q-q_{min})}{q_{max}-q_{min}}+z_{min} \qquad (17)$$
where z is the normalized value, z_max = 1, z_min = -1, z ∈ [-1, 1]; q is a value in the original runoff series or principal component series; q_min is the minimum of the series containing q; and q_max is the maximum of that series.
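The normalization of step 5.1 and the denormalization later needed in step 5.3 can be sketched as a pair of functions. The names `normalize` and `denormalize`, and the choice to return the series bounds for later inversion, are assumptions made for illustration.

```python
import numpy as np

def normalize(q, z_min=-1.0, z_max=1.0):
    """Map a series linearly onto [z_min, z_max]; also return its (min, max) bounds."""
    q = np.asarray(q, dtype=float)
    q_min, q_max = q.min(), q.max()
    # Note: a constant series (q_max == q_min) would divide by zero here.
    z = (z_max - z_min) * (q - q_min) / (q_max - q_min) + z_min
    return z, (q_min, q_max)

def denormalize(z, bounds, z_min=-1.0, z_max=1.0):
    """Invert `normalize` using the stored bounds of the original series."""
    q_min, q_max = bounds
    z = np.asarray(z, dtype=float)
    return (z - z_min) * (q_max - q_min) / (z_max - z_min) + q_min

# Made-up runoff values.
z, bounds = normalize([10.0, 20.0, 30.0])
restored = denormalize(z, bounds)
```

The stored bounds are what allow network outputs in [-1, 1] to be converted back into runoff values.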
5.2 Use the factor data in the training samples as the network input and the historical runoff data in the training samples as the network output, and train the network by supervised learning.
5.3 For the trained network, use the factors in the test samples as the network input to test its forecasting performance. Denormalize the test results to obtain the forecast runoff values.
5.4 Use the mean absolute percentage error (MAPE), relative error (RE), maximum relative error (MRE) and qualified rate (QR) as the evaluation indices of the forecast. Each index is calculated as follows:
$$MAPE=\frac{1}{j}\sum_{i=1}^{j}\left|\frac{\hat{x}_i-x_i}{x_i}\right|\times 100\% \qquad (18)$$
$$RE_i=\frac{\hat{x}_i-x_i}{x_i}\times 100\% \qquad (19)$$
$$MRE=\max_{1\le i\le j}\left|RE_i\right| \qquad (20)$$
$$QR=\frac{T_{Qual}}{T_{total}}\times 100\% \qquad (21)$$
In formulas (18), (19) and (20), $\hat{x}_i$ is the runoff value forecast for the test period, x_i is the corresponding observed runoff value, and j is the number of test samples.
In formula (21), T_Qual is the number of qualified forecasts and T_total is the total number of forecasts. Following the scheme for evaluating the accuracy of medium- and long-term runoff forecasts in the Standard for Hydrological Information and Hydrological Forecasting (GB/T 22482-2008), the invention treats a forecast whose maximum relative error is less than 20% as a qualified forecast.
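The evaluation indices of step 5.4 can be computed as in the sketch below. It is illustrative only: the signed definition of relative error and the per-sample counting used for the qualified rate are assumptions, and `evaluate` is a hypothetical name.

```python
import numpy as np

def evaluate(predicted, actual, qualified_limit=20.0):
    """Return RE (per sample), MAPE, MRE and QR, all in percent."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    re = (predicted - actual) / actual * 100.0       # signed relative error
    abs_re = np.abs(re)
    return {
        "RE": re,
        "MAPE": float(abs_re.mean()),
        "MRE": float(abs_re.max()),
        # Assumption: each test sample counts as one forecast for the qualified rate.
        "QR": float((abs_re < qualified_limit).mean() * 100.0),
    }

# Made-up forecast and observed runoff values.
metrics = evaluate([110.0, 180.0, 400.0], [100.0, 200.0, 300.0])
```

If the qualified rate is instead counted per forecast run, as the 20% rule on maximum relative error suggests, `MRE < qualified_limit` for each run would be the quantity to tally.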
Step 6: Ensemble forecast of runoff
In step 4, the invention uses the back-propagation algorithm to perform a gradient-descent search of the Elman network weight space, iteratively reducing the error between the observed historical runoff data and the network forecasts. The error surface, however, may contain several different local minima, and the gradient-descent search may settle at a local minimum that is not necessarily the global minimum. Consequently, even when the trained Elman networks have the same structure, their connection weights differ, which leads to differences among the forecasts of the individual Elman network models. To reduce the forecast deviation caused by this parameter uncertainty, the invention repeats the single-model runoff forecast multiple times and takes the average of the forecasts as the final forecast result.
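The ensemble averaging of step 6 can be sketched as follows. The function `noisy_forecast` is a hypothetical stand-in for training one Elman network and forecasting with it; its base values and noise level are made up, and only the averaging logic reflects the text.

```python
import numpy as np

def ensemble_forecast(single_forecast, n_runs=100):
    """Run `single_forecast(seed)` n_runs times and average the results."""
    runs = np.array([single_forecast(seed) for seed in range(n_runs)])
    return runs.mean(axis=0), runs

def noisy_forecast(seed, base=(100.0, 120.0, 90.0)):
    """Hypothetical stand-in for one trained-network forecast of three test years."""
    rng = np.random.default_rng(seed)
    return np.asarray(base) + rng.normal(scale=5.0, size=len(base))

final, runs = ensemble_forecast(noisy_forecast, n_runs=100)
```

Passing a different seed to each run mimics the different random weight initializations that make individual forecasts differ.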
Compared with the prior art, the advantages of the invention are:
(1) The primary selection of predictors based on normalized mutual information reflects not only the linear relationships between variables but also the nonlinear ones, so the selected factors are more representative. This overcomes the shortcoming of traditional factor screening based on linear correlation analysis;
(2) Kernel principal component analysis (KPCA) is a nonlinear extension of principal component analysis (PCA): a mapping function Φ maps the original vectors into a high-dimensional feature space F, and PCA is performed in F. Data that are linearly inseparable in the original space are almost always linearly separable in the high-dimensional feature space, so the principal components extracted by PCA in that space are more representative. KPCA-based feature extraction therefore handles nonlinear data far better than traditional PCA-based feature extraction. In addition, the principal components extracted by KPCA are mutually orthogonal and the data have been denoised and stripped of redundancy, which helps prevent overfitting of the neural network and improves its generalization ability;
(3) Artificial neural networks are robust and have strong nonlinear mapping and self-learning capabilities, so they are well suited to uncovering the intrinsic relationship between the predictors and the forecast object. The Elman neural network chosen by the invention is a typical dynamic recurrent network; compared with a conventional feedforward neural network (such as a BP neural network), it has an additional context layer. The context layer records the information of the previous network iteration and feeds it in as an input to the current iteration, which makes the Elman network better suited to forecasting time series data. In addition, because neural network parameters are uncertain, the invention adopts multi-model ensemble forecasting to reduce forecast uncertainty and improve forecast accuracy;
(4) NMI for primary selection of factors, KPCA for principal component extraction and the Elman neural network for runoff forecasting can each handle both linear and nonlinear data. Combining the three methods overcomes the limitations of conventional methods and improves the stability and accuracy of the forecast.
Brief description of the drawings
Fig. 1 is the overview flow chart of the present invention;
Fig. 2 is the structure chart of Elman networks.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and an embodiment.
Fig. 1 is the overall flow chart of the invention. Taking the forecast of the annual mean runoff of the Jinping Hydropower Station reservoir as an example, the flow chart can be divided into six steps, as follows:
Step 1: Data preprocessing
1.1 Collect the historical runoff data of the region to be forecast and the hydrometeorological data that may serve as predictors. Commonly used hydrometeorological data include indices of atmospheric circulation characteristics, upper-air pressure fields and sea surface temperature. The data used in this embodiment comprise the annual mean runoff of the Jinping Hydropower Station reservoir area for each year from 1960 to 2011 and the monthly values of 74 circulation characteristic indices from 1959 to 2010.
1.2 Because the forecast is of annual mean runoff, the factors cannot be chosen from within the same year as the forecast; moreover, the influence of meteorological factors on runoff is lagged. Therefore, following Table 1, establish a one-to-one correspondence between the annual mean runoff of Jinping Hydropower Station for each year (1960-2011) and the monthly values of the 74 atmospheric circulation indices for the preceding year (1959-2010). The correspondence between the time series of one atmospheric circulation index and the runoff time series is shown in Table 2; the other indices are similar.
Table 2 Correspondence between the time series of one atmospheric circulation index and the runoff time series
Step 2: Primary selection of predictors based on normalized mutual information
2.1 Divide the index time series and the annual runoff time series into two parts, one serving as training samples for the neural network and the other as test samples for the trained network. In this embodiment the data of the first 47 years serve as training samples and the data of the last 5 years as test samples.
2.2 Calculate the mutual information MI. For the training sample data, calculate the mutual information between the time series of each monthly index and the corresponding runoff time series. For this embodiment, this means using formula (1) to calculate the mutual information between the annual mean runoff series in column 1 of Table 2 and the index time series in each of the remaining columns. Note that, for the test results to be reliable, the mutual information is calculated from the training sample data only when screening preliminary predictors; the test sample data must not be included.
2.3 Calculate the normalized mutual information NMI, i.e. use formulas (2), (3) and (4) to map the MI values calculated in step 2.2 to between 0 and 1.
2.4 Significance test of the mutual information. This embodiment tests the mutual information by the bootstrap method, in the following steps:
2.4.1 Calculate the NMI of the original runoff time series and index time series;
2.4.2 Randomly shuffle the order of the two time series 100 times, calculate the NMI after each shuffle, and sort the values in descending order;
2.4.3 Take the quantile of the sorted NMI values at a given probability as the NMI threshold for the corresponding significance level;
2.4.4 If the NMI of the original time series exceeds the NMI value at the chosen probability threshold (95% in this embodiment), the two series are considered significantly related.
2.5 Select the indices that pass the significance test and whose NMI exceeds a given threshold (0.9 in this embodiment) as the primary-selected predictors. In this embodiment, 205 indices have an NMI greater than 0.9; the first 20 are as follows:
Table 3 The first 20 primary-selected predictors
Primary-selected factor | NMI | MI
August sunspot | 0.988375 | 5.426929
April sunspot | 0.988375 | 5.426929
July sunspot | 0.988375 | 5.426929
October sunspot | 0.988375 | 5.426929
December sunspot | 0.988375 | 5.426929
February sunspot | 0.98444 | 5.384376
September sunspot | 0.98444 | 5.384376
November sunspot | 0.98444 | 5.384376
January sunspot | 0.98444 | 5.384376
March sunspot | 0.98444 | 5.384376
May sunspot | 0.98444 | 5.384376
August Northern Hemisphere subtropical high intensity index (5E-360) | 0.980474 | 5.341823
March Northern Hemisphere polar vortex area index (zone 5, 0-360) | 0.980474 | 5.341823
June North Africa-Atlantic-North America subtropical high intensity index (110W-60E) | 0.976477 | 5.299270
June Northern Hemisphere subtropical high intensity index (5E-360) | 0.976291 | 5.256717
April Northern Hemisphere subtropical high intensity index (5E-360) | 0.972448 | 5.256717
July North Africa-Atlantic-North America subtropical high intensity index (110W-60E) | 0.972448 | 5.256717
September North Africa-Atlantic-North America subtropical high intensity index (110W-60E) | 0.972448 | 5.256717
June sunspot | 0.972448 | 5.256717
June Pacific subtropical high intensity index (110E-115W) | 0.970919 | 5.240655
Step 3: Perform kernel principal component analysis and select principal components as predictors. In this example 205 factor series were selected in step 2.5, and multicollinearity often exists among them. Multicollinear predictors enlarge the weight matrix of the neural network, and the repeated information and noise directly affect its training speed and generalization ability, so feature extraction with noise reduction and redundancy removal is needed. This example uses the radial basis kernel function as the kernel of the kernel principal component analysis and calculates the principal components according to formulas (5), (6), (9), (11), (12), (13) and (14). The resulting principal components are sorted in descending order of variance contribution rate. The variance contribution rates of the first 5 principal components are given in Table 4, and the corresponding data of the first 5 principal components in Table 5.
Table 4 Variance contribution rates of the first 5 principal components
Principal component | Principal component 1 | Principal component 2 | Principal component 3 | Principal component 4 | Principal component 5
Variance contribution rate | 25.7% | 6.9% | 5.6% | 5.1% | 3.9%
Table 5 The first 5 principal components extracted by KPCA
In this embodiment, trial and error is used to decide which principal components to use as predictors. Repeated trials showed that the forecasts for the test period are best when the first two principal components are used, so the first two principal components were finally chosen as predictors. Note that, so that the principal components of the training samples and the test samples are extracted on a consistent basis, the training sample series and the test sample series must be combined and KPCA performed on them together. In this embodiment the training series has length 47 and the test series length 5, so the combined series has length 52; the principal component series in Table 5 therefore have length 52.
Step 4: Build the Elman neural network model
4.1 To build the Elman network model, first determine the network structure (i.e. the number of nodes in each layer). The number of nodes in each layer is determined as follows:
(1) The number of input-layer nodes equals the number of predictors. This embodiment uses the first two principal components as predictors, so the Elman network has 2 input-layer nodes;
(2) The number of output-layer nodes equals the number of forecast objects. This embodiment produces a single-value forecast of annual mean runoff, so there is 1 output-layer node;
(3) The number of context-layer nodes equals the number of hidden-layer nodes;
(4) The number of hidden-layer nodes has a major influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it. A reasonable choice is trial and error: try different numbers of hidden-layer nodes, observe the forecasting performance of the network, and choose the number accordingly. In this embodiment, because KPCA was applied beforehand to denoise the factor data and remove redundancy, and the resulting principal components are mutually orthogonal, overfitting of the neural network is effectively prevented. As a result, with 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or 15 hidden-layer nodes, the relative errors of the test-period forecasts are all within 20%; the network is very stable and generalizes well. Repeated trials showed that with 10 hidden-layer nodes the maximum relative error of the test-period forecast falls below 15%, so the number of hidden-layer nodes is set to 10.
4.2 Building the Elman network model also requires choosing a training algorithm. This embodiment uses the back-propagation algorithm together with gradient descent with a momentum term and an adaptive learning rate to update the network weights. The weight update formulas are formula (15) and formula (16).
Step 5:The single model forecast of run-off
5.1 according to the principal component factor sequence and regional history footpath to be predicted that described in step 2.1, step 3.5 is extracted Flow sequence to normalize according to formula (1), be then divided into training sample and test samples.In the present embodiment, two step 3 selected The data of 47 years are as training sample, the number of latter 5 years before individual chief composition series and Jinping Hydroelectric Power Station mean annual runoff sequence According to as test samples.
5.2 The factor data in the training samples are used as the network input and the historical runoff data in the training samples as the network output, and the network is trained by supervised learning. The learning process can be summarized as follows:
(1) Initialize the connection weight coefficients between the layers of the network with a random function, and determine the allowed error ε of the cost function. This embodiment uses the mean squared error (MSE) function as the cost function;
(2) Feed the learning samples to the network, compute the value E of the mean squared error function with the training algorithm, and update the connection weights between the layers of the network according to E;
(3) If E is greater than ε, go to step (2); otherwise learning ends and the network output is computed.
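The three learning steps above can be sketched as a loop that stops once the mean squared error E no longer exceeds ε. To keep the example short, a linear model trained by plain gradient descent stands in for the Elman network; this substitution, the synthetic data, the learning rate, the value of ε and the `max_epochs` safety cap are all assumptions and not part of the patent.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error cost function."""
    return float(np.mean((pred - target) ** 2))

def train_until_tolerance(x, y, eps=1e-6, lr=0.5, max_epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])   # (1) random initial weights
    for epoch in range(max_epochs):
        pred = x @ w
        err = mse(pred, y)                       # (2) compute E on the learning samples
        if err <= eps:                           # (3) stop once E is no greater than eps
            break
        grad = 2.0 * x.T @ (pred - y) / len(y)   # gradient of the MSE for the linear stand-in
        w -= lr * grad                           # (2) update the weights according to E
    return w, err, epoch

rng = np.random.default_rng(2)
x = rng.random((47, 2))                          # synthetic training factors (47 years, 2 factors)
y = x @ np.array([0.6, 0.3])                     # synthetic noise-free target

w, err, epochs = train_until_tolerance(x, y)
```

The epoch cap guards against an endless loop when ε is set lower than the error the model can actually reach.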
5.3 For the trained network, the factor data in the test samples are used as the network input to test the forecasting performance of the network. The test results are denormalized to obtain the forecast runoff values.
5.4 The mean absolute percentage error (MAPE), maximum relative error (MRE) and qualified rate (QR) are used as the evaluation indices of the forecast; each index is calculated according to formulas (17), (18), (19) and (20). To verify the generalization ability and forecast stability of the network model of the present invention, this embodiment carried out 100 single-model forecasts. The results show that the maximum relative error of the forecast in every test period is within 16% and the qualified rate reaches 100%, which indicates that the network model used in the present invention has good generalization ability and forecast stability. The error statistics of the first 5 test-period forecasts are given in Table 6.
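Formulas (17) to (20) are not reproduced in this section, so the sketch below uses the usual textbook definitions of the three indices. The 20% relative-error threshold for a qualified forecast is an assumption, as are the function name and the sample numbers, which are made up for illustration and are not results from the patent.

```python
import numpy as np

def forecast_metrics(observed, predicted, qualified_threshold=0.20):
    """Return MAPE, MRE and QR, all expressed in percent."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rel_err = np.abs(predicted - observed) / np.abs(observed)
    mape = 100.0 * float(np.mean(rel_err))                       # mean absolute percentage error
    mre = 100.0 * float(np.max(rel_err))                         # maximum relative error
    qr = 100.0 * float(np.mean(rel_err <= qualified_threshold))  # share of qualified forecasts
    return mape, mre, qr

# Illustrative values for a 5-year test period (not data from the patent).
observed = [100.0, 200.0, 400.0, 500.0, 250.0]
predicted = [110.0, 190.0, 380.0, 575.0, 325.0]
mape, mre, qr = forecast_metrics(observed, predicted)
```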
Table 6. Error statistics of single-model test-period forecasts
Step 6: Ensemble forecast of runoff
To reduce the bias in the forecast result caused by the uncertainty of the model parameters, the present invention repeats the single-model runoff forecast several times and takes the average of the individual forecast results as the final forecast result. In this embodiment, the average of the results of 100 forecasts can be taken as the final forecast result.
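The averaging step can be sketched as follows. The function `single_model_forecast` is a hypothetical stand-in that returns a noisy synthetic forecast instead of the output of a trained Elman network; the signal values and the noise level are assumptions chosen only to make the example runnable.

```python
import numpy as np

def single_model_forecast(seed, n_years=5):
    """Hypothetical stand-in for the test-period forecast of one trained network."""
    rng = np.random.default_rng(seed)
    signal = np.linspace(1100.0, 1300.0, n_years)     # synthetic "true" runoff
    return signal + rng.normal(scale=50.0, size=n_years)

n_runs = 100                                          # number of single-model forecasts
runs = np.stack([single_model_forecast(seed) for seed in range(n_runs)])

# Final forecast: the average of the individual forecasts, year by year.
ensemble = runs.mean(axis=0)
```

For each year, the absolute error of the averaged forecast cannot exceed the mean absolute error of the individual forecasts, which is one reason averaging helps against run-to-run variation from random weight initialization.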
The foregoing is only an embodiment of the present invention and is not intended to limit the invention. Any equivalent substitution made within the principles of the present invention shall fall within the scope of protection of the present invention. Content not described in detail in the present invention belongs to the prior art known to those skilled in the art.

Claims (5)

  1. A medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network, characterized in that the method comprises: preliminary selection of forecast factors based on mutual information; extraction of principal components by kernel principal component analysis; construction of an Elman neural network model; single-model forecast of runoff; and multi-model ensemble forecast of runoff.
  2. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network according to claim 1, characterized in that the preliminary selection of forecast factors based on mutual information comprises the following steps:
    (1) Calculate the mutual information MI between each index time series and the corresponding runoff time series:
    $$MI(X,Y)=\sum_{i=1}^{n}\sum_{j=1}^{n}p(x_i,y_j)\log\left(\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)}\right)$$
    where X is the runoff time series, $X=(x_1,x_2,x_3,\ldots,x_n)^T$; Y is the index time series, $Y=(y_1,y_2,y_3,\ldots,y_n)^T$; the numerator $p(x_i,y_j)$ is the joint probability distribution of X and Y; and $p(x_i)$ and $p(y_j)$ are the marginal probability distributions of X and Y, respectively;
    (2) Use the entropies as the denominator to map the MI value to the range between 0 and 1, obtaining the normalized mutual information NMI:
    $$NMI(X,Y)=2\,\frac{MI(X,Y)}{H(X)+H(Y)}$$
    $$H(X)=-\sum_{i=1}^{n}p(x_i)\log_2 p(x_i)$$
    where H(X) and H(Y) are the entropies of X and Y, respectively, and H(Y) is calculated in the same way as H(X);
    (3) Test the normalized mutual information using the bootstrap method.
  3. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network according to claim 1, characterized in that the extraction of principal components by kernel principal component analysis comprises the following steps:
    (1) Standardize the preliminarily selected forecast factor data by z-score;
    (2) Calculate the kernel matrix K of the preliminarily selected forecast factors;
    (3) Calculate the centered kernel matrix $K_c$ and its eigenvalues and eigenvectors; arrange the eigenvalues in descending order and reorder the eigenvectors accordingly;
    (4) Calculate the normalized eigenvector matrix A, and calculate the projection of the centered kernel matrix $K_c$ onto the eigenvectors to obtain the principal components.
  4. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network according to claim 1, characterized in that the single-model forecast of runoff comprises the following step:
    (1) Use the Elman network model to make a single-model forecast of runoff.
  5. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network according to claim 1, characterized in that the multi-model ensemble forecast of runoff comprises the following steps:
    (1) Use the Elman network model to make multiple single-model forecasts of runoff;
    (2) Take the average of the results of the multiple forecasts as the final forecast result.
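For illustration only, the mutual information and normalized mutual information defined in claim 2 can be estimated from two series as sketched below. The probabilities are estimated by equal-width histogram binning with `bins=5`, and base-2 logarithms are used for both MI and the entropies so that NMI stays between 0 and 1; the binning scheme, the bin count, the shared log base, the function names and the synthetic data are all assumptions and are not specified by the claims.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a discrete distribution, ignoring empty bins."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def normalized_mutual_information(x, y, bins=5):
    """NMI = 2 * MI(X, Y) / (H(X) + H(Y)), with histogram-based probabilities."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()            # joint distribution p(x_i, y_j)
    p_x = p_xy.sum(axis=1)                # marginal distribution of X
    p_y = p_xy.sum(axis=0)                # marginal distribution of Y
    mask = p_xy > 0                       # skip empty cells (0 * log 0 is taken as 0)
    outer = np.outer(p_x, p_y)
    mi = float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / outer[mask])))
    return 2.0 * mi / (entropy(p_x) + entropy(p_y))

rng = np.random.default_rng(3)
runoff = rng.random(52)                   # synthetic runoff series
index_unrelated = rng.random(52)          # synthetic index series, independent of runoff

nmi_self = normalized_mutual_information(runoff, runoff)
nmi_unrelated = normalized_mutual_information(runoff, index_unrelated)
```

A series compared with itself gives an NMI of 1, while an unrelated series gives a smaller value; a bootstrap test as in step (3) would be needed to judge whether such a value is distinguishable from chance.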
CN201710662894.4A 2017-08-04 2017-08-04 Medium-and-long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network Active CN107463993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710662894.4A CN107463993B (en) 2017-08-04 2017-08-04 Medium-and-long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network

Publications (2)

Publication Number Publication Date
CN107463993A true CN107463993A (en) 2017-12-12
CN107463993B CN107463993B (en) 2020-11-24

Family

ID=60547269

Country Status (1)

Country Link
CN (1) CN107463993B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant