CN107463993A

CN107463993A - Medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman networks

Info

Publication number: CN107463993A
Application number: CN201710662894.4A
Authority: CN
Inventors: Inventor not disclosed
Original assignee: He Zhiyao
Current assignee: He Zhiyao
Priority date: 2017-08-04
Filing date: 2017-08-04
Publication date: 2017-12-12
Anticipated expiration: 2037-08-04
Also published as: CN107463993B

Abstract

The invention discloses a medium- and long-term runoff ensemble forecasting method based on mutual information, kernel principal component analysis and Elman networks, comprising the following steps: collecting hydrometeorological data and establishing a one-to-one correspondence between the index time series and the runoff time series; selecting the indices that are significant and have high normalized mutual information; extracting the principal components of the screened index data by kernel principal component analysis; building an Elman neural network model; after z-score standardization of the principal components, dividing them into training samples, used for supervised training of the network, and test samples, used to test the network, and calculating each evaluation index; repeating the single forecast multiple times and taking the ensemble mean of the repeated forecasts as the final forecast value. The invention can fully exploit the linear and nonlinear relationships between hydrometeorological data and runoff, establish a numerical model of those relationships, and thereby produce more accurate and more stable forecasts of medium- and long-term runoff.

Description

Medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman networks

Technical field

The invention belongs to the field of information technology, and more particularly relates to a medium- and long-term runoff forecasting method based on mutual information, kernel principal component analysis and Elman networks.

Background art

Accurate medium- and long-term runoff forecasting is an important prerequisite for guiding integrated water resources planning, scientific management and optimized scheduling.

At present, conventional medium- and long-term runoff forecasting is based on statistical methods: the forecast is produced by finding the statistical relationship between the forecast object and the predictors. Applying statistical methods to medium- and long-term runoff forecasting involves the following three key issues:

(1) Preliminary selection of predictors. The commonly used approach is linear correlation analysis (such as Pearson or Spearman correlation analysis): the correlation coefficients between the hydrometeorological data (including teleconnection factors, local factors, etc.) and the historical runoff series are calculated, and the factors with high correlation coefficients are selected as predictors;

(2) Noise reduction and redundancy removal for the selected factors. The commonly used method is principal component analysis (PCA). When factors are screened by correlation analysis, the number of highly correlated factors retained is often large, the time series of different factors exhibit strong multicollinearity, and the factor time series themselves contain a certain amount of noise. The selected factors therefore need noise reduction and redundancy removal. PCA replaces the original, larger set of indices with a few composite indices that retain as much of the useful information in the original indices as possible and are mutually orthogonal;

(3) Establishment of the optimal mathematical relationship between the forecast object and the predictors. Commonly used models include multiple regression, random forests, artificial neural networks and support vector machines.

Existing statistical methods for medium- and long-term runoff forecasting have the following three problems:

(1) Hydrological processes are complex, and the relationships between the predictors and the forecast object are necessarily nonlinear as well as linear. The linear correlation analysis used for preliminary predictor selection can only describe linear relationships between variables and cannot reflect nonlinear ones;

(2) PCA, used for noise reduction and redundancy removal of the preliminarily selected factors, is essentially a linear mapping method, and the principal components it yields are generated by a linear mapping. It ignores correlations in the data above second order, so the extracted principal components are not optimal;

(3) Among the models used to establish the optimal mathematical relationship between the forecast object and the predictors, the commonly used multiple regression is in fact also a linear fit and cannot reflect nonlinear relationships between the forecast object and the predictors. Compared with other models, artificial neural networks are robust and have strong nonlinear mapping and self-learning capabilities, and have therefore been widely applied in medium- and long-term runoff forecasting. However, the uncertainty of the neural network parameters affects forecast accuracy, and the results of individual forecast runs differ from one another by a certain margin.

Summary of the invention

The purpose of the invention is to address the problems of traditional statistical methods by providing a medium- and long-term runoff forecasting method that overcomes them, thereby improving the stability and accuracy of the forecast.

The medium- and long-term runoff forecasting method provided by the invention, based on normalized mutual information (NMI), kernel principal component analysis (KPCA) and the Elman neural network, specifically comprises the following steps:

Step 1: Data preprocessing

1.1 Collect the historical runoff data of the region to be forecast and the hydrometeorological data that can serve as predictors. Commonly used hydrometeorological data include indices of atmospheric circulation characteristics, upper-air pressure fields and sea surface temperature. The 74 circulation characteristic indices, or the new set of 130 atmospheric monitoring indices, provided by the National Climate Center essentially cover these commonly used indices, so preliminary predictors can be selected directly from them.

1.2 Because the influence of meteorological factors on runoff is lagged, establish a one-to-one correspondence between the index time series of the preceding year and the runoff time series of the region to be forecast. For example, suppose 130 indices have been selected, the forecast object is the annual runoff of 2017, and historical runoff data and monthly historical index data are available for 1960-2016. The correspondence between one index and the runoff is used for illustration; the other indices are treated in the same way. The correspondence is as follows:

Table 1. Correspondence between the time series of one index and the annual runoff time series
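The lagged pairing of step 1.2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `align_lagged`, the array layout (one row of 12 monthly values per year) and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def align_lagged(index_years, index_monthly, runoff_years, runoff):
    """Pair the runoff of year t with the 12 monthly index values of year t-1."""
    by_year = {int(y): row for y, row in zip(index_years, index_monthly)}
    years, factors, targets = [], [], []
    for y, q in zip(runoff_years, runoff):
        if int(y) - 1 in by_year:          # skip years with no preceding-year index
            years.append(int(y))
            factors.append(by_year[int(y) - 1])
            targets.append(q)
    return np.array(years), np.array(factors), np.array(targets)

# Synthetic stand-in data (not from the patent):
# monthly index for 1959-2010, annual runoff for 1960-2011.
index_years = np.arange(1959, 2011)
index_monthly = np.arange(index_years.size * 12, dtype=float).reshape(-1, 12)
runoff_years = np.arange(1960, 2012)
runoff = np.linspace(1000.0, 1500.0, runoff_years.size)

years, factors, targets = align_lagged(index_years, index_monthly, runoff_years, runoff)
print(years[0], factors.shape, targets.shape)
```

Each row of `factors` then holds the twelve monthly values of one index for the year before the runoff value in the same row of `targets`, which is the layout the later steps operate on.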

Step 2: Preliminary predictor selection based on normalized mutual information

2.1 Divide the index time series and the annual runoff time series into two parts: one part serves as the training samples of the neural network and the other as the test samples for the trained network. For example, the first 50 years of data serve as training samples and the last 5 years as test samples.

2.2 Calculate the mutual information. For the training sample data, calculate the mutual information between each index time series and the corresponding runoff time series. Taking the data in Table 1 as an example, calculate the mutual information between column 1 of the table and each of the remaining columns. The mutual information MI is calculated as follows:

$$MI(X,Y) = \sum_{i=1}^{n}\sum_{j=1}^{n} p(x_i,y_j)\,\log\left(\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)}\right) \tag{1}$$

where $X = (x_1, x_2, x_3, \ldots, x_n)^T$ is the runoff time series, $Y = (y_1, y_2, y_3, \ldots, y_n)^T$ is the index time series, $p(x_i, y_j)$ in the numerator is the joint probability distribution of X and Y, and $p(x_i)$ and $p(y_j)$ are the marginal probability distributions of X and Y, respectively.

2.3 Calculate the normalized mutual information, i.e. use the entropies as the denominator to map the MI value of step 2.2 to between 0 and 1. The normalized mutual information NMI is calculated as follows:

$$NMI(X,Y) = \frac{2\,MI(X,Y)}{H(X) + H(Y)} \tag{2}$$

where $H(X)$ and $H(Y)$ are the entropies of X and Y, respectively, calculated as follows:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i) \tag{3}$$

$$H(Y) = -\sum_{j=1}^{n} p(y_j)\,\log p(y_j) \tag{4}$$
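As an illustration of steps 2.2 and 2.3, the sketch below computes MI and NMI for two discrete-valued series. It rests on assumptions not stated in the source: probabilities are estimated by counting identical values (continuous data would first need to be discretized), logarithms are taken to base 2, and the normalization is taken to be 2·MI/(H(X)+H(Y)). The function names are hypothetical.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (base 2) of a discrete-valued series."""
    _, counts = np.unique(np.asarray(values).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """MI from the joint and marginal frequencies of two discrete series."""
    xi = np.unique(np.asarray(x).ravel(), return_inverse=True)[1].ravel()
    yi = np.unique(np.asarray(y).ravel(), return_inverse=True)[1].ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)            # contingency table of co-occurrences
    joint /= joint.sum()                       # joint distribution p(x_i, y_j)
    px = joint.sum(axis=1, keepdims=True)      # marginal p(x_i)
    py = joint.sum(axis=0, keepdims=True)      # marginal p(y_j)
    nz = joint > 0                             # empty cells contribute nothing
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def nmi(x, y):
    """Assumed normalization: NMI = 2*MI / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(x, y) / (hx + hy)

print(nmi([1, 1, 2, 2], [0, 0, 1, 1]))   # identical partitions
print(nmi([1, 1, 2, 2], [0, 1, 0, 1]))   # independent partitions
```

The first call pairs two series that determine each other exactly, and the second pairs two independent series, so they mark the upper and lower ends of the NMI range.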

2.4 Significance test of the normalized mutual information. The normalized mutual information is tested with the bootstrap method, as follows:

2.4.1 Calculate the NMI value of the original runoff time series and index time series;

2.4.2 Randomly shuffle the order of the two corresponding time series K times (typically 100), calculate the NMI value after each shuffle, and sort the values in descending order;

2.4.3 Take the quantile of the sorted NMI values at a given probability as the NMI threshold for the corresponding significance level;

2.4.4 If the NMI value of the original time series is greater than the NMI value corresponding to a given probability threshold (typically 95%), the two series are considered significantly correlated.

2.5 Select the indices that pass the significance test and whose normalized mutual information exceeds a certain threshold (typically 0.9; the value may vary with the length of the time series and can be adjusted) as the preliminarily selected predictors.
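Steps 2.4.1 to 2.4.4 can be sketched as a shuffle-based test. The code below is a hedged illustration: the function names, the seed handling and the use of `np.quantile` for the probability quantile are choices made here, the NMI form 2·MI/(H(X)+H(Y)) is an assumption, and the demo data are synthetic series already discretized into four levels.

```python
import numpy as np

def nmi(x, y):
    """NMI = 2*MI/(H(X)+H(Y)) for discrete series, using MI = H(X)+H(Y)-H(X,Y)."""
    def H(*cols):
        _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))
    hx, hy, hxy = H(x), H(y), H(x, y)
    return 0.0 if hx + hy == 0 else 2.0 * (hx + hy - hxy) / (hx + hy)

def shuffle_test(x, y, k=100, prob=0.95, seed=0):
    """Compare the observed NMI with the NMI of k randomly shuffled pairings."""
    rng = np.random.default_rng(seed)
    observed = nmi(x, y)                                      # step 2.4.1
    shuffled = np.sort([nmi(rng.permutation(x), rng.permutation(y))
                        for _ in range(k)])[::-1]             # step 2.4.2, descending
    threshold = float(np.quantile(shuffled, prob))            # step 2.4.3
    return observed, threshold, bool(observed > threshold)    # step 2.4.4

# Synthetic demo: y is an exact copy of a four-level series x.
x = np.repeat(np.arange(4), 25)
y = x.copy()
observed, threshold, significant = shuffle_test(x, y)
print(observed, threshold, significant)
```

An index would then be retained in step 2.5 only if `significant` is true and `observed` also exceeds the chosen NMI threshold.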

Step 3: Perform kernel principal component analysis and extract the principal components

3.1 Standardize the preliminarily selected predictor data with the z-score, calculated as follows:

$$y^{*} = \frac{y - \mu}{\sigma} \tag{5}$$

where $y^{*}$ is the z-score standardized value, $y$ is one value in the preliminarily selected predictor data, $\mu$ is the mean of the time series containing $y$, and $\sigma$ is the standard deviation of that time series.

3.2 Calculate the kernel matrix K of the predictors preliminarily selected in step 2.5. K is an n × n matrix, and its element in row i and column j, $K_{i,j}$, is calculated as follows:

In the formula, the arguments of the kernel function are column vectors representing the z-score standardized time series of the different preliminarily selected predictors, and k is the kernel function. Commonly used kernel functions include the following:

1. Linear kernel:

2. Polynomial kernel:

3. Radial basis function (RBF) kernel:

4. Sigmoid kernel:

The quantities b, c, p, δ, υ and ξ in formulas (8), (9) and (10) are constants, the parameters of the respective kernel functions.
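For orientation, the four kernels named above have the following standard textbook forms. The assignment of the constants is an assumption made here from the order in which they are listed (b, c, p for the polynomial kernel, δ for the radial basis kernel, υ and ξ for the sigmoid kernel), as is the placement of δ in the denominator; u and v are hypothetical symbols for two standardized input vectors.

```latex
\begin{align}
k(u, v) &= u^{T} v \tag{7} \\
k(u, v) &= \left(b\, u^{T} v + c\right)^{p} \tag{8} \\
k(u, v) &= \exp\!\left(-\frac{\lVert u - v \rVert^{2}}{2\delta^{2}}\right) \tag{9} \\
k(u, v) &= \tanh\!\left(\upsilon\, u^{T} v + \xi\right) \tag{10}
\end{align}
```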

3.3 Calculate the centered kernel matrix. The centered kernel matrix is denoted $K_c$; it is an n × n matrix calculated as follows:

$$K_c = K - JK - KJ + JKJ \tag{11}$$

In formula (11), J is the n × n matrix in which every element equals 1/n:

$$J = \frac{1}{n}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}_{n \times n}$$

3.4 Calculate the eigenvalues and eigenvectors of the centered kernel matrix $K_c$. Sort the eigenvalues in descending order and reorder the eigenvectors to match. After sorting, the eigenvalue matrix is Λ and the eigenvector matrix is U, expressed as follows:

3.5 Calculate the normalized eigenvector matrix A, which has the following form:

where

3.6 Extract the principal components. The principal component matrix is an n × n square matrix. Generally the first 2 to 3 principal components are extracted as predictors. The i-th principal component is calculated as follows:

In the formula, $KPC_i = (kpc_{i1}, kpc_{i2}, \ldots, kpc_{in})$, and $K_c$ is the centered kernel matrix calculated in step 3.3.
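Steps 3.2 to 3.6 can be illustrated with a compact NumPy sketch. Several details are assumptions rather than the patent's stated formulas: the kernel is evaluated between the n time steps (each a vector of m standardized predictors), the RBF kernel takes the form exp(-d²/(2δ²)), each eigenvector is normalized by the square root of its eigenvalue, the components are obtained by projecting $K_c$ onto the normalized eigenvectors, and the variance contribution rate is the eigenvalue divided by the sum of the positive eigenvalues. The name `kpca_rbf` and the random demo data are hypothetical.

```python
import numpy as np

def kpca_rbf(samples, delta=1.0, n_components=2):
    """Kernel PCA with an RBF kernel; samples has shape (n time steps, m predictors)."""
    n = samples.shape[0]
    sq = np.sum(samples ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * samples @ samples.T, 0.0)
    K = np.exp(-d2 / (2.0 * delta ** 2))           # kernel matrix (assumed RBF form)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J             # centering, formula (11)
    lam, U = np.linalg.eigh(Kc)                    # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]                  # reorder to descending
    lam, U = lam[order], U[:, order]
    A = U[:, :n_components] / np.sqrt(lam[:n_components])   # assumed normalization
    scores = Kc @ A                                # one column per principal component
    ratio = lam[:n_components] / lam[lam > 1e-12].sum()     # variance contribution rate
    return scores, ratio

# Synthetic demo: 52 time steps, 6 z-score standardized predictors.
rng = np.random.default_rng(0)
raw = rng.standard_normal((52, 6))
samples = (raw - raw.mean(axis=0)) / raw.std(axis=0)
scores, ratio = kpca_rbf(samples, delta=3.0, n_components=2)
print(scores.shape, ratio)
```

Because the columns of `scores` come from distinct eigenvectors of a symmetric matrix, they are mutually orthogonal, which is the property the description relies on when the components are used as network inputs.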

Step 4: Build the Elman neural network model

4.1 To build the Elman network model, first determine the network structure (i.e. the number of nodes in each layer). The structure of the Elman network is shown in Fig. 2 of the accompanying drawings. The number of nodes in each layer is determined as follows: the number of input layer nodes equals the number of predictors; the number of output layer nodes equals the number of forecast objects; the number of context layer nodes equals the number of hidden layer nodes; the number of hidden layer nodes has an important influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it. A good option is trial and error: try different numbers of hidden layer nodes, observe the forecast performance of the network, and choose the number of hidden layer nodes accordingly.
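The layer structure described in step 4.1 can be made concrete with a forward pass. This is a minimal sketch under stated assumptions: a tanh hidden activation, a linear output layer, a context layer initialized to zero, and random weight initialization are choices made here; the class name `ElmanNetwork` is hypothetical, and training is not shown.

```python
import numpy as np

class ElmanNetwork:
    """Forward pass of an Elman network: the context layer holds the previous hidden state."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_hidden, n_in))       # input -> hidden
        self.W_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # context -> hidden
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, 0.5, (n_out, n_hidden))     # hidden -> output
        self.b_o = np.zeros(n_out)

    def forward(self, inputs):
        """inputs: array of shape (time steps, n_in); returns (time steps, n_out)."""
        context = np.zeros(self.W_ctx.shape[0])   # as many context nodes as hidden nodes
        outputs = []
        for x in inputs:
            hidden = np.tanh(self.W_in @ x + self.W_ctx @ context + self.b_h)
            outputs.append(self.W_out @ hidden + self.b_o)
            context = hidden                      # copy hidden state for the next step
        return np.array(outputs)

# Two predictors in, ten hidden (and context) nodes, one forecast value out.
net = ElmanNetwork(n_in=2, n_hidden=10, n_out=1)
out = net.forward(np.tile([0.5, -0.5], (5, 1)))
print(out.ravel())
```

Feeding the same input at every step still produces different outputs, because the context layer carries the previous hidden state forward.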

4.2 To build the Elman network model, the training algorithm of the network must also be determined. The invention uses the back-propagation algorithm together with gradient descent with a momentum term and an adaptive learning rate to update the weights of the network. The weight update formula is as follows:

In the formula, E is the cost function, for which the invention uses the mean squared error (MSE); ω is the weight matrix of the Elman neural network; $\Delta\omega_k$ is the change in the weights at the k-th update; η is the learning rate; and α is the momentum constant, 0 ≤ α < 1 (the invention takes α = 0.9). The learning rate is updated at each iteration as follows:

$$\eta(k) = \eta(k-1)\,(1 + c\cos\theta) \tag{16}$$

In the formula, c is a constant, taken as 0.2 in the invention, and θ is the angle between the steepest descent direction and the previous weight change $\Delta\omega_{k-1}$.
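One update step of the scheme in step 4.2 might look as follows. The learning rate rule follows formula (16); the weight update itself is an assumption made here, namely the common momentum form Δω_k = -η·∂E/∂ω + α·Δω_{k-1}. The function name `momentum_step` and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta, alpha=0.9, c=0.2):
    """One weight update with momentum and the adaptive learning rate of formula (16)."""
    steepest = -grad                                   # steepest descent direction
    denom = np.linalg.norm(steepest) * np.linalg.norm(prev_delta)
    # cos(theta) between the steepest descent direction and the previous weight change
    cos_theta = 0.0 if denom == 0 else float(np.sum(steepest * prev_delta) / denom)
    eta = eta * (1.0 + c * cos_theta)                  # formula (16)
    delta = eta * steepest + alpha * prev_delta        # assumed momentum update
    return w + delta, delta, eta

w = np.array([1.0, 0.0])
grad = np.array([2.0, 0.0])

# Previous step pointed downhill as well: the learning rate grows.
w_same, d_same, eta_same = momentum_step(w, grad, np.array([-0.1, 0.0]), eta=0.1)
# Previous step pointed uphill: the learning rate shrinks.
w_opp, d_opp, eta_opp = momentum_step(w, grad, np.array([0.1, 0.0]), eta=0.1)
print(eta_same, eta_opp)
```

With c = 0.2 the learning rate changes by at most 20% per iteration, growing while successive steps agree in direction and shrinking when they oppose each other.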

Step 5: Single-model runoff forecast

5.1 Normalize the principal component factor series extracted in step 3.6 and the historical runoff series of the region to be forecast, then divide them into training samples and test samples as described in step 2.1. The normalization formula is as follows:

$$z = \frac{(z_{max} - z_{min})\,(q - q_{min})}{q_{max} - q_{min}} + z_{min} \tag{17}$$

where $z$ is the normalized value, $z_{max} = 1$, $z_{min} = -1$, $z \in [-1, 1]$, $q$ is one value in the original runoff series or principal component series, $q_{min}$ is the minimum of the series containing $q$, and $q_{max}$ is the maximum of that series.
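The mapping to [-1, 1] and its inverse, which is needed in step 5.3 to turn network outputs back into runoff values, can be sketched as below. The function names are hypothetical, and the sketch assumes the series is not constant (otherwise the denominator is zero).

```python
import numpy as np

def normalize(q, z_min=-1.0, z_max=1.0):
    """Map a series linearly onto [z_min, z_max]; also return the bounds used."""
    q = np.asarray(q, dtype=float)
    q_min, q_max = q.min(), q.max()
    z = (z_max - z_min) * (q - q_min) / (q_max - q_min) + z_min
    return z, (q_min, q_max)

def denormalize(z, bounds, z_min=-1.0, z_max=1.0):
    """Invert normalize() using the stored minimum and maximum."""
    q_min, q_max = bounds
    return (np.asarray(z, dtype=float) - z_min) * (q_max - q_min) / (z_max - z_min) + q_min

z, bounds = normalize([10.0, 20.0, 30.0])
restored = denormalize(z, bounds)
print(z, restored)
```

Keeping the bounds returned by `normalize` is what makes the later denormalization of network outputs possible.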

5.2 Use the factor data in the training samples as the network input and the historical runoff data in the training samples as the network output, and train the network with supervised learning.

5.3 For the trained network, use the factors in the test samples as the network input to test its forecast performance. Denormalize the test results to obtain the forecast runoff values.

5.4 The mean absolute percentage error (MAPE), relative error (RE), maximum relative error (MRE) and qualified rate (QR) are used as the evaluation indices of the forecast, calculated as follows:

$$MAPE = \frac{1}{j}\sum_{i=1}^{j}\left|\frac{\hat{x}_i - x_i}{x_i}\right| \times 100\% \tag{18}$$

$$RE_i = \frac{\hat{x}_i - x_i}{x_i} \times 100\% \tag{19}$$

$$MRE = \max_{1 \le i \le j}\left|\frac{\hat{x}_i - x_i}{x_i}\right| \times 100\% \tag{20}$$

$$QR = \frac{T_{Qual}}{T_{total}} \times 100\% \tag{21}$$

In formulas (18), (19) and (20), $\hat{x}_i$ is the forecast runoff value in the test period, $x_i$ is the corresponding actual runoff value, and $j$ is the number of test samples.

In formula (21), $T_{Qual}$ is the number of qualified forecasts and $T_{total}$ is the total number of forecasts. Following the scheme for evaluating medium- and long-term runoff forecast accuracy in the Standard for Hydrological Information and Hydrological Forecasting (GB/T 22482-2008), the invention regards a forecast whose maximum relative error is less than 20% as a qualified forecast.
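The evaluation indices of step 5.4 can be computed as in the sketch below. It uses the usual definitions (relative error as forecast minus actual, divided by actual) and assumes that each test-year forecast counts as one forecast when computing the qualified rate; both points, like the function name and the sample numbers, are assumptions rather than details confirmed by the source.

```python
import numpy as np

def forecast_metrics(actual, forecast, limit=20.0):
    """Return per-sample RE (%), MAPE (%), MRE (%) and QR (%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    re = (forecast - actual) / actual * 100.0          # relative error of each forecast
    mape = float(np.mean(np.abs(re)))                  # mean absolute percentage error
    mre = float(np.max(np.abs(re)))                    # maximum relative error
    qr = float(np.mean(np.abs(re) < limit) * 100.0)    # share of forecasts within the limit
    return re, mape, mre, qr

# Made-up runoff values for illustration only.
re, mape, mre, qr = forecast_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 500.0])
print(re, mape, mre, qr)
```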

Step 6: Ensemble runoff forecast

In step 4, the invention uses the back-propagation algorithm to perform a gradient descent search of the Elman network weight space, iteratively reducing the error between the actual historical runoff values and the values predicted by the network. However, the error surface may contain several different local minima, and the gradient descent search may settle in a local minimum that is not necessarily the global minimum. Consequently, even when the trained Elman networks have the same structure, their connection weights differ, so the forecast results of the individual Elman network models differ from one another. To reduce this deviation caused by model parameter uncertainty, the invention repeats the single-model runoff forecast multiple times and takes the mean of the repeated forecasts as the final forecast result.
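The ensemble step can be sketched as follows. Because training a network is outside the scope of this illustration, `single_forecast` is a hypothetical stand-in that perturbs made-up "true" runoff values with seed-dependent noise, mimicking the run-to-run spread caused by different weight initializations; only the averaging itself reflects the method described above.

```python
import numpy as np

def single_forecast(seed, truth):
    """Hypothetical stand-in for one train-and-forecast run of the network."""
    rng = np.random.default_rng(seed)
    return truth * (1.0 + rng.normal(0.0, 0.05, size=truth.shape))

truth = np.array([1200.0, 1350.0, 1100.0, 1280.0, 1420.0])   # made-up test-period runoff
runs = np.array([single_forecast(seed, truth) for seed in range(100)])  # shape (100, 5)
ensemble = runs.mean(axis=0)                                  # final ensemble forecast

single_mre = np.max(np.abs(runs - truth) / truth, axis=1)     # max relative error per run
ensemble_mre = float(np.max(np.abs(ensemble - truth) / truth))
print(single_mre.max(), ensemble_mre)
```

By the triangle inequality, the maximum relative error of the averaged forecast can never exceed that of the worst single run, which is one way to see why averaging stabilizes the result.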

Compared with the prior art, the advantages of the invention are:

(1) Preliminary predictor selection based on normalized mutual information reflects not only linear but also nonlinear relationships between variables, so the selected factors are more representative. This overcomes the shortcoming of traditional factor screening based on linear correlation analysis;

(2) Kernel principal component analysis (KPCA) is a nonlinear extension of principal component analysis (PCA): the original vectors are mapped by a mapping function Φ into a high-dimensional feature space F, and PCA is performed in F. Data that are linearly inseparable in the original space are almost always linearly separable in the high-dimensional feature space, so the principal components extracted by PCA in that space are more representative. Feature extraction based on KPCA therefore greatly improves the handling of nonlinear data and has an advantage over traditional PCA-based feature extraction. In addition, the principal components extracted by KPCA are mutually orthogonal and the data have undergone noise reduction and redundancy removal, which helps prevent over-fitting of the neural network and improves its generalization ability;

(3) Artificial neural networks are robust and have strong nonlinear mapping and self-learning capabilities, and can effectively uncover the intrinsic relationship between the predictors and the forecast object. The Elman neural network chosen by the invention is a typical dynamic recurrent network which, compared with a conventional feedforward neural network (such as a BP neural network), has an additional context layer. The context layer records the information of the previous network iteration and supplies it as input to the current iteration, which makes the Elman network better suited to predicting time series data. In addition, because neural networks suffer from parameter uncertainty, the multi-model ensemble forecasting method is adopted to reduce forecast uncertainty and improve forecast accuracy;

(4) The NMI used for preliminary factor selection, the KPCA used for principal component extraction and the Elman neural network used for runoff forecasting can all handle both linear and nonlinear data. Combining the three methods overcomes the limitations of conventional methods and improves the stability and accuracy of the forecast.

Brief description of the drawings

Fig. 1 is the overall flow chart of the invention;

Fig. 2 is the structure diagram of the Elman network.

Embodiment

The invention is further described below with reference to the accompanying drawings and an embodiment.

Fig. 1 is the overall flow chart of the invention. Taking the forecast of the annual mean runoff into the reservoir of the Jinping hydropower station as an example, the flow chart can be divided into six steps, as follows:

Step 1: Data preprocessing

1.1 Collect the historical runoff data of the region to be forecast and the hydrometeorological data that can serve as predictors. Commonly used hydrometeorological data include indices of atmospheric circulation characteristics, upper-air pressure fields and sea surface temperature. The data used in this embodiment comprise the year-by-year annual mean runoff into the Jinping hydropower station reservoir for 1960-2011 and the monthly data of the 74 circulation characteristic indices for 1959-2010.

1.2 Because the forecast is of annual mean runoff, factors cannot be selected from within the same year; in addition, the influence of meteorological factors on runoff is lagged. Therefore, following Table 1, a one-to-one correspondence is established between the year-by-year annual mean runoff of the Jinping hydropower station (1960-2011) and the monthly values of the 74 atmospheric circulation indices of the preceding year (1959-2010). The correspondence between the time series of one atmospheric circulation index and the runoff time series is shown in Table 2; the other indices are treated similarly.

Table 2. Correspondence between the time series of one atmospheric circulation index and the runoff time series

Step 2: Preliminary predictor selection based on normalized mutual information

2.1 Divide the index time series and the annual runoff time series into two parts: one part serves as the training samples of the neural network and the other as the test samples for the trained network. In this embodiment, the first 47 years of data serve as training samples and the last 5 years as test samples.

2.2 Calculate the mutual information MI. For the training sample data, calculate the mutual information between the time series of each monthly index and the corresponding runoff time series. In this embodiment, formula (1) is used to calculate the mutual information between the annual mean runoff series in column 1 of Table 2 and the index time series in each of the remaining columns. Note that, to keep the test results reliable, only the training sample data are used to calculate the mutual information and screen the preliminary predictors; the test sample data must not be included.

2.3 Calculate the normalized mutual information NMI, i.e. map the MI values calculated in step 2.2 to between 0 and 1 using formulas (2), (3) and (4).

2.4 Significance test of the mutual information. This embodiment tests the mutual information with the bootstrap method, as follows:

2.4.2 Randomly shuffle the order of the two time series 100 times, calculate the NMI value after each shuffle, and sort the values in descending order;

2.4.4 If the NMI value of the original time series is greater than the NMI value corresponding to a given probability threshold (95% in this embodiment), the two series are considered significantly correlated.

2.5 Select the indices that pass the significance test and whose normalized mutual information exceeds a certain threshold (0.9 in this embodiment) as the preliminarily selected predictors. In this embodiment, 205 indices have a normalized mutual information greater than 0.9; the first 20 are listed below:

Table 3. The first 20 preliminarily selected predictors

Preliminarily selected factor	NMI	MI
August sunspots	0.988375	5.426929
April sunspots	0.988375	5.426929
July sunspots	0.988375	5.426929
October sunspots	0.988375	5.426929
December sunspots	0.988375	5.426929
February sunspots	0.98444	5.384376
September sunspots	0.98444	5.384376
November sunspots	0.98444	5.384376
January sunspots	0.98444	5.384376
March sunspots	0.98444	5.384376
May sunspots	0.98444	5.384376
August Northern Hemisphere subtropical high intensity index (5E-360)	0.980474	5.341823
March Northern Hemisphere polar vortex area index (zone 5, 0-360)	0.980474	5.341823
June North African-Atlantic-North American subtropical high intensity index (110W-60E)	0.976477	5.299270
June Northern Hemisphere subtropical high intensity index (5E-360)	0.976291	5.256717
April Northern Hemisphere subtropical high intensity index (5E-360)	0.972448	5.256717
July North African-Atlantic-North American subtropical high intensity index (110W-60E)	0.972448	5.256717
September North African-Atlantic-North American subtropical high intensity index (110W-60E)	0.972448	5.256717
June sunspots	0.972448	5.256717
June Pacific subtropical high intensity index (110E-115W)	0.970919	5.240655
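As a worked check, most (NMI, MI) pairs in Table 3 are consistent with the normalization NMI = 2·MI/(H(X)+H(Y)) under two assumptions that the source does not state explicitly: logarithms are taken to base 2, and all 47 training-period runoff values are distinct, so that H(X) = log2(47) and MI equals H(Y). The sketch below recomputes three rows under those assumptions.

```python
import math

n_train = 47                    # training sample length in this embodiment
h_x = math.log2(n_train)        # H(X) if all 47 runoff values are distinct (assumption)

# (NMI, MI) pairs copied from Table 3.
rows = [(0.988375, 5.426929), (0.98444, 5.384376), (0.972448, 5.256717)]

# With all X values distinct, H(X,Y) = H(X), hence MI = H(Y).
recomputed = [2.0 * mi / (h_x + mi) for _, mi in rows]

for (table_nmi, mi), value in zip(rows, recomputed):
    print(f"MI={mi:.6f}  table NMI={table_nmi}  recomputed NMI={value:.6f}")
```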

Step 3: Perform kernel principal component analysis and select principal components as predictors. In this embodiment, 205 factor series were selected in step 2.5, and multicollinearity often exists among them. Predictors with multicollinearity enlarge the weight matrix of the neural network, and the repeated information and noise directly affect the training speed and generalization ability of the network, so feature extraction, noise reduction and redundancy removal are required. This embodiment uses the radial basis function as the kernel function for kernel principal component analysis and calculates the principal components according to formulas (5), (6), (9), (11), (12), (13) and (14). The resulting principal components are arranged in descending order of variance contribution rate. The variance contribution rates of the first 5 extracted principal components are given in Table 4, and the data of the corresponding first 5 principal components in Table 5.

Table 4. Variance contribution rates of the first 5 principal components

Principal component	Principal component 1	Principal component 2	Principal component 3	Principal component 4	Principal component 5
Variance contribution rate	25.7%	6.9%	5.6%	5.1%	3.9%

Table 5. The first 5 principal components extracted by KPCA

In this embodiment, trial and error is used to decide which principal components to use as predictors. Repeated trials show that the forecast performance in the test period is best when the first two principal components are used, so the first two principal components are finally selected as predictors. Note that, so that the same standard is applied when extracting principal components from the training samples and the test samples, the training sample series and the test sample series must be combined before KPCA is performed. In this embodiment, the training sample series has length 47 and the test sample series has length 5, so the combined series has length 52; the principal components listed in Table 5 therefore have a series length of 52.

Step 4: Build the Elman neural network model

4.1 To build the Elman network model, first determine the network structure (i.e. the number of nodes in each layer). The number of nodes in each layer is determined as follows:

(1) The number of input layer nodes equals the number of predictors. This embodiment uses the first two principal components as predictors, so the Elman neural network has 2 input layer nodes;

(2) The number of output layer nodes equals the number of forecast objects. This embodiment produces a single-value forecast of annual mean runoff, so there is 1 output layer node;

(3) The number of context layer nodes equals the number of hidden layer nodes;

(4) The number of hidden layer nodes has an important influence on the generalization performance of the network, but there is as yet no systematic, standard method for determining it. A good option is trial and error: try different numbers of hidden layer nodes, observe the forecast performance of the network, and choose the number of hidden layer nodes accordingly. In this embodiment, because KPCA was applied beforehand to reduce noise and remove redundancy in the factor data, and the resulting principal components are mutually orthogonal, over-fitting of the neural network is effectively prevented. As a result, with 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 15 hidden layer nodes, the relative errors of the forecasts in the test period are all within 20%; the network is very stable and generalizes well. Repeated trials showed that with 10 hidden layer nodes the maximum relative error of the test-period forecasts falls below 15%, so the number of hidden layer nodes is set to 10.

4.2 To build the Elman network model, the training algorithm of the network must also be determined. This embodiment uses the back-propagation algorithm together with gradient descent with a momentum term and an adaptive learning rate to update the weights of the network. The weight update formulas are given in formulas (15) and (16).

Step 5: Single-model runoff forecast

5.1 Normalize the principal component factor series extracted in step 3.6 and the historical runoff series of the region to be forecast according to formula (17), then divide them into training samples and test samples as described in step 2.1. In this embodiment, the first 47 years of the two principal component series selected in step 3 and of the Jinping hydropower station annual mean runoff series serve as training samples, and the last 5 years as test samples.

5.2 Use the factor data in the training samples as the network input and the historical runoff data in the training samples as the network output, and train the network with supervised learning. The learning process can be summarized as follows:

(1) Initialize the connection weights between the layers of the network with a random function, and set the allowed error ε of the cost function. This embodiment uses the mean squared error (MSE) as the cost function;

(2) Feed the learning samples to the network, calculate the value E of the mean squared error function with the training algorithm, and update the connection weights between the layers according to E;

(3) If E is greater than ε, return to step (2); otherwise learning ends and the network output is calculated.

5.3 For the trained network, use the factor data in the test samples as the network input to test its forecast performance. Denormalize the test results to obtain the forecast runoff values.

5.4 The mean absolute percentage error (MAPE), maximum relative error (MRE) and qualified rate (QR) are used as the evaluation indices of the forecast, calculated according to formulas (18), (19), (20) and (21). To verify the generalization ability of the network model and the stability of its forecasts, 100 single-model forecasts were carried out in this embodiment. In every run the maximum relative error of the test-period forecasts was within 16%, and the qualified rate reached 100%. This shows that the network model used in the invention has good generalization ability and forecast stability. The error statistics of the first 5 of these test-period forecasts are given in Table 6.

Table 6. Error statistics of single-model forecasts in the test period

Step 6：The DATA PROCESSING IN ENSEMBLE PREDICTION SYSTEM of run-off

In order to reduce the deviation of the forecast results caused by the uncertainty of the model parameters, the present invention performs the single-model runoff forecast multiple times and takes the average of the multiple forecast results as the final forecast result. In this embodiment, the average of the results of 100 forecasts can be taken as the final forecast result.
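The ensemble averaging of Step 6 can be sketched as follows. The function `ensemble_forecast` and the stand-in `toy_single_forecast` are hypothetical; in practice each call would train one network from a different random initialization and return its de-normalized forecast for the test periods.

```python
import numpy as np

def ensemble_forecast(single_forecast, n_runs=100):
    """Average n_runs single-model forecasts; also return their spread."""
    runs = np.array([single_forecast(seed) for seed in range(n_runs)])
    return runs.mean(axis=0), runs.std(axis=0)

def toy_single_forecast(seed):
    """Stand-in for one trained network's forecast over 3 test periods (made-up values)."""
    rng = np.random.default_rng(seed)
    return np.array([120.0, 95.0, 150.0]) + rng.normal(0.0, 5.0, size=3)

mean_forecast, spread = ensemble_forecast(toy_single_forecast, n_runs=100)
```

Returning the spread alongside the mean is an addition for inspection only; the method itself uses the average as the final forecast result.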

The foregoing is only an embodiment of the present invention and is not intended to limit the invention. Any equivalent substitution made within the principles of the present invention shall fall within its scope of protection. Content not elaborated in the present invention belongs to the prior art known to those skilled in the art.

Claims

1. A medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman networks, characterized in that the method comprises: preliminary selection of predictors based on mutual information; extraction of principal components by kernel principal component analysis; construction of an Elman neural network model; single-model forecast of runoff; and multi-model ensemble forecast of runoff.
2. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman networks as claimed in claim 1, characterized in that the preliminary selection of predictors based on mutual information comprises the following steps:

(1) Calculate the mutual information MI between each index time series and the corresponding runoff time series:

$$MI(X,Y)=\sum_{i=1}^{n}\sum_{j=1}^{n}p(x_i,y_j)\log\left(\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)}\right)$$

In the formula, X is the runoff time series, $X=(x_1,x_2,x_3,\ldots,x_n)^T$; Y is the index time series, $Y=(y_1,y_2,y_3,\ldots,y_n)^T$; the numerator $p(x_i,y_j)$ is the joint probability distribution of X and Y; and $p(x_i)$ and $p(y_j)$ are the marginal probability distributions of X and Y, respectively;

(2) Using entropy as the denominator, map the MI value to between 0 and 1 to obtain the normalized mutual information NMI:

$$NMI(X,Y)=2\,\frac{MI(X,Y)}{H(X)+H(Y)}$$

$$H(X)=-\sum_{i=1}^{n}p(x_i)\log_2 p(x_i)$$

where H(X) and H(Y) are the entropies of X and Y, respectively, and H(Y) is calculated by a formula analogous to that of H(X);

(3) Test the normalized mutual information using the bootstrap method.
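Steps (1) to (3) might be illustrated by the following sketch. Several points are assumptions not stated in the text: the probabilities are estimated from an equal-width two-dimensional histogram with a hypothetical `bins` parameter, the base-2 logarithm is used for both MI and entropy so that NMI lies between 0 and 1, and the bootstrap test compares the observed NMI with a null distribution obtained by resampling one series with replacement independently of the other.

```python
import numpy as np

def nmi(x, y, bins=8):
    """Normalized mutual information 2*MI/(H(X)+H(Y)) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()   # joint distribution estimate
    p_x = p_xy.sum(axis=1)       # marginal distribution of x
    p_y = p_xy.sum(axis=0)       # marginal distribution of y
    nz = p_xy > 0                # skip empty cells, where p*log(p) is taken as 0
    mi = np.sum(p_xy[nz] * np.log2(p_xy[nz] / np.outer(p_x, p_y)[nz]))
    h_x = -np.sum(p_x[p_x > 0] * np.log2(p_x[p_x > 0]))
    h_y = -np.sum(p_y[p_y > 0] * np.log2(p_y[p_y > 0]))
    return 2.0 * mi / (h_x + h_y)

def bootstrap_nmi_test(x, y, n_boot=1000, alpha=0.05, bins=8, seed=0):
    """Return (observed NMI, null threshold, whether observed exceeds it)."""
    rng = np.random.default_rng(seed)
    observed = nmi(x, y, bins)
    null = np.empty(n_boot)
    for b in range(n_boot):
        # Resampling x on its own breaks the pairing between x and y.
        x_b = rng.choice(x, size=len(x), replace=True)
        null[b] = nmi(x_b, y, bins)
    threshold = np.quantile(null, 1.0 - alpha)
    return observed, threshold, observed > threshold
```

An index would then be retained when its NMI is both high and significant under this test; the threshold for "high" is not given in this section and is left to the user.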
3. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman networks as claimed in claim 1, characterized in that the extraction of principal components by kernel principal component analysis comprises the following steps:

(1) Standardize the preliminarily selected predictor data by z-score;

(2) Compute the kernel matrix K of the preliminarily selected predictors;

(3) Compute the centered kernel matrix K_c and its eigenvalues and eigenvectors; arrange the eigenvalues in descending order and reorder the eigenvectors to match the eigenvalues;

(4) Compute the normalized eigenvector matrix A, and compute the projection of the centered kernel matrix K_c onto the eigenvectors to obtain the principal components.
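Steps (1) to (4) can be sketched in a few lines. The Gaussian (RBF) kernel, its width `gamma` and the number of retained components are assumptions made for illustration, since this section does not name a kernel function; `kernel_pca` is a hypothetical helper name.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=None):
    """Return (principal components, leading eigenvalues) of an RBF kernel PCA."""
    # Step (1): z-score standardization of each predictor column.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n = Z.shape[0]
    if gamma is None:
        gamma = 1.0 / Z.shape[1]  # assumed default kernel width
    # Step (2): kernel matrix K from pairwise squared distances.
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = np.exp(-gamma * d2)
    # Step (3): centered kernel matrix K_c and its eigen-decomposition.
    one = np.full((n, n), 1.0 / n)
    K_c = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(K_c)   # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]     # rearrange in descending order
    vals, vecs = vals[order], vecs[:, order]
    # Step (4): normalize the eigenvectors and project K_c onto them.
    A = vecs[:, :n_components] / np.sqrt(vals[:n_components])
    return K_c @ A, vals[:n_components]
```

The number of components to keep would normally be chosen from the cumulative share of the eigenvalues; that criterion is not stated in this section.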
4. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman networks as claimed in claim 1, characterized in that the single-model forecast of runoff comprises the following step:

(1) Make a single-model forecast of runoff using the Elman network model.
5. The medium- and long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman networks as claimed in claim 1, characterized in that the multi-model ensemble forecast of runoff comprises the following steps:

(1) Make multiple single-model forecasts of runoff using the Elman network model;

(2) Take the average of the results of the multiple forecasts as the final forecast result.