CN107622322B - Forecasting factor identification method of medium-long term runoff and forecasting method of medium-long term runoff - Google Patents


Info

Publication number
CN107622322B
Authority
CN
China
Prior art keywords
factor
runoff
forecast
forecasting
data
Prior art date
Legal status
Active
Application number
CN201710701183.3A
Other languages
Chinese (zh)
Other versions
CN107622322A (en)
Inventor
苗淼
魏加华
黄跃飞
李铁键
谢帅
田旭
白左霞
马雪
刘飞
彭飞
Current Assignee
Tsinghua University
State Grid Qinghai Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Qinghai Electric Power Co Ltd
Original Assignee
Tsinghua University
State Grid Qinghai Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Qinghai Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, State Grid Qinghai Electric Power Co Ltd, and Economic and Technological Research Institute of State Grid Qinghai Electric Power Co Ltd
Priority to CN201710701183.3A
Publication of CN107622322A
Application granted
Publication of CN107622322B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for identifying forecasting factors of medium- and long-term runoff and a method for forecasting medium- and long-term runoff. The identification method comprises the following steps: (1) standardizing the runoff sequence and the climate factor sequence; (2) setting a forecast period, forming a candidate forecasting factor set X from the standardized runoff sequence Q and the standardized climate factor sequence F at different lag periods, and taking the corresponding standardized runoff sequence Q as the set Y in the Lasso regression; (3) given a parameter λ, performing cross validation to calculate a forecast set Y', and comparing Y' with Y to obtain a first evaluation index of λ; (4) selecting M different values of λ, normalizing their first evaluation indexes, and adding the normalized results to serve as scores; (5) computing the total score of each λ and selecting the λ with the highest total score as the optimal parameter; (6) from the regression coefficients of the climate factors obtained in step (3) with the optimal parameter, identifying the climate factors with non-zero regression coefficients as forecasting factors.

Description

Forecasting factor identification method of medium-long term runoff and forecasting method of medium-long term runoff
Technical Field
The invention relates to the technical field of medium-and-long-term hydrological forecasting, in particular to a forecasting factor identification method and a medium-and-long-term runoff forecasting method.
Background
Medium- and long-term hydrologic forecasting refers to hydrologic forecasting with a forecast period (lead time) of 3 days to 1 year that exceeds the concentration time of the drainage basin. The forecast variables include water level, runoff, rainfall, and the like. Because runoff is directly related to reservoir operation, medium- and long-term runoff forecasting is the most common and most closely watched type. It plays an important role in reservoir operation decisions such as flood control and drought relief, hydropower generation planning, and integrated water resources management. In general, the longer the forecast period of a medium- and long-term runoff forecast, the better it can support reservoir operation decisions.
Reliable weather forecasts covering the corresponding forecast periods are lacking. Current research therefore relies on data-driven models such as autoregressive moving average (ARMA) models, artificial neural networks (ANN), and support vector regression (SVR). These models have been applied in turn to medium- and long-term hydrologic forecasting with good results. Owing to the nonlinearity of runoff time series, nonlinear models such as ANN and SVR forecast better than ARMA models. Beyond single models, some methods combine different models to forecast runoff time series and achieve better results than any single model.
However, most existing studies focus on combining models and their outputs, and few consider changing the model inputs. Most current studies use antecedent runoff as the input. This gives good results when the forecast period is short (less than 1 month), because antecedent runoff is strongly correlated with current runoff at short lead times. As the forecast period grows, however, this correlation weakens rapidly and forecast reliability drops quickly. This is also the main reason that most current medium- and long-term runoff forecasts are limited to a forecast period of about 1 month.
In fact, reliable medium- and long-term hydrologic forecasts with longer forecast periods are more useful for reservoir operation decisions. External factors such as atmospheric circulation indices and sea surface temperature have also been used in medium- and long-term runoff forecasting to improve forecast accuracy and extend the forecast period. However, there are many climate factors, and their influence on runoff is lagged, so each climate factor yields multiple time series (one per lag) as candidate forecasting factors. Without a selection step, useful forecasting factors cannot be extracted from this large pool. Most current forecasting methods determine some forecasting factors from prior knowledge, and there is no reliable method for selecting suitable medium- and long-term runoff forecasting factors from the many candidates as model inputs.
Therefore, a method for identifying forecasting factors for medium- and long-term runoff forecasting is needed.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
The present invention has been completed based on the following findings of the inventors:
The inventors found that, in current medium- and long-term runoff forecasting, most model inputs are antecedent runoff or climate factors selected by experience. Using antecedent runoff as the forecasting factor limits the forecast period of the runoff forecast. Using climate factors whose influence on runoff is lagged raises further difficulties. Teleconnections must be considered, many climate factors can influence runoff, and the lag with which each factor affects runoff in a given region cannot be determined in advance. Climate factors at different lag periods therefore have to be treated as candidate forecasting factors, and no corresponding research on selecting among them has yet been carried out in the field of medium- and long-term hydrologic forecasting. The present invention addresses this problem.
After intensive research, the inventors introduce a new method into medium- and long-term runoff forecasting, namely Least Absolute Shrinkage and Selection Operator (Lasso) regression, to select appropriate forecasting factors from the climate factors. Lasso regression was proposed by Tibshirani in 1996 on the basis of ridge regression and subset selection, combining the interpretability of subset selection with the stability of ridge regression. Since its introduction, Lasso has been widely applied in many fields. It has not yet been applied in hydrologic forecasting to identify runoff forecasting factors.
In view of the above, an object of the present invention is to provide an identification method that uses Lasso regression to effectively screen out medium- and long-term runoff forecasting factors together with their lag periods.
In a first aspect of the invention, the invention provides a method for identifying a forecasting factor of medium-long term runoff.
According to an embodiment of the present invention, the identification method includes the following steps. (1) Standardizing the runoff sequence and the climate factor sequence to obtain a standardized runoff sequence Q and a standardized climate factor sequence F. (2) Setting a forecast period and, according to it, forming a candidate forecasting factor set X from the standardized runoff sequence Q and the standardized climate factor sequence F at a series of different lag periods, and taking the corresponding standardized runoff sequence Q as the set Y in the Lasso regression. (3) Given a parameter λ, performing cross validation to calculate a forecast set Y', and comparing the forecast set Y' with the set Y to obtain a first evaluation index of λ. (4) Selecting M different values of λ, normalizing the first evaluation indexes, and adding the normalized results to obtain scores. (5) Summing the scores of each λ to obtain its total score, and selecting the λ with the highest total score as the optimal parameter. (6) Obtaining, from step (3) with the optimal parameter, the regression coefficient of each climate factor, and identifying the climate factors with non-zero regression coefficients as forecasting factors.
The inventors found that the identification method of the embodiment of the present invention can effectively screen suitable medium- and long-term runoff forecasting factors (for forecast periods longer than 5 months) out of many climate factors at different lag periods by Lasso regression. It can also determine the lag period of each forecasting factor.
In addition, the identification method according to the above embodiment of the present invention may further have the following additional technical features:
According to an embodiment of the invention, in steps (1) and (2), the runoff sequence is a sequence of monthly runoff data, and the climate factor sequence is a matrix of monthly data for 74 climate factors. The forecast period is set to any value from 1 to 12 months, and the lag period ranges from 1 to 24 months.
According to an embodiment of the present invention, in step (2), the set of alternative prediction factors X and the set Y are in the form of:
[Equation image not reproduced: form of the candidate forecasting factor set X and the set Y]
wherein LT is a forecast period, t is a relative month number, Q is a normalized runoff sequence, and F is a normalized climate factor sequence.
According to an embodiment of the present invention, in step (3), cross validation and calculation of the forecast set Y' include the following sub-steps. (3-1) Randomly shuffling the candidate forecasting factor set X and the set Y with the same permutation and dividing them into multiple segments. (3-2) Taking one segment as the data to be forecast and the remaining segments as Lasso training data, and calculating the regression coefficients of the climate factors and the runoff factor for the parameter λ. (3-3) Taking each segment in turn as the data to be forecast, calculating its forecast from its X and the corresponding regression coefficients, and combining the forecasts of all segments into the forecast set Y'.
According to an embodiment of the present invention, the data are divided into 26 segments.
According to an embodiment of the present invention, step (3) further includes the following. The cross validation, the calculation of the forecast set Y', and its comparison with the set Y are repeated 100 times. The resulting 100 mean square errors are taken as the first evaluation index of λ.
According to an embodiment of the present invention, in step (4), normalizing the first evaluation index includes taking the mean and the standard deviation of the 100 mean square errors as second evaluation indexes. The second evaluation indexes are then normalized over the M different values of λ.
According to the embodiment of the present invention, in the step (5), the optimal parameter is 0.14.
According to the embodiment of the present invention, in step (6), identifying the climate factors with non-zero regression coefficients as forecasting factors includes the following. The non-zero regression coefficients are counted over the 100 rounds of cross validation to obtain, for each climate factor, the frequency with which its regression coefficient is non-zero. The climate factors with a frequency greater than 0.95 are taken as forecasting factors.
In a second aspect of the invention, the invention provides a method for predicting medium and long term runoff.
According to an embodiment of the invention, the input data of the forecasting method include the forecasting factors identified by the identification method described above.
The inventors found that the input data of the forecasting method of the embodiment include forecasting factors teleconnected with medium- and long-term runoff. As a result, the forecasting method achieves higher accuracy and a longer forecast period for medium- and long-term runoff. Those skilled in the art will understand that the features and advantages described above for the identification method also apply to the forecasting method and are not repeated here.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a flow chart of a method for identifying forecasting factors of medium- and long-term runoff according to an embodiment of the invention;
Fig. 2 is a schematic flow chart of a method for identifying forecasting factors of medium- and long-term runoff according to another embodiment of the invention;
Fig. 3 is a box plot of the MSE for different parameters λ according to an embodiment of the present invention;
Fig. 4 is a graph comparing forecast performance with and without the 74 climate factors according to an embodiment of the present invention.
Detailed Description
The examples of the present invention are described in detail below. Those skilled in the art will understand that the following examples are intended to illustrate the present invention and should not be construed as limiting it. Where specific techniques or conditions are not indicated in the examples, techniques or conditions commonly used in the art or described in the product specifications may be followed. Reagents or instruments whose manufacturers are not indicated are conventional, commercially available products.
In one aspect of the invention, the invention provides a method for identifying forecasting factors of medium- and long-term runoff. The identification method of the present invention will be described in detail with reference to Figs. 1 to 4.
According to an embodiment of the present invention, referring to fig. 1, the identification method includes:
S100: Standardize the runoff sequence and the climate factor sequence.
In this step, the runoff sequence and the climate factor sequence are standardized to obtain a standardized runoff sequence Q and a standardized climate factor sequence F. This prevents differences in numerical magnitude among the factors from distorting the identification of forecasting factors in the subsequent Lasso regression.
According to an embodiment of the present invention, the runoff sequence may be a sequence of monthly runoff data, and the climate factor sequence may be a matrix of monthly data for 74 climate factors. Choosing the sequences in this way facilitates the subsequent Lasso regression and helps extend the forecast period of medium- and long-term runoff forecasting. Standardizing each of them yields the standardized runoff sequence Q and the standardized climate factor sequence F. In some specific examples of the present invention, the inventors used the monthly inflow runoff of the Longyangxia Reservoir from January 1987 to December 2012 as the runoff sequence and standardized it by month. They likewise standardized the 74 climate factors published by the National Climate Center for the same period as the climate factor sequence.
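The month-by-month standardization of step S100 can be sketched as follows. This is a minimal illustration, assuming that "standardized by month" means each calendar month (all Januaries, all Februaries, and so on) is converted to z-scores separately, which also removes the seasonal cycle. The function name standardize_by_calendar_month and its start_month parameter are hypothetical.

```python
import numpy as np

def standardize_by_calendar_month(series, start_month=1):
    """Z-score a monthly record separately for each calendar month (assumed reading of S100).

    series: 1-D array (runoff) or 2-D array of shape (n_months, n_factors).
    start_month: calendar month (1-12) of the first element (hypothetical parameter).
    """
    values = np.asarray(series, dtype=float)
    out = np.empty_like(values)
    # Calendar month index (0 = January) of every row in the record.
    months = (np.arange(values.shape[0]) + start_month - 1) % 12
    for m in range(12):
        rows = months == m
        if not rows.any():
            continue
        mean = values[rows].mean(axis=0)
        std = values[rows].std(axis=0)
        # Avoid division by zero for a factor that is constant in this calendar month.
        std = np.where(std > 0, std, 1.0)
        out[rows] = (values[rows] - mean) / std
    return out
```

For the Longyangxia example, January 1987 to December 2012 spans 312 months. Q would then be standardize_by_calendar_month(runoff) with shape (312,), and F the same call on a (312, 74) climate matrix.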
S200: Set a forecast period. According to it, form a candidate forecasting factor set X from the standardized runoff sequence Q and the standardized climate factor sequence F at a series of different lag periods, and take the corresponding standardized runoff sequence Q as the set Y in the Lasso regression.
In this step, a forecast period is set and the candidate forecasting factor set X is formed according to it. X contains the standardized runoff sequence Q and the standardized climate factor sequence F at a series of different lag periods. The standardized runoff sequence Q corresponding to X is used as the set Y in the Lasso regression.
It should be noted that the specific formula of the Lasso regression method is
[Equation image not reproduced: Lasso regression objective]
Like ridge regression, Lasso regression adds a penalty on the regression coefficients to the least-squares objective. Unlike ridge regression, it penalizes the absolute values of the coefficients. The inventors found through long-term study that Lasso regression effectively shrinks the regression coefficients of the independent variables and can shrink some of them exactly to 0. Because its results are also stable, it can be used to identify forecasting factors.
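The shrinkage behaviour described above can be illustrated with a small coordinate-descent solver. This is a sketch, not the patent's implementation. The exact scaling of the original formula is not recoverable from this text, so the code assumes the common objective (1/(2n))·||y − Xβ||₂² + λ·||β||₁, the convention also used by scikit-learn's Lasso, with standardized data and no intercept. The soft-thresholding step sets weak coefficients exactly to 0, which the squared penalty of ridge regression cannot do. Function names are hypothetical.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Lasso shrinkage operator: move rho toward 0 by lam, clipping at exactly 0."""
    return float(np.sign(rho) * max(abs(rho) - lam, 0.0))

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes standardized inputs, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n              # ||x_j||^2 / n for each column
    resid = np.asarray(y, dtype=float).copy()      # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # Correlation of column j with the partial residual that excludes column j.
            rho = X[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (beta[j] - old)     # keep the residual in sync
            max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta
```

Consider the orthogonal design [[1, 1], [1, -1], [-1, 1], [-1, -1]] with y = 2·x1 + 0.05·x2. Here λ = 0.1 shrinks the first coefficient from 2.0 to 1.9 and sets the second exactly to 0. This is the mechanism by which weak candidate factors drop out.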
According to the embodiment of the present invention, the specific numbers of months of the forecast period and the lag period are not particularly limited. Those skilled in the art can adjust the forecast period according to the accuracy of the medium- and long-term runoff forecast. They can adjust the lag period according to the regression coefficients of the climate factors obtained from the subsequent Lasso regression. In some specific examples of the present invention, the forecast period may be set to any value from 1 to 12 months, and the lag period may range from 1 to 24 months. This allows forecast periods longer than 1 month while accounting for the teleconnection between each climate factor and current runoff. Suitable medium- and long-term runoff forecasting factors can thus be effectively screened out from many climate factors at different lag periods.
According to an embodiment of the present invention, the candidate forecasting factor set X and the set Y may take the following form:
[Equation image not reproduced: form of the candidate forecasting factor set X and the set Y]
where LT is the forecast period, t is the relative month number, Q is the standardized runoff sequence, and F is the standardized climate factor sequence. Expressed in this form, X and Y more fully account for the teleconnection between each climate factor and current runoff. The relative month number t is the difference, in months, between a given month and the initial month of the runoff sequence, so subsequent months are numbered 1, 2, 3, ... in chronological order. For example, in some specific examples of the present application, January 1987 is the initial month of the runoff sequence, and February 1989 then has the relative month number t = 25.
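The relative month number t and the construction of X and Y can be sketched as below. The equation image for X and Y is not reproduced here, so the lag convention is an assumption. Lag k (k = 1, ..., 24) is taken to mean the value at month t − LT − (k − 1), so lag 1 is the most recent month available when the forecast is issued. The function names and the lag-major column order are likewise hypothetical.

```python
import numpy as np

def relative_month(year, month, start_year=1987, start_month=1):
    """Months elapsed since the initial month of the runoff record (initial month -> 0)."""
    return (year - start_year) * 12 + (month - start_month)

def build_candidate_sets(Q, F, LT, n_lags=24):
    """Stack lagged predictors into X and the aligned target runoff into Y.

    Q: standardized runoff, shape (n,); F: standardized climate factors, shape (n, m).
    Hypothetical lag convention: lag k uses month t - LT - (k - 1).
    Columns are lag-major: [Q, F_1..F_m] at lag 1, then [Q, F_1..F_m] at lag 2, ...
    """
    factors = np.column_stack([Q, F])      # factor 1 = runoff, 2..m+1 = climate factors
    n = factors.shape[0]
    first_t = LT + n_lags - 1              # earliest month with every lag available
    rows = []
    for t in range(first_t, n):
        lagged = [factors[t - LT - (k - 1)] for k in range(1, n_lags + 1)]
        rows.append(np.concatenate(lagged))
    X = np.array(rows)                     # shape (n - first_t, n_lags * (m + 1))
    Y = np.asarray(Q)[first_t:]            # runoff at target month t
    return X, Y
```

With the 312-month Longyangxia record, 74 climate factors, and LT = 1, this layout gives X with 24 × 75 = 1800 columns. That count matches the number of candidate factors stated for this embodiment.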
S300: Given a parameter λ, perform cross validation to calculate a forecast set Y', and compare Y' with the set Y to obtain a first evaluation index of λ.
In this step, for a given parameter λ, the forecast set Y' is calculated by cross validation and compared with the set Y to obtain the first evaluation index of λ. In some specific examples of the present invention, λ may be set to 0.1.
According to an embodiment of the present invention, referring to fig. 2, the step of cross-validating and calculating the forecast set Y' in step S300 (not labeled in fig. 2) may further include:
S310: Randomly shuffle the candidate forecasting factor set X and the set Y with the same permutation, and divide them into multiple segments.
In this step, X and Y are shuffled with the same random permutation and divided into multiple segments. The data were originally in chronological order. Rearranging them makes it possible to evaluate how the factors perform under different data partitions, and hence to evaluate the stability of the forecast.
According to the embodiment of the present invention, the number of segments is not particularly limited, as long as enough training data remain for the subsequent Lasso regression. Those skilled in the art can adjust it according to the amount of data. In some specific examples of the present invention, the data are divided into 26 segments. This allowed the inventors to obtain models with higher stability for the different values of λ.
S320: Take one segment as the data to be forecast and the remaining segments as Lasso training data, and calculate the regression coefficients of the climate factors and the runoff factor for the parameter λ.
In this step, taking some specific examples of the present invention, any one of the 26 segments is used as the data to be forecast and the other 25 segments as Lasso training data. The regression coefficient of each climate factor and of the runoff factor is then calculated for the parameter λ.
S330: Take each segment in turn as the data to be forecast, calculate its forecast from its X and the regression coefficients, and combine the results into the forecast set Y'.
In this step, taking some specific examples of the present invention, step S320 is repeated with each of the 26 segments in turn as the data to be forecast. The forecast for each held-out segment is calculated from its X and the regression coefficients of the climate factors and the runoff factor obtained in step S320. The forecasts are then combined in chronological order into the forecast set Y'. The forecast set Y' obtained by cross validation can then be compared with the set Y to obtain the first evaluation index of λ.
According to the embodiment of the present invention, taking some specific examples, step S300 may further include repeating the cross validation (steps S310 to S330) 100 times to obtain 100 forecast sets Y'. The mean square errors of the 100 sets Y' relative to Y, MSE = ||Y' - Y||_2^2 (the squared two-norm of the difference), are taken as the first evaluation index for λ = 0.1. Repeating the procedure 100 times removes the influence of the random shuffling in the cross-validation step and further improves the effectiveness of the identification method.
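Steps S310 to S330 and the 100 repetitions amount to repeated, shuffled K-fold cross validation, which can be sketched as follows. The fitting routine is passed in as a hypothetical callable fit_predict(X_train, y_train, X_test), so any Lasso solver can be plugged in. The error follows the document's definition MSE = ||Y' - Y||_2^2, a sum that is not divided by the number of samples. The sketch assumes that step S330 means each segment is taken in turn as the held-out segment.

```python
import numpy as np

def cv_forecast(X, Y, fit_predict, n_segments=26, rng=None):
    """One round of S310-S330: shuffle X and Y together, split, forecast each segment."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(Y))                # one permutation shared by X and Y
    segments = np.array_split(order, n_segments)
    Y_pred = np.empty(len(Y), dtype=float)
    for k, held_out in enumerate(segments):
        train = np.concatenate([s for i, s in enumerate(segments) if i != k])
        # Fit on the other segments, then forecast the held-out segment.
        Y_pred[held_out] = fit_predict(X[train], Y[train], X[held_out])
    return Y_pred                                  # indexed in the original time order

def first_evaluation_index(X, Y, fit_predict, n_repeats=100, n_segments=26, seed=0):
    """Repeat the shuffled cross validation and return one MSE per repetition."""
    rng = np.random.default_rng(seed)
    mses = np.empty(n_repeats)
    for r in range(n_repeats):
        Y_pred = cv_forecast(X, Y, fit_predict, n_segments, rng)
        mses[r] = np.sum((Y_pred - Y) ** 2)        # ||Y' - Y||_2^2 as defined in the text
    return mses
```

With scikit-learn installed, one possible plug-in is lambda Xtr, ytr, Xte: Lasso(alpha=lam, fit_intercept=False).fit(Xtr, ytr).predict(Xte). Note that scikit-learn scales the squared error by 1/(2n), so its alpha values may not correspond one-to-one to the λ values reported here.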
S400: m different parameters lambda are selected, normalization processing is carried out on the first evaluation index of the parameters lambda, and the results of the normalization processing are added to be used as scores.
In this step, referring to Fig. 2, if regression coefficients have not yet been calculated for enough values of λ, the value of λ is changed and step S300 is repeated until M different values of λ have been evaluated. The first evaluation indexes obtained for the M values are then normalized. The normalized results for each λ are added to give the score of that λ.
According to an embodiment of the present invention, in step S400, normalizing the first evaluation index may specifically include taking the mean and the standard deviation of the 100 mean square errors as second evaluation indexes. The second evaluation indexes are then normalized over the M different values of λ.
In some examples of the present invention, λ takes the values 0.11, 0.12, ..., 0.2, 0.22, 0.24, ..., 0.3, 16 different values in total, and the box plots of the 16 groups of MSEs are shown in Fig. 3. As Fig. 3 shows, the MSE tends to decrease as λ (Lambda) decreases. Because overfitting then occurs more easily, however, the spread of the MSE grows and the model becomes less stable. To account for both the accuracy and the stability of the model, the mean and the standard deviation of the 100 MSEs are used as evaluation indexes. These second evaluation indexes are then normalized over the 16 values of λ. The smallest MSE mean and the smallest MSE standard deviation each score 1 point, and the largest MSE mean and the largest MSE standard deviation each score 0 points.
S500: Sum the scores of each λ to obtain its total score, and select the λ with the highest total score as the optimal parameter.
In this step, taking some specific examples of the present invention, the normalized second evaluation indexes of each of the 16 values of λ are added to give its total score. The λ with the highest total score is the optimal parameter.
According to the embodiment of the present invention, the specific value of the optimal parameter is not particularly limited and can be determined by those skilled in the art for different runoff data sources. In some embodiments of the present invention, the mean MSE, the standard deviation of the MSE, and the score of each λ (Lambda) are shown in Table 1. As Table 1 shows, the most suitable value of λ is 0.14.
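The scoring of steps S400 and S500 can be sketched as min-max normalization of the two second evaluation indexes followed by a sum. Two details are assumed here rather than stated in the text. First, scores are interpolated linearly between the best value (1 point) and the worst value (0 points). Second, every λ receives 1 point when an index is constant across all λ. Function names are hypothetical.

```python
import numpy as np

def min_max_score(values):
    """Linear score: the smallest value gets 1 point, the largest gets 0 points."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.ones_like(v)                     # all candidates tie
    return (v.max() - v) / span

def select_lambda(lambdas, mse_table):
    """mse_table[i] holds the repeated-CV MSEs (first evaluation index) for lambdas[i]."""
    mse_table = np.asarray(mse_table, dtype=float)
    mean_score = min_max_score(mse_table.mean(axis=1))   # accuracy
    std_score = min_max_score(mse_table.std(axis=1))     # stability
    total = mean_score + std_score                        # total score of each lambda
    best = int(np.argmax(total))
    return lambdas[best], total
```

The same number of MSEs is available for every λ, so the sample and population standard deviations differ by a common factor. That factor cancels in the min-max normalization, so the choice between them does not change the scores.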
TABLE 1 Cross-validation Effect scoring for different parameters λ
[Table image not reproduced: mean MSE, standard deviation of the MSE, and score for each λ]
S600: From the regression coefficients of the climate factors obtained in step S300 with the optimal parameter, identify the climate factors with non-zero regression coefficients as forecasting factors.
In this step, the regression coefficients of the climate factors are taken from step S300 run with the optimal parameter obtained in step S500. The climate factors with non-zero regression coefficients are identified as forecasting factors, which can be used as input data for medium- and long-term runoff forecasting.
According to an embodiment of the present invention, identifying the climate factors with non-zero regression coefficients as forecasting factors may specifically include the following. The non-zero regression coefficients are counted over the 100 rounds of cross validation to obtain, for each climate factor, the frequency with which its regression coefficient is non-zero. The climate factors with a frequency greater than 0.95 are taken as forecasting factors.
In some specific examples of the present invention, the non-zero regression coefficients obtained in the 100 rounds of cross validation of step S300 with the optimal parameter λ = 0.14 are counted. This gives the frequency with which each factor's regression coefficient is non-zero, and the factors with a frequency greater than 0.95 are selected as forecasting factors. The results for each forecast period are shown in Tables 2 and 3. In the tables, factor number 1 denotes antecedent runoff and factor numbers 2 to 75 denote the 74 climate factors; the names of the selected factors are listed in Table 3. The tables show that as the forecast period lengthens, climate factors account for a larger share of the selected factors and their lag periods become longer. These factors can therefore support runoff forecasts with longer forecast periods. In addition, factors 1 (antecedent runoff) and 69 (Qinghai-Tibet Plateau index) are selected at the smallest lag period allowed by the current forecast period, while the other factors have stable lag periods.
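The frequency rule of step S600 can be sketched as follows, given the coefficient vectors collected from the cross-validation fits at the optimal λ. The text does not specify whether the frequency is counted over the 100 repetitions or over every individual fit, so the function simply accepts one row per fit. It also assumes a hypothetical lag-major column layout, column = (lag − 1) × 75 + (factor − 1). The names are illustrative.

```python
import numpy as np

def select_forecasting_factors(coefs, threshold=0.95, n_factors=75):
    """Return (factor number, lag, frequency) for columns non-zero more often than threshold.

    coefs: array of shape (n_fits, n_lags * n_factors), one row of Lasso coefficients per fit.
    Assumed column layout: column = (lag - 1) * n_factors + (factor - 1).
    """
    coefs = np.asarray(coefs)
    freq = (coefs != 0).mean(axis=0)               # share of fits with a non-zero coefficient
    selected = np.flatnonzero(freq > threshold)
    return [(int(c % n_factors) + 1,               # factor number: 1 = runoff, 2-75 = climate
             int(c // n_factors) + 1,              # lag in months
             float(freq[c]))
            for c in selected]
```

The strict "greater than 0.95" comparison follows the text. A factor that is non-zero in exactly 95 of 100 fits is therefore not selected.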
TABLE 2 forecasting factors for each forecast period (frequency greater than 0.95)
TABLE 3 Names of the factors in Table 2
Factor number    Name
1     Early-stage runoff
17    Western Pacific subtropical high intensity index (110°E-180°)
20    Atlantic subtropical high intensity index (55°W-25°W)
24    Northern Hemisphere subtropical high ridge line (5°E-360°)
38    Indian subtropical high northern boundary (65°E-95°E)
39    Western Pacific subtropical high northern boundary (110°E-150°E)
69    Qinghai-Tibet Plateau index (30°N-40°N, 75°E-105°E)
74    Sunspot number
75    Southern Oscillation Index
In some specific examples of the present invention, steps S100 to S600 yield the forecasting factors of the Longyangxia reservoir inflow for different forecast periods. These factors can then serve as inputs to various forecasting models. The test of whether the 74 climate factors at different lag periods help was run by changing the candidate forecasting factor set X in the above steps. The full set, early-stage runoff plus the 74 climate factors (X1), was replaced with early-stage runoff only (X2). FIG. 4 compares the two for each forecast period. For forecast periods longer than 5 months, using the 74 climate factors gives better results than using early-stage runoff alone as the forecasting factor. This shows that the Lasso regression method can select the best combination of forecasting factors from a large number of candidates (24 × 75 = 1800 in this embodiment). It also shows that introducing the 74 climate factors yields better results for longer forecast periods. In addition, beneficial medium- and long-term forecasting factors of the Longyangxia reservoir inflow are selected from the early-stage runoff and the 74 climate factors. More broadly, other factors such as sea surface temperature and the Niño 3.4 index can also serve as candidate forecasting factors. Screening them with this method before feeding them into a forecasting model can improve model performance.
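The size of the candidate set follows from combining 75 series (early-stage runoff plus 74 climate factors) with 24 lag periods, giving 75 × 24 = 1800 columns. Below is a minimal sketch of building X and Y, with Y_t = Q_t as in claim 3. It assumes that, for forecast period LT, lag L takes the value of month t − LT − (L − 1), so lag 1 is the latest value available when the forecast is issued. This lag indexing and the function name are assumptions, not the patent's stated formula.

```python
def build_candidate_set(series, lead_time, n_lags=24):
    """Build lagged candidate rows X_t and targets Y_t = Q_t.

    series[0] is the standardized runoff Q; series[1:] are standardized climate
    factors F, all monthly and of equal length. ASSUMPTION: lag L (1..n_lags)
    uses month t - lead_time - (L - 1).
    """
    q = series[0]
    first_t = lead_time + n_lags - 1  # earliest month whose longest lag is available
    X, Y = [], []
    for t in range(first_t, len(q)):
        X.append([s[t - lead_time - (lag - 1)]
                  for s in series                  # factor-major column order
                  for lag in range(1, n_lags + 1)])
        Y.append(q[t])
    return X, Y


# Toy data: 75 series of 60 months; value = 100 * series_index + month
series = [[float(100 * k + m) for m in range(60)] for k in range(75)]
X1, Y = build_candidate_set(series, lead_time=6)      # runoff + 74 climate factors
X2, _ = build_candidate_set(series[:1], lead_time=6)  # early-stage runoff only
print(len(X1[0]), len(X2[0]), len(X1))  # 1800 24 31
```

Passing only the runoff series gives the runoff-only set (X2 in the comparison above), which has 24 columns instead of 1800.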
In summary, the embodiments of the present invention provide an identification method that uses Lasso regression to screen forecasting factors for medium- and long-term runoff (forecast periods longer than 5 months) from many climate factors at different lag periods. The method also determines the lag period of each forecasting factor. Current practice selects forecasting factors by experience. This method instead selects them from many candidates while eliminating the interference among those candidates.
In another aspect, the present invention provides a method for forecasting medium- and long-term runoff.
According to an embodiment of the present invention, the input data of the forecasting method include the forecasting factors identified by the above method. Besides these data, the forecasting method includes other necessary steps, such as modeling, data normalization, data output and post-processing. Those skilled in the art can supply these steps according to the actual forecasting process, so they are not described in detail here.
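The normalization and post-processing steps mentioned above can be as simple as a z-score transform and its inverse. The sketch below is one possible choice, not the patent's prescribed procedure. It uses the population standard deviation over the whole series; standardizing month by month would be an equally plausible alternative. The function names are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores plus the mean and (population) standard deviation used."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values], mu, sigma


def destandardize(z_values, mu, sigma):
    """Map standardized forecasts back to the original runoff units."""
    return [z * sigma + mu for z in z_values]


runoff = [100.0, 200.0, 300.0]  # hypothetical monthly runoff values
z, mu, sigma = standardize(runoff)
print(mu, z[1])  # 200.0 0.0
restored = destandardize(z, mu, sigma)
```

The same `mu` and `sigma` saved from the training period must be reused when forecasts are converted back to runoff units.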
In summary, the embodiments of the present invention provide a forecasting method whose input data include forecasting factors teleconnected with medium- and long-term runoff. As a result, the method achieves higher accuracy and a longer forecast period for medium- and long-term runoff. Those skilled in the art will understand that the features and advantages described above for the method of identifying forecasting factors also apply to the method of forecasting medium- and long-term runoff, and they are not repeated here.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance, or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may combine the different embodiments or examples described in this specification, and their features.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (9)

1. A method for identifying forecasting factors of medium- and long-term runoff, characterized by comprising the following steps:
(1) standardizing the runoff sequence and the climate factor sequence to obtain a standardized runoff sequence Q and a standardized climate factor sequence F;
(2) setting a forecast period and, according to the forecast period, constructing a candidate forecasting factor set X consisting of the standardized runoff sequence Q and the standardized climate factor sequence F at a series of different lag periods, and taking the corresponding standardized runoff sequence Q as the set Y in the Lasso regression;
(3) given a parameter λ, performing cross-validation to calculate a forecast set Y', and comparing the forecast set Y' with the set Y to obtain a first evaluation index of the parameter λ;
(4) selecting M different parameters λ, normalizing their first evaluation indexes, and adding the normalized results to obtain scores;
(5) summing the scores of each parameter λ to obtain its total score, and selecting the parameter λ with the highest total score as the optimal parameter;
(6) obtaining the regression coefficients of all the climate factors in step (3) according to the optimal parameter, the climate factors corresponding to non-zero regression coefficients being identified as the forecasting factors;
wherein, in step (6), identifying the climate factors corresponding to non-zero regression coefficients as the forecasting factors comprises:
counting the non-zero regression coefficients over 100 cross-validation runs to obtain the frequency with which each climate factor's regression coefficient is non-zero, and taking the climate factors with a frequency greater than 0.95 as the forecasting factors.
2. The identification method according to claim 1, wherein, in the steps (1) and (2),
the runoff sequence is a sequence of monthly runoff data,
the climate factor sequence is a matrix formed by the data of the 74 climate factors for each month;
the forecast period is set to any one of 1 to 12 months, and the lag period is 1 to 24 months.
3. The identification method according to claim 1, wherein in step (2), the set of candidate forecasting factors X and the set Y are in the form of:
Y_t = Q_t
where LT is the forecast period, t is the relative month number, Q is the standardized runoff sequence, and F is the standardized climate factor sequence.
4. The identification method according to claim 1, wherein, in step (3), cross-validating and calculating the forecast set Y' comprises:
(3-1) randomly shuffling the candidate forecasting factor set X and the set Y in the same order and dividing them into multiple segments of data;
(3-2) selecting one of the segments as the data to be forecast, using the remaining segments as Lasso training data, and calculating the regression coefficients of the climate factors and the runoff factor according to the parameter λ;
(3-3) taking each segment in turn as the data to be forecast, calculating its forecast from its X values and the regression coefficients, and combining the forecasts into the forecast set Y'.
5. The identification method according to claim 4, wherein the multiple segments of data are 26 segments.
6. The identification method according to claim 1, wherein step (3) further comprises:
repeating the cross-validation, the calculation of the forecast set Y' and the comparison of Y' with the set Y 100 times, and taking the resulting 100 mean square errors as the first evaluation index of the parameter λ.
7. The identification method according to claim 6, wherein, in step (4), normalizing the first evaluation indexes comprises:
taking the mean and the standard deviation of the 100 mean square errors as second evaluation indexes, and then normalizing the second evaluation indexes of the M different parameters λ.
8. The identification method according to claim 7, wherein in the step (5), the optimal parameter is 0.14.
9. A method for forecasting medium- and long-term runoff, wherein the input data of the forecasting method comprise forecasting factors identified by the method of any one of claims 1 to 8.
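Claims 4 to 8 describe how λ is chosen. The sketch below illustrates two pieces under explicit assumptions. The first shuffles the sample indices once and cuts them into 26 segments (claim 5). The second scores candidate λ values from their repeated cross-validation mean square errors (claims 6 and 7). The patent states neither the normalization formula nor what the total score in step (5) sums over. Here, min-max scaling is assumed, oriented so that a lower mean or standard deviation of the MSE earns a higher score. The total is assumed to sum the scores across forecast periods. All function names are hypothetical.

```python
import random
import statistics


def shuffled_folds(n_samples, n_folds=26, seed=None):
    """Shuffle sample indices once (same order for X and Y) and cut into n_folds segments."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]


def lambda_scores(mse_by_lambda):
    """Score each lambda from its repeated-CV MSEs: mean and std, min-max scaled, then added."""
    means = {lam: statistics.fmean(m) for lam, m in mse_by_lambda.items()}
    stds = {lam: statistics.pstdev(m) for lam, m in mse_by_lambda.items()}

    def scaled(index):
        # ASSUMPTION: smallest value -> 1, largest -> 0 (lower error is better)
        lo, hi = min(index.values()), max(index.values())
        span = hi - lo
        return {lam: (hi - v) / span if span > 0 else 1.0 for lam, v in index.items()}

    mean_s, std_s = scaled(means), scaled(stds)
    return {lam: mean_s[lam] + std_s[lam] for lam in mse_by_lambda}


def best_lambda(scores_per_period):
    """ASSUMPTION: total score = sum of a lambda's scores over all forecast periods."""
    totals = {}
    for scores in scores_per_period:
        for lam, s in scores.items():
            totals[lam] = totals.get(lam, 0.0) + s
    return max(totals, key=totals.get), totals


folds = shuffled_folds(60, n_folds=26, seed=1)
# Hypothetical MSEs from two repetitions for three candidate lambdas in one forecast period
period = {0.05: [1.0, 1.2], 0.14: [0.8, 0.8], 0.30: [0.9, 1.1]}
best, totals = best_lambda([lambda_scores(period)])
print(best, totals[0.14])  # 0.14 2.0
```

Scoring the mean and the standard deviation together favors a λ that is both accurate and stable across the repeated random splits.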
CN201710701183.3A 2017-08-16 2017-08-16 Forecasting factor identification method of medium-long term runoff and forecasting method of medium-long term runoff Active CN107622322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710701183.3A CN107622322B (en) 2017-08-16 2017-08-16 Forecasting factor identification method of medium-long term runoff and forecasting method of medium-long term runoff

Publications (2)

Publication Number Publication Date
CN107622322A CN107622322A (en) 2018-01-23
CN107622322B true CN107622322B (en) 2021-07-20

Family

ID=61088827

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant