CN115689368B - Runoff forecasting model evaluation method based on full life cycle - Google Patents

Info

Publication number
CN115689368B
CN115689368B CN202211405142.7A CN202211405142A CN115689368B CN 115689368 B CN115689368 B CN 115689368B CN 202211405142 A CN202211405142 A CN 202211405142A CN 115689368 B CN115689368 B CN 115689368B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211405142.7A
Other languages
Chinese (zh)
Other versions
CN115689368A (en)
Inventor
杜三林
胡文斌
杨靖
王超
赵庆绪
廖卫红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Institute of Water Resources and Hydropower Research
Huaneng Group Technology Innovation Center Co Ltd
Huaneng Yarlung Tsangpo River Hydropower Development Investment Co Ltd
Original Assignee
China Institute of Water Resources and Hydropower Research
Huaneng Group Technology Innovation Center Co Ltd
Huaneng Yarlung Tsangpo River Hydropower Development Investment Co Ltd
Application filed by China Institute of Water Resources and Hydropower Research, Huaneng Group Technology Innovation Center Co Ltd, Huaneng Yarlung Tsangpo River Hydropower Development Investment Co Ltd
Priority to CN202211405142.7A
Publication of CN115689368A
Application granted
Publication of CN115689368B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a runoff forecasting model evaluation method based on the full life cycle. By establishing an evaluation index system comprising a data comprehensive quality index P1, a forecasting factor comprehensive quality index P2, a sample representative index P3, a model generalization index P4 and a result comprehensive quality index P5, the method comprehensively evaluates both the comprehensive performance of a runoff forecasting model and the forecasting flow. This makes it easier to identify the weak links of the runoff forecasting model in the forecasting process and thereby improve its application effect.

Description

Runoff forecasting model evaluation method based on full life cycle
Technical Field
The invention relates to the technical field of hydrologic forecasting, in particular to a runoff forecasting model evaluation method based on a full life cycle.
Background
Runoff forecasting is an important component of applied hydrology. It is an applied science that forecasts future runoff changes on the basis of known hydrological laws, and it provides an important basis for water resource scheduling and for the scientific implementation of flood control and drought relief. Runoff forecasting usually relies on runoff forecasting models. As the volume of hydrological data grows explosively, more and more data-driven runoff forecasting models are being applied in production, such as linear regression (LR), gradient boosting regression (GBR) and support vector regression (SVR). The accuracy of the runoff forecasting model directly influences the manager's scheduling decisions and therefore bears on the safety of people's lives and property. How to evaluate the comprehensive performance of a runoff forecasting model is therefore very important: a scientific and comprehensive evaluation standard plays a key role throughout the forecasting process and in screening and correcting different forecasting models. In the prior art, measures such as the root mean square error and the Nash efficiency coefficient are used for evaluation, but each of them focuses on a single aspect, so the resulting evaluation of the runoff forecasting model is not comprehensive enough.
Disclosure of Invention
The invention aims to provide a runoff forecasting model evaluation method based on a full life cycle, which can comprehensively evaluate the overall performance of a runoff forecasting model so as to improve the forecasting effect of the runoff forecasting model.
The technical solution adopted by the invention to solve the technical problem is a runoff forecasting model evaluation method based on the full life cycle, comprising the following steps:
S1. Acquire historical runoff data, determine and screen the forecasting factors, and construct the sample population for the runoff forecasting model;
S2. Divide the sample population into a training set and a test set, train the runoff forecasting model on the training set, and forecast on the test set;
S3. Establish a runoff forecasting model evaluation index system comprising a data comprehensive quality index P1, a forecasting factor comprehensive quality index P2, a sample representative index P3, a model generalization index P4 and a result comprehensive quality index P5;
S4. Evaluate the comprehensive performance of the runoff forecasting model through the model generalization index P4 and the result comprehensive quality index P5;
S5. Evaluate the comprehensive effect of the runoff forecasting flow through the data comprehensive quality index P1, the forecasting factor comprehensive quality index P2, the sample representative index P3, the model generalization index P4 and the result comprehensive quality index P5.
Further, the data comprehensive quality index P1 is calculated by the following formula:
$$P1 = 1 - \frac{W + Q_a}{2}$$
where $W$ represents the missing-data rate of the historical runoff data and $Q_a$ represents the anomaly rate of the historical runoff data. $W$ and $Q_a$ are calculated as follows:
where $n$ is the total number of observation stations in the river basin, $A_i$ is the amount of missing data at station $i$, $B_i$ is the theoretical amount of data at station $i$, $C_i$ is the amount of abnormal data at station $i$, and $D_i$ is the actual amount of data at station $i$. When $n = 1$, the expressions give the missing-data rate and the anomaly rate of a single station.
Further, the predictor comprehensive quality index P2 is calculated by the following formula:
$$P2 = \frac{Q_q + Q_s}{2}$$
where $Q_q$ represents the quality index of the forecasting factors and $Q_s$ represents the quantity index of the forecasting factors. $Q_q$ and $Q_s$ are calculated as follows:
$$Q_q = \max\{|ra_1|, \ldots, |ra_j|, \ldots, |ra_N|\},$$
where $|ra_j|$ is the absolute value of the correlation coefficient between the $j$-th predictor and the runoff prediction target sequence, $N$ is the number of features, i.e. the number of predictors, $M$ is the number of selected predictors, namely the $M$ predictors whose correlation coefficients with the runoff prediction target sequence have the largest absolute values, and $|ra_{j_a}|$ is the absolute value of the correlation coefficient between the $j_a$-th of these $M$ predictors and the runoff prediction target sequence.
Further, the sample representative index P3 is calculated by the following formula:
$$P3 = 1 - \frac{R_{mean} + R_{std}}{2}$$
where $R_{mean}$ represents the sample mean offset rate and $R_{std}$ represents the sample standard deviation offset rate. $R_{mean}$ and $R_{std}$ are calculated from the following formulas:
where $\mu$ is the mean of the population output variable, $\bar{y}$ is the mean of the sample output variable, $\sigma$ is the standard deviation of the population output variable, and $\sigma_y$ is the standard deviation of the sample output variable.
Furthermore, the model generalization index P4 jointly considers the root mean square error and the goodness of fit of the test set relative to the training set, and also accounts for the number of samples and the number of forecasting factors. It is calculated as follows:
where $GP_{RMSE}$ denotes the root-mean-square-error generalization rate index and $GP_{R^2}$ denotes the goodness-of-fit generalization rate index;
$GP_{RMSE}$ is calculated from the following formula:
where $RMSE_{test}$ is the root mean square error on the test set and $RMSE_{train}$ is the root mean square error on the training set. Both are obtained by applying the RMSE formula to the respective set:
$$RMSE = \sqrt{\frac{1}{H'}\sum_{k=1}^{H'}\left(Q_k - \hat{Q}_k\right)^2}$$
where $Q_k$ is the $k$-th measured value in the evaluated sequence, $\hat{Q}_k$ is the $k$-th simulated value calculated by the runoff forecasting model, and $H'$ is the length of the evaluated sequence. If $GP_{RMSE} > 1$, it is set to 1;
$GP_{R^2}$ is calculated from the following formula:
where
$$\bar{R}^2_{train} = 1 - \frac{(1 - R^2_{train})(l_1 - 1)}{l_1 - N - 1}, \qquad \bar{R}^2_{test} = 1 - \frac{(1 - R^2_{test})(l_2 - 1)}{l_2 - N - 1},$$
where $R^2_{train}$ is the goodness of fit on the training set, $R^2_{test}$ is the goodness of fit on the test set, $\bar{R}^2_{train}$ is the corrected (adjusted) $R^2_{train}$, $\bar{R}^2_{test}$ is the corrected (adjusted) $R^2_{test}$, $l_1$ and $l_2$ are the numbers of samples in the training set and the test set respectively, and $N$ is the number of features, i.e. the number of forecasting factors. $R^2_{train}$ and $R^2_{test}$ are obtained by applying the goodness-of-fit formula to the training set and the test set respectively:
$$R^2 = 1 - \frac{\sum_{k=1}^{H'}\left(Q_k - \hat{Q}_k\right)^2}{\sum_{k=1}^{H'}\left(Q_k - \bar{Q}\right)^2}$$
where $Q_k$ is the $k$-th measured value of the evaluated sequence, $\hat{Q}_k$ is the $k$-th simulated value calculated by the runoff forecasting model, $\bar{Q}$ is the mean of the measured values of the evaluated sequence, and $H'$ is the length of the evaluated sequence.
Further, the result comprehensive quality index P5 is calculated by the following formula:
where $rb$ is the correlation coefficient between the simulated sequence and the runoff prediction target sequence, $\mu_o$ and $\sigma_o$ are respectively the mean and standard deviation of the measured values of the runoff prediction target sequence, and $\mu_m$ and $\sigma_m$ are respectively the mean and standard deviation of the simulated sequence.
Further, in step S4, the comprehensive performance of the runoff forecasting model is evaluated as follows:
S41. Calculate the model Euclidean distance Dm:
S42. Normalize the model Euclidean distance to obtain the model comprehensive performance index NDm, and judge the comprehensive performance of the runoff forecasting model by how close the value of NDm is to 1. NDm is calculated as follows:
Further, in step S5, the comprehensive effect of the runoff forecasting flow is evaluated as follows:
S51. Calculate the forecasting flow Euclidean distance DF:
S52. Normalize the forecasting flow Euclidean distance to obtain the forecasting flow comprehensive effect index NDF, and judge the comprehensive effect of the forecasting flow by how close the value of NDF is to 1. NDF is calculated as follows:
Further, in calculating the data anomaly rate, data are judged to be abnormal as follows:
A confidence interval is determined using a box plot. Denote the upper and lower limits of the confidence interval by $Z_{up}$ and $Z_{down}$, the lower quartile by $Z_1$ and the upper quartile by $Z_3$. The limits are calculated as $Z_{up} = Z_3 + 1.5(Z_3 - Z_1)$ and $Z_{down} = Z_1 - 1.5(Z_3 - Z_1)$. Data lying outside the upper boundary $Z_{up}$ and the lower boundary $Z_{down}$ of the box plot are regarded as statistically abnormal data.
Compared with the prior art, the invention has the following advantage: by establishing an evaluation index system covering the full life cycle of a runoff forecasting model, every link of the forecasting life cycle can be evaluated comprehensively and quantitatively when the model is used for runoff forecasting, including the comprehensive data quality, the sample completeness, the forecasting factor completeness, the comprehensive performance of the runoff forecasting model, and the comprehensive effect of the forecasting flow. Weak links of the runoff forecasting model in the forecasting process can thus be identified and improved, which improves the application effect of the runoff forecasting model.
Drawings
Fig. 1 is a radar chart of each evaluation index of a runoff forecasting model of a certain hydrologic station.
Fig. 2 is a schematic diagram of the euclidean distance of a model of three runoff forecasting models of a hydrologic station.
FIG. 3 is a schematic diagram showing the comparison of model comprehensive performance indexes of three runoff forecasting models of a certain hydrologic station.
Fig. 4 is a schematic diagram showing comparison of the Euclidean distance values of the forecasting flow of three runoff forecasting models of a certain hydrologic station.
Fig. 5 is a schematic diagram showing comparison of forecasting flow comprehensive effect indexes of three runoff forecasting models of a certain hydrologic station.
FIG. 6 is a radar chart of the relationship between data sequence length and result comprehensive quality.
FIG. 7 is a radar chart of the relationship between the number of forecasting factors and result comprehensive quality.
FIG. 8 is a radar chart of the relationship between the number of samples and result comprehensive quality.
FIG. 9 is a radar chart of the relationship between different runoff forecasting models and result comprehensive quality.
Fig. 10 is a schematic diagram of a box plot.
Detailed Description
The present invention is described in further detail below with reference to the embodiments of the drawings, examples of which are illustrated in the drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In the description of the present application, it should be noted that, for the azimuth terms such as terms "center", "lateral", "longitudinal", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc., the azimuth and positional relationships are based on the azimuth or positional relationships shown in the drawings, it is merely for convenience of describing the present application and simplifying the description, and it is not to be construed as limiting the specific protection scope of the present application that the device or element referred to must have a specific azimuth configuration and operation, as indicated or implied. The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The invention relates to a runoff forecasting model evaluation method based on the full life cycle, which comprises the following steps:
S1. Acquire historical runoff data, determine and screen the forecasting factors, and construct the sample population for the runoff forecasting model;
S2. Divide the sample population into a training set and a test set, train the runoff forecasting model on the training set, and forecast on the test set;
S3. Establish a runoff forecasting model evaluation index system comprising a data comprehensive quality index P1, a forecasting factor comprehensive quality index P2, a sample representative index P3, a model generalization index P4 and a result comprehensive quality index P5;
S4. Evaluate the comprehensive performance of the runoff forecasting model through the model generalization index P4 and the result comprehensive quality index P5;
S5. Evaluate the comprehensive effect of the runoff forecasting flow through the data comprehensive quality index P1, the forecasting factor comprehensive quality index P2, the sample representative index P3, the model generalization index P4 and the result comprehensive quality index P5.
Specifically, in the present embodiment, the data comprehensive quality index P1 is calculated by the following formula:
$$P1 = 1 - \frac{W + Q_a}{2}$$
where $W$ represents the missing-data rate of the historical runoff data and $Q_a$ represents the anomaly rate of the historical runoff data. $W$ and $Q_a$ are calculated as follows:
where $n$ is the total number of observation stations in the river basin, $A_i$ is the amount of missing data at station $i$, $B_i$ is the theoretical amount of data at station $i$, $C_i$ is the amount of abnormal data at station $i$, and $D_i$ is the actual amount of data at station $i$. When $n = 1$, the expressions give the missing-data rate and the anomaly rate of a single station. From the formula for P1 it follows that the closer the value of P1 is to 1, the better the comprehensive quality of the data.
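The P1 calculation can be sketched in a few lines of Python. The multi-station expressions for W and Q_a are not reproduced in this text, so the sketch assumes they are the averages of the per-station ratios A_i/B_i and C_i/D_i, which reduces to the single-station rates when n = 1. The class name `StationCounts`, the function name and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StationCounts:
    missing: int      # A_i: amount of missing data at station i
    theoretical: int  # B_i: theoretical amount of data at station i
    abnormal: int     # C_i: amount of abnormal data at station i
    actual: int       # D_i: actual amount of data at station i


def data_quality_index(stations):
    """P1 = 1 - (W + Q_a) / 2.

    Assumption: W and Q_a are means of the per-station ratios; the source
    text does not show the aggregate formulas.
    """
    n = len(stations)
    if n == 0:
        raise ValueError("at least one station is required")
    w = sum(s.missing / s.theoretical for s in stations) / n
    q_a = sum(s.abnormal / s.actual for s in stations) / n
    return 1.0 - (w + q_a) / 2.0


# Hypothetical single station: no gaps, 12 of 96 values flagged as abnormal.
p1 = data_quality_index(
    [StationCounts(missing=0, theoretical=96, abnormal=12, actual=96)]
)
print(round(p1, 4))
```

Keeping the four counts per station separate makes it easy to see whether a low P1 comes from gaps in the record or from flagged outliers.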
Preferably, in calculating the data anomaly rate, data are judged to be abnormal as follows:
A confidence interval is determined using a box plot. Denote the upper and lower limits of the confidence interval by $Z_{up}$ and $Z_{down}$, the lower quartile by $Z_1$, the median by $Z_2$ and the upper quartile by $Z_3$. The upper and lower limits of the confidence interval are calculated as follows:
$$Z_{up} = Z_3 + 1.5(Z_3 - Z_1), \qquad Z_{down} = Z_1 - 1.5(Z_3 - Z_1)$$
Data lying outside the upper boundary $Z_{up}$ and the lower boundary $Z_{down}$ of the box plot are regarded as statistically abnormal data. A schematic diagram of the box plot is shown in fig. 10; since the box plot belongs to the prior art, it is not described in detail here.
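The box-plot rule above can be illustrated with the Python standard library. The source does not say which quartile convention is used, so the `method="inclusive"` choice below is an assumption, and the sample values are made up for illustration.

```python
import statistics


def boxplot_fences(values):
    """Return (Z_down, Z_up) = (Z_1 - 1.5*(Z_3 - Z_1), Z_3 + 1.5*(Z_3 - Z_1))."""
    # Assumption: inclusive quartiles; the source does not fix a convention.
    z1, _z2, z3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = z3 - z1
    return z1 - 1.5 * spread, z3 + 1.5 * spread


def anomaly_rate(values):
    """Fraction of values lying outside the box-plot fences."""
    z_down, z_up = boxplot_fences(values)
    outliers = [v for v in values if v < z_down or v > z_up]
    return len(outliers) / len(values)


# Hypothetical data with one obvious outlier.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 100]
z_down, z_up = boxplot_fences(sample)
rate = anomaly_rate(sample)
print(z_down, z_up, rate)
```

A different quartile convention shifts the fences slightly, so borderline values may be classified differently; the rule itself is unchanged.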
In this embodiment, the predictor comprehensive quality index P2 is calculated by the following formula:
$$P2 = \frac{Q_q + Q_s}{2}$$
where $Q_q$ represents the quality index of the forecasting factors and $Q_s$ represents the quantity index of the forecasting factors. $Q_q$ and $Q_s$ are calculated as follows:
$$Q_q = \max\{|ra_1|, \ldots, |ra_j|, \ldots, |ra_N|\},$$
where $|ra_j|$ is the absolute value of the correlation coefficient between the $j$-th predictor and the runoff prediction target sequence, $N$ is the number of features, i.e. the number of predictors, $M$ is the number of selected predictors, namely the $M$ predictors whose correlation coefficients with the runoff prediction target sequence have the largest absolute values, and $|ra_{j_a}|$ is the absolute value of the correlation coefficient between the $j_a$-th of these $M$ predictors and the runoff prediction target sequence. The closer the value of P2 is to 1, the better the comprehensive quality of the forecasting factors. The correlation coefficient may be calculated using the Pearson correlation coefficient formula, namely:
$$r_{XY} = \frac{\sum_{k=1}^{H}\left(X_{j,k} - \bar{X}_j\right)\left(Y_k - \bar{Y}\right)}{\sqrt{\sum_{k=1}^{H}\left(X_{j,k} - \bar{X}_j\right)^2}\,\sqrt{\sum_{k=1}^{H}\left(Y_k - \bar{Y}\right)^2}}$$
where $X$ and $Y$ respectively denote a predictor and the corresponding historical runoff data, i.e. the runoff prediction target sequence; $j$ indexes the $j$-th predictor, $j = 1, 2, 3, \ldots, N$, with $N$ the number of predictors; $k$ indexes the $k$-th runoff value in the runoff prediction target sequence; and $H$ is the length of the runoff prediction target sequence.
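As an illustration, the Pearson coefficient, the quality index Q_q and the selection of the M predictors with the largest absolute correlation can be sketched as follows. The function names and the toy series are hypothetical and are not taken from Table 1; Q_s is left out because its formula is not reproduced in this text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r_XY of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for a constant series


def rank_predictors(predictors, target, m):
    """Return (Q_q, names of the m predictors with the largest |r|)."""
    abs_r = {name: abs(pearson(series, target)) for name, series in predictors.items()}
    q_q = max(abs_r.values())
    top_m = sorted(abs_r, key=abs_r.get, reverse=True)[:m]
    return q_q, top_m


# Hypothetical toy data for illustration only.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "lag1": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "lag2": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    "lag3": [1.0, 3.0, 2.0, 5.0, 4.0],   # partially correlated
}
q_q, top = rank_predictors(predictors, target, m=2)
print(q_q, top)
```

Ranking by the absolute value keeps strongly negative predictors, which carry as much information for a regression model as strongly positive ones.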
In the present embodiment, the sample representative index P3 is calculated by the following formula:
$$P3 = 1 - \frac{R_{mean} + R_{std}}{2}$$
where $R_{mean}$ represents the sample mean offset rate and $R_{std}$ represents the sample standard deviation offset rate. $R_{mean}$ and $R_{std}$ are calculated from the following formulas:
where $\mu$ is the mean of the population output variable, i.e. the mean of the runoff data in the population; $\bar{y}$ is the mean of the sample output variable, i.e. the mean of the runoff data in the training set; $\sigma$ is the standard deviation of the population output variable, i.e. the standard deviation of the runoff data in the population; and $\sigma_y$ is the standard deviation of the sample output variable, i.e. the standard deviation of the runoff data in the training set. The closer the value of P3 is to 1, the better the sample representativeness.
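A minimal sketch of P3 follows. The formulas for R_mean and R_std are not reproduced in this text, so the sketch assumes they are relative offsets, |mu - y_bar| / |mu| and |sigma - sigma_y| / sigma, and it assumes population standard deviations; both choices are assumptions, and the function name is hypothetical.

```python
import statistics


def sample_representative_index(population, sample):
    """P3 = 1 - (R_mean + R_std) / 2 under the assumed relative-offset forms."""
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    y_bar = statistics.fmean(sample)
    sigma_y = statistics.pstdev(sample)
    r_mean = abs(mu - y_bar) / abs(mu)    # assumed form of R_mean
    r_std = abs(sigma - sigma_y) / sigma  # assumed form of R_std
    return 1.0 - (r_mean + r_std) / 2.0


# Hypothetical values: the sample keeps the mean but overstates the spread.
population = [2.0, 4.0, 6.0, 8.0]
sample = [2.0, 8.0]
p3 = sample_representative_index(population, sample)
print(round(p3, 4))
```

Under these assumptions a training set that reproduces the population mean and spread exactly scores 1, and the score drops as either statistic drifts.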
In runoff prediction, one sample is an X'-Y' pair, where X' denotes the forecasting factors selected by screening and Y' is the historical runoff value corresponding to those factors, i.e. an element of the prediction target sequence. The forecasting factors are screened with the correlation method described above: the M predictors whose correlation coefficients with the runoff prediction target sequence have the largest absolute values are retained.
In this embodiment, the model generalization index P4 jointly considers the root mean square error and the goodness of fit of the test set relative to the training set, and also accounts for the number of samples and the number of forecasting factors. It is calculated as follows:
where $GP_{RMSE}$ denotes the root-mean-square-error generalization rate index and $GP_{R^2}$ denotes the goodness-of-fit generalization rate index;
$GP_{RMSE}$ is calculated from the following formula:
where $RMSE_{test}$ is the root mean square error on the test set and $RMSE_{train}$ is the root mean square error on the training set. Both are obtained by applying the RMSE formula to the respective set:
$$RMSE = \sqrt{\frac{1}{H'}\sum_{k=1}^{H'}\left(Q_k - \hat{Q}_k\right)^2}$$
where $Q_k$ is the $k$-th measured value in the evaluated sequence, $\hat{Q}_k$ is the $k$-th simulated value calculated by the runoff forecasting model, and $H'$ is the length of the evaluated sequence: the number of samples in the training set for the training set, and the number of samples in the test set for the test set. If $GP_{RMSE} > 1$, it is set to 1;
$GP_{R^2}$ is calculated from the following formula:
where
$$\bar{R}^2_{train} = 1 - \frac{(1 - R^2_{train})(l_1 - 1)}{l_1 - N - 1}, \qquad \bar{R}^2_{test} = 1 - \frac{(1 - R^2_{test})(l_2 - 1)}{l_2 - N - 1},$$
where $R^2_{train}$ is the goodness of fit on the training set and $R^2_{test}$ is the goodness of fit on the test set, i.e. the degree of fit between the simulated or predicted sequence and the true values; its value ranges from 0 to 1, and the closer it is to 1, the better the model predictions fit the measured values. $\bar{R}^2_{train}$ is the corrected (adjusted) $R^2_{train}$, $\bar{R}^2_{test}$ is the corrected (adjusted) $R^2_{test}$, $l_1$ and $l_2$ are the numbers of samples in the training set and the test set respectively, and $N$ is the number of features, i.e. the number of forecasting factors. $R^2_{train}$ and $R^2_{test}$ are obtained by applying the goodness-of-fit formula to the training set and the test set respectively:
$$R^2 = 1 - \frac{\sum_{k=1}^{H'}\left(Q_k - \hat{Q}_k\right)^2}{\sum_{k=1}^{H'}\left(Q_k - \bar{Q}\right)^2}$$
where $Q_k$ is the $k$-th measured value of the evaluated sequence, $\hat{Q}_k$ is the $k$-th simulated value calculated by the runoff forecasting model, $\bar{Q}$ is the mean of the measured values of the evaluated sequence, and $H'$ is the length of the evaluated sequence.
The closer the value of P4 is to 1, the better the generalization of the model.
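The building blocks of P4 can be sketched with the standard definitions of RMSE and R². The "corrected" R² is assumed here to be the usual adjusted R², which depends on the sample count and the number of forecasting factors as the text describes. The combination of these quantities into GP_RMSE, the goodness-of-fit generalization index and P4 itself is not reproduced in this text and is therefore not sketched. Function names and example values are hypothetical.

```python
import math


def rmse(measured, simulated):
    """Root mean square error over a sequence of length H'."""
    h = len(measured)
    return math.sqrt(sum((q - s) ** 2 for q, s in zip(measured, simulated)) / h)


def r_squared(measured, simulated):
    """Goodness of fit R^2 = 1 - SS_res / SS_tot."""
    mean_q = sum(measured) / len(measured)
    ss_res = sum((q - s) ** 2 for q, s in zip(measured, simulated))
    ss_tot = sum((q - mean_q) ** 2 for q in measured)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Standard adjusted R^2 (assumed to be the 'corrected' R^2 of the text)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical measured and simulated values.
measured = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.0, 2.5, 4.0]
err = rmse(measured, simulated)
r2 = r_squared(measured, simulated)
r2_adj = adjusted_r_squared(r2, n_samples=4, n_features=1)
print(err, r2, r2_adj)
```

Applying the same three functions to the training set and to the test set yields the paired quantities that the generalization indices compare.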
In the present embodiment, the result comprehensive quality index P5 is calculated by the following formula:
where $rb$ is the correlation coefficient between the simulated sequence and the runoff prediction target sequence, which can be calculated with the Pearson correlation coefficient formula described above; $\mu_o$ and $\sigma_o$ are respectively the mean and standard deviation of the measured values of the runoff prediction target sequence; and $\mu_m$ and $\sigma_m$ are respectively the mean and standard deviation of the simulated sequence. The closer the value of P5 is to 1, the better the comprehensive quality of the result.
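The P5 formula is not reproduced in this text. Its ingredients (a correlation coefficient plus the means and standard deviations of the measured and simulated sequences) match those of the Kling-Gupta efficiency, so the sketch below assumes that form. This is an assumption rather than the confirmed definition, and the function name and data are hypothetical.

```python
import math
import statistics


def result_quality_index(observed, simulated):
    """Assumed Kling-Gupta-style index: 1 - sqrt((rb-1)^2 + (sm/so-1)^2 + (mm/mo-1)^2)."""
    mu_o = statistics.fmean(observed)
    mu_m = statistics.fmean(simulated)
    sigma_o = statistics.pstdev(observed)
    sigma_m = statistics.pstdev(simulated)
    cov = sum((o - mu_o) * (m - mu_m) for o, m in zip(observed, simulated)) / len(observed)
    rb = cov / (sigma_o * sigma_m)  # Pearson correlation coefficient
    return 1.0 - math.sqrt(
        (rb - 1.0) ** 2
        + (sigma_m / sigma_o - 1.0) ** 2
        + (mu_m / mu_o - 1.0) ** 2
    )


# Hypothetical sequences: a perfect simulation and one that doubles every value.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = result_quality_index(observed, observed)
doubled = result_quality_index(observed, [2.0 * v for v in observed])
print(perfect, doubled)
```

Under this assumed form a perfect simulation scores 1, while a simulation with the right timing but the wrong magnitude is penalised through the mean and spread ratios even though its correlation is 1.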
In this embodiment, the comprehensive performance of the runoff forecasting model in S4 is evaluated as follows:
S41. Calculate the model Euclidean distance Dm:
S42. Normalize the model Euclidean distance to obtain the model comprehensive performance index NDm, and judge the comprehensive performance of the runoff forecasting model by how close the value of NDm is to 1. NDm is calculated as follows: The closer the value of NDm is to 1, the better the comprehensive performance of the runoff forecasting model. In practical application a lower limit for NDm, such as 0.7 or 0.8, is usually set; if the NDm calculated for the runoff forecasting model is smaller than this lower limit, the model parameters do not meet the requirement and need to be corrected.
The model evaluation method based on the model Euclidean distance has a clear physical meaning, is interpretable and is easy to visualize, but it differs from the indexes P1-P5 in two respects. First, its value range is not 0-1. Second, a larger value does not indicate a better result, i.e. its trend is opposite to that of the preceding indexes. The model Euclidean distance is therefore normalized. After normalization, NDm has a value range of 0-1 and follows the same trend as the five indexes P1-P5: the closer it is to 1, the better the comprehensive performance of the model, which facilitates the overall analysis.
In this embodiment, the comprehensive effect of the runoff forecasting flow in S5 is evaluated as follows:
S51. Calculate the forecasting flow Euclidean distance DF:
S52. Normalize the forecasting flow Euclidean distance to obtain the forecasting flow comprehensive effect index NDF, and judge the comprehensive effect of the forecasting flow by how close the value of NDF is to 1. NDF is calculated as follows: The closer the value of NDF is to 1, the better the comprehensive effect of the runoff forecasting flow. In practical application a lower limit for NDF is usually set; if the NDF calculated for the runoff forecasting model is smaller than this lower limit, the forecasting flow does not meet the requirement and needs to be corrected and optimized.
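The Euclidean distance and normalization formulas are not reproduced in this text. One reading consistent with the description is that the distance is measured from the ideal point where every index equals 1, and that it is divided by its maximum possible value (the square root of the number of indexes) and subtracted from 1. The sketch below implements that assumed reading; the function names are hypothetical.

```python
import math


def distance_to_ideal(indices):
    """Assumed Dm / DF: Euclidean distance from the ideal point (1, ..., 1)."""
    return math.sqrt(sum((1.0 - p) ** 2 for p in indices))


def normalized_index(indices):
    """Assumed NDm / NDF: 1 - D / sqrt(k), in [0, 1] when every index is in [0, 1]."""
    return 1.0 - distance_to_ideal(indices) / math.sqrt(len(indices))


# Index values reported in the embodiment for the linear regression model.
p1, p2, p3, p4, p5 = 0.9375, 0.59193, 0.9356, 0.5, 0.4917

dm = distance_to_ideal([p4, p5])               # model level (S4)
ndm = normalized_index([p4, p5])
ndf = normalized_index([p1, p2, p3, p4, p5])   # forecasting flow level (S5)
print(round(dm, 4), round(ndm, 4), round(ndf, 4))
```

Under this reading the raw distance grows as performance worsens, which is why it has to be flipped and rescaled before it can be compared with P1-P5.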
Taking the historical monthly runoff data of a certain hydrologic station as an example, three runoff forecasting models, namely linear regression (LR), gradient boosting regression (GBR) and support vector regression (SVR), are trained and used for forecasting, and the evaluation index system established by the method is used to evaluate them.
Table 1 Monthly runoff data of a certain hydrologic station, 2012-2019
Date (YYYYMM) Runoff Q Date (YYYYMM) Runoff Q Date (YYYYMM) Runoff Q Date (YYYYMM) Runoff Q
201201 286.48 201401 621.71 201601 347.29 201801 321.48
201202 261 201402 557.86 201602 315.93 201802 310
201203 300.9 201403 765.16 201603 375.1 201803 342.32
201204 492.77 201404 1412.83 201604 863.93 201804 635.43
201205 566.81 201405 1089.81 201605 840.19 201805 351.29
201206 847.77 201406 1943.67 201606 1688.67 201806 517.63
201207 1419.42 201407 2480.65 201607 1513.06 201807 1533.55
201208 1953.55 201408 2035.81 201608 689.68 201808 2370.65
201209 1119.73 201409 2055.33 201609 986.83 201809 3380
201210 1068.13 201410 1145.97 201610 1388.71 201810 2255.81
201211 906.47 201411 955.07 201611 1294.67 201811 1342
201212 671.9 201412 445.94 201612 836.23 201812 848.23
201301 420.23 201501 362.68 201701 529.87 201901 515.74
201302 457.52 201502 360.36 201702 457.75 201902 445.46
201303 648.39 201503 383.29 201703 477.71 201903 483.97
201304 1289.23 201504 703.63 201704 773.5 201904 503.87
201305 1909.03 201505 819.52 201705 670.32 201905 518.87
201306 2008 201506 1540.33 201706 1057.63 201906 1160
201307 4134.52 201507 1950.48 201707 1409.55 201907 1387.84
201308 6984.84 201508 1003.61 201708 2224.52 201908 3220.65
201309 5691.67 201509 1360.57 201709 792.32 201909 5181.33
201310 2595.16 201510 972.52 201710 642.29 201910 2155.16
201311 1476.67 201511 755.33 201711 534.37 201911 1112.57
201312 995.58 201512 404.19 201712 377.13 201912 740.13
The 8 years contain 96 data points in total. According to the data comprehensive quality index formula, P1 = 0.9375 for this station. Data from a different time period can of course be selected, which changes the value of the data comprehensive quality index at the data level.
Let the runoff of a given month at the station be Q(t). The monthly runoff values of the preceding 12 months are taken as candidate predictors, namely Q(t-1), Q(t-2), …, Q(t-12). In other words, the runoff data from January 2012 to December 2012 are used as predictors, i.e. as input to the runoff forecasting model, to predict the runoff of January 2013; the runoff data from February 2012 to January 2013 are used as model input to predict the runoff of February 2013; and so on. The measured values of the runoff prediction target sequence are the monthly runoff time series from January 2013 to December 2019.
For each monthly runoff value to be predicted, the 5 candidate predictors with the largest absolute correlation coefficients are selected as the input predictors of the runoff forecasting model, i.e. the parameter M is set to 5. Using the Pearson correlation coefficient formula $r_{XY}$, the 5 predictors most strongly correlated with the runoff prediction target sequence are found to be Q(t-1), Q(t-12), Q(t-2), Q(t-11) and Q(t-6). Combining the Pearson correlation coefficient formula $r_{XY}$ with the formulas for $Q_s$ and $Q_q$, the value of the predictor comprehensive quality index P2 is calculated as 0.59193. Since the Pearson correlation coefficient formula belongs to the prior art, the detailed calculation is not listed here.
X'-Y' sample pairs are then constructed in time order of input and output to form the sample population. With the existing data this gives 96 - 12 = 84 samples, as shown in Table 2.
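The construction of the X'-Y' pairs can be sketched as follows. The selected lags follow the five predictors named in the text, Q(t-1), Q(t-12), Q(t-2), Q(t-11) and Q(t-6); the function name is hypothetical and the placeholder series stands in for the 96 monthly values of Table 1.

```python
def build_lagged_samples(series, n_lags=12, selected_lags=(1, 12, 2, 11, 6)):
    """Build (X', Y') pairs where X' holds Q(t-lag) for each selected lag and Y' is Q(t)."""
    samples = []
    # The first target is the value at index n_lags, so a full lag window exists.
    for t in range(n_lags, len(series)):
        x = [series[t - lag] for lag in selected_lags]
        samples.append((x, series[t]))
    return samples


# Placeholder series of 96 monthly values (index 0 stands for 201201).
series = list(range(96))
samples = build_lagged_samples(series)
print(len(samples), samples[0])
```

With 96 monthly values and a 12-month lag window, the first usable target is the 13th month, which gives the 84 samples stated in the text.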
Table 2 input/output sample pair statistics table
The 84 samples are then randomly divided into a training set and a test set at a chosen ratio, for example 70% for training and 30% for testing. The ratio can of course be adjusted, and changing the ratio of training set to test set changes the sample representative index P3. In this embodiment, the samples numbered 1, 5, 10, 11, 12, 13, 19, 23, 29, 31, 34, 36, 40, 41, 43, 48, 50, 55, 56, 59, 66, 69, 71, 74, 77 and 82 in Table 2 form the test set, and the remaining samples in Table 2 form the training set. According to the formula for the sample representative index, P3 = 0.9356, where the population mean and standard deviation are calculated from the 84 samples in Table 2 and the sample mean and standard deviation are calculated from the samples in the training set.
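The split used in this embodiment can be reproduced from the listed sample numbers. The sketch assumes that Table 2 numbers its samples from 1; the function name is hypothetical and the integer list stands in for the actual sample pairs.

```python
# Sample numbers that form the test set in the embodiment.
TEST_IDS = [1, 5, 10, 11, 12, 13, 19, 23, 29, 31, 34, 36, 40,
            41, 43, 48, 50, 55, 56, 59, 66, 69, 71, 74, 77, 82]


def split_by_ids(samples, test_ids):
    """Split samples into (train, test) using 1-based sample numbers (assumed numbering)."""
    test_set = set(test_ids)
    train = [s for i, s in enumerate(samples, start=1) if i not in test_set]
    test = [s for i, s in enumerate(samples, start=1) if i in test_set]
    return train, test


# Placeholder samples numbered 1..84.
samples = list(range(1, 85))
train, test = split_by_ids(samples, TEST_IDS)
print(len(train), len(test), round(len(test) / len(samples), 3))
```

The listed numbers give 26 test samples and 58 training samples, i.e. roughly the 30% / 70% ratio mentioned in the text.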
The runoff forecasting model is trained on the training set constructed above and then used to forecast on the test set. The values of P4 and P5 are then calculated from the formulas for the model generalization index P4 and the result comprehensive quality index P5.
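According to the claims, P4 is built from the root mean square error and the goodness of fit R² of the training and test sets, corrected for the number of samples and the number of predictors. The sketch below shows only these building blocks in their usual textbook form; the way they are combined into P4 is not reproduced here. The correction shown is the standard adjusted R², which is an assumption and may differ from the correction used in this method. Function names are hypothetical.

```python
import math

def rmse(observed, simulated):
    """Root mean square error between measured and simulated values."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)

def r_squared(observed, simulated):
    """Goodness of fit: 1 minus residual sum of squares over total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_features):
    """Standard adjusted R^2 (assumed form of the 'corrected' goodness of fit)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Toy values, for illustration only.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(adjusted_r_squared(0.9, n_samples=58, n_features=5))
```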
In this embodiment, three runoff forecasting models are used for the calculation: a linear regression model (LR), a support vector regression model (SVR) and a gradient boosting regression model (GBR). All three belong to the prior art, so their detailed calculation processes are not set out in the present application; only their principles are briefly described below.
With the linear regression model (LR) used for training and prediction, the above formulas give P4 = 0.5 and P5 = 0.4917;
with the support vector regression model (SVR) used for training and prediction, P4 = 0.7481 and P5 = 0.3091;
with the gradient boosting regression model (GBR) used for training and prediction, P4 = 0.74326 and P5 = 0.2380.
The calculated values of P1, P2 and P3, together with the values of P4 and P5 for each of the three models, are plotted within a circle of radius 1 to obtain the radar chart shown in Fig. 1. As can be seen from Fig. 1, among the three runoff forecasting models in this embodiment, the support vector regression (SVR) and gradient boosting regression (GBR) models perform better in terms of model generalization, while the linear regression model (LR) generalizes relatively poorly; in terms of result comprehensive quality, the linear regression model (LR) is best, the support vector regression model (SVR) is second, and the gradient boosting regression model (GBR) is relatively the worst. The radar chart thus shows intuitively how the models compare in generalization, result comprehensive quality and so on. When a single runoff forecasting model is evaluated, one can check whether the value of each index is close to 1: the closer the value is to 1, the better the corresponding performance. A lower-limit threshold can also be set for each index during evaluation. If the calculated index value is greater than the corresponding threshold, the performance represented by that index is good and meets the requirement; if it is lower than the threshold, the corresponding link needs to be optimized. The same applies when evaluating the model comprehensive performance and the forecasting flow comprehensive effect.
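The threshold check for a single model can be sketched as follows. The threshold of 0.6 and the function name `weak_links` are hypothetical and chosen only for illustration; the index values are those reported for the LR model in this embodiment (P1 is calculated elsewhere in the description and is omitted here).

```python
def weak_links(indices, thresholds):
    """Return the names of indices whose value is below its lower-limit threshold."""
    return [name for name, value in indices.items() if value < thresholds[name]]

# Index values reported for the LR model in this embodiment.
lr_indices = {"P2": 0.59193, "P3": 0.9356, "P4": 0.5, "P5": 0.4917}
# Hypothetical lower-limit thresholds, not taken from the method itself.
thresholds = {name: 0.6 for name in lr_indices}

print(weak_links(lr_indices, thresholds))
```

Indices returned by the check mark the links of the forecasting flow that would need optimization under the chosen thresholds.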
After the model generalization index P4 and the result comprehensive quality index P5 of each model have been calculated, the foregoing formulas can be used to calculate, for each model, the model Euclidean distance Dm and the corresponding runoff forecasting model comprehensive performance index NDm, as well as the forecasting flow Euclidean distance DF and the corresponding forecasting flow comprehensive effect index NDF.
With the linear regression model (LR) used for training and prediction, the above formulas give Dm = 0.7130, NDm = 0.4958, DF = 0.8264 and NDF = 0.6304;
with the support vector regression model (SVR) used for training and prediction, Dm = 0.7354, NDm = 0.4800, DF = 0.8458 and NDF = 0.6217;
with the gradient boosting regression model (GBR) used for training and prediction, Dm = 0.8041, NDm = 0.4314, DF = 0.9062 and NDF = 0.5947.
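The reported figures are consistent with reading each distance as the Euclidean distance from the index vector to the ideal point (1, …, 1), and each normalized index as one minus that distance divided by the square root of the number of indices. The sketch below rests on that assumption and is not a quotation of the formulas of the method; the function names are hypothetical. It recomputes Dm and NDm from the reported P4 and P5 values, and NDF from the reported DF values with five indices.

```python
import math

def distance_to_ideal(indices):
    """Euclidean distance from the index vector to the ideal point (1, ..., 1)."""
    return math.sqrt(sum((1.0 - p) ** 2 for p in indices))

def normalized_index(distance, n_indices):
    """Map a distance in [0, sqrt(n)] to a score in [0, 1], where 1 is ideal."""
    return 1.0 - distance / math.sqrt(n_indices)

# (P4, P5) and DF as reported for each model in this embodiment.
MODELS = {
    "LR": {"p4": 0.5, "p5": 0.4917, "df": 0.8264},
    "SVR": {"p4": 0.7481, "p5": 0.3091, "df": 0.8458},
    "GBR": {"p4": 0.74326, "p5": 0.2380, "df": 0.9062},
}

for name, v in MODELS.items():
    dm = distance_to_ideal([v["p4"], v["p5"]])
    ndm = normalized_index(dm, 2)   # two indices: P4, P5
    ndf = normalized_index(v["df"], 5)  # five indices: P1..P5
    print(f"{name}: Dm={dm:.4f} NDm={ndm:.4f} NDF={ndf:.4f}")
```

Computing DF itself with `distance_to_ideal` would additionally require P1, P2 and P3 for the flow being evaluated.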
Figs. 2 to 5 compare, for the three runoff forecasting models, the model Euclidean distance, the model comprehensive performance index, the forecasting flow Euclidean distance and the forecasting flow comprehensive effect index respectively. Figs. 2 to 5 show clearly and intuitively how the models rank in model comprehensive performance and in forecasting flow comprehensive effect: the smaller the model Euclidean distance and the forecasting flow Euclidean distance, the better the corresponding performance, and the closer the model comprehensive performance index and the forecasting flow comprehensive effect index are to 1, the better the corresponding performance. The details are not repeated here.
In practical applications, the value of M can be selected according to actual requirements, anywhere from 1 to 12; M = 5 in this embodiment only serves to illustrate how samples are constructed from the predictors. Likewise, when candidate predictors are selected, runoff data from a longer window, such as the past 15 or 20 months, may be taken as candidates and then screened. In addition, to improve the forecasting effect in actual forecasting, besides the runoff data of past months, the 130 atmospheric circulation indices of earlier periods may also be introduced as candidate predictors to be screened, and so on.
To demonstrate the effectiveness of the indices, this embodiment also uses the control variable method: it examines how P5 varies with each of P1 to P4 while the other indices are held unchanged, and visualizes the results as radar charts.
The relationship between P1 and P5 is examined mainly by controlling the length of the data sequence: sequences of different lengths are taken from the historical runoff data and predicted with the linear regression model (LR), which gives the radar chart shown in Fig. 6. Data 1 has the longest sequence, data 2 the second longest and data 3 the shortest. The chart shows that the longer the data sequence, the larger the result comprehensive quality index and the better the effect.
The relationship between P2 and P5 is examined mainly by controlling the number of predictors: different numbers of predictors are used and predicted with the linear regression model (LR), which gives the radar chart shown in Fig. 7. Factor 1 has the most predictors, factor 2 the second most and factor 3 the fewest. The chart shows that the more predictors there are, the better the predictor comprehensive quality, the larger the result comprehensive quality index and the better the effect.
The relationship between P3 and P5 is examined mainly by controlling the ratio of the training set to the test set, with prediction by the linear regression model (LR), which gives the radar chart shown in Fig. 8. Sample 1 has the largest training-set proportion, sample 2 the second largest and sample 3 the smallest. The chart shows that the higher the training-set proportion, the better the sample representativeness, the larger the result comprehensive quality index and the better the effect.
The relationship between P4 and P5 is examined by selecting different forecasting models, which gives the radar chart shown in Fig. 9. As the chart shows, the better the model generalization, the larger the result comprehensive quality index and the better the effect. However, although model generalization and result comprehensive quality are positively correlated as shown in Fig. 7, the relationship is neither strictly causal nor strictly positive; it can only be said that, statistically, better model generalization most probably goes with better result quality. It should further be noted that Fig. 1 was obtained with only 5 historical runoff values as predictors, which is merely a simplified calculation example. In practical applications, a runoff forecasting model needs to consider predictors of multiple types and in larger numbers, so 130 atmospheric circulation factors were used in obtaining Fig. 7. These 130 factors come from the 130 monitoring indices issued by the China Meteorological Administration, comprising 88 atmospheric circulation indices, 26 sea temperature indices and 16 other indices, and are publicly available from the website of the National Climate Center of the China Meteorological Administration. Far more predictors were used in the calculation for Fig. 7 than for Fig. 1, so the predictor comprehensive quality index is relatively better and the setting is closer to the practical application scenario of a runoff forecasting model. When the predictor comprehensive quality is good, the quality of the forecast result is higher; when the predictor comprehensive quality is poor, the result is less certain and the correlation between model generalization and result comprehensive quality is weak. Moreover, across different models, model generalization and result comprehensive quality are not strictly positively correlated even when the data quality, the predictors and the sample representativeness are identical; for one and the same model with all other factors equal, however, higher model generalization gives higher forecast quality. This is why Fig. 1 and Fig. 7 differ.
The four radar charts show that each of the indices P1 to P4 is basically positively correlated with the result comprehensive quality index P5 when the other indices are kept essentially unchanged, which demonstrates the effectiveness of the proposed indices.
Three runoff forecasting models adopted in this embodiment are briefly described below.
Linear regression model (LR): multiple linear regression is a mathematical statistical method. In long-term hydrological forecasting, because the influencing factors are complex, considering only one predictor is generally insufficient, and the factors that significantly influence the forecast object must be screened; this is predictor selection. A predictor is an antecedent factor that affects the forecast object and is an independent variable in the multiple regression equation, so correct predictor selection is key to the quality of the forecast result. In general, the following should be observed when selecting predictors: first, analyze the degree of association between the different hydrological factors and the forecast object; second, rank the factors by statistical analysis while taking collinearity among them into account; finally, select as predictors the meteorological factors that are significantly correlated with the forecast object.
Construction of the multiple linear regression equation: if k' predictors have been selected, regression analysis is required to establish the relationship between these factors and the forecast object y', and the mathematical model is:
$$y' = \beta_0 + \beta_1 X''_1 + \dots + \beta_{k'} X''_{k'} + \varepsilon$$
where $\beta_0, \beta_1, \dots, \beta_{k'}$ are the regression coefficients of the equation, also called the forecast equation coefficients, y' is the observed value, $X''_{k'}$ is the corresponding measured value of the predictor, and $\varepsilon$ is the residual term of the equation.
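The description does not state how the regression coefficients are estimated. A minimal sketch, assuming ordinary least squares solved through the normal equations, is given below; the function names `fit_linear_regression` and `predict` are hypothetical, and the toy data are for illustration only.

```python
def fit_linear_regression(X, y):
    """Least-squares estimate of [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    aug = []
    for i in range(p):
        line = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        line.append(sum(r[i] * t for r, t in zip(rows, y)))
        aug.append(line)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def predict(beta, x):
    """Evaluate the fitted equation for one row of predictor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

# Toy data generated from y = 2 + 3*x1 - x2.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 + 3 * a - b for a, b in X]
beta = fit_linear_regression(X, y)
print([round(b, 6) for b in beta])
```

The collinearity check mirrors the point made above that collinearity among predictors has to be considered during screening.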
Support vector regression model (SVR): the support vector machine (Support Vector Machine, SVM) is a fast and reliable classification algorithm that performs well with limited amounts of data. It handles high dimensionality, small samples and nonlinearity well, and in recent years it has been widely applied to medium- and long-term runoff forecasting. The support vector regression machine (Support Vector Machine for Regression, SVR) is a regression algorithm based on the SVM and rests on a solid theoretical foundation.
The calculation principle of SVR is to place a "margin band" on both sides of the linear function, whose half-width is called the tolerance ε and is an empirical value set manually. Data samples that fall inside the band incur no loss, that is, only the support vectors affect the function model, and the optimized model is finally obtained by minimizing the total loss and maximizing the margin. The basic idea of the SVR model is to represent the whole sample set by a small number of support vectors and to map the training set into a high-dimensional feature space by a nonlinear mapping, so that a nonlinear function estimation problem in the input space is converted into a linear function estimation problem in the high-dimensional feature space. Medium- and long-term runoff forecasting with support vector regression (SVR) optimized by particle swarm optimization is currently one of the more widely applied methods.
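The "no loss inside the band" rule corresponds to the standard ε-insensitive loss, which can be sketched as follows. The function name and the toy numbers are illustrative only.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Sum of losses; errors no larger than epsilon cost nothing."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))

# Two predictions fall inside the band (errors 0.5 and 0.0); one lies outside (error 2.0).
print(epsilon_insensitive_loss([10.0, 12.0, 15.0], [10.5, 12.0, 13.0], epsilon=1.0))
```

Only the sample outside the band contributes to the loss, which is why samples inside the band do not influence the fitted function.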
Gradient boosting regression (GBR): gradient boosting regression is a technique that learns from errors. In essence it pools the strengths of many learners, combining a set of weak learning algorithms into one. Two points about this algorithm should be noted: first, the accuracy of each individual learning algorithm is not high, but combined they achieve good accuracy; second, the learning algorithms are applied sequentially, that is, each one learns from the errors of the previous one, so that accuracy improves step by step. When gradient boosting is used to predict continuous values (e.g., runoff), it is referred to as gradient boosting regression, which is not the same as linear regression. The B in GBR stands for Boosting, an ensemble technique in machine learning. Typically, ensemble learning obtains multiple sample sets by resampling and then trains multiple weak learners, which are combined into a strong learner.
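The sequential, error-correcting idea can be sketched as follows. The sketch assumes squared-error loss and depth-one regression trees ("stumps") as the weak learners; both choices, the learning rate and all function names are illustrative assumptions and are not taken from this embodiment.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    return best[1:]  # (feature, threshold, left_value, right_value)

def stump_predict(stump, row):
    f, thr, left_value, right_value = stump
    return left_value if row[f] <= thr else right_value

def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current errors and adds a damped correction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]  # errors of the ensemble so far
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def gbr_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Toy nonlinear data, for illustration only.
X = [[float(x)] for x in range(10)]
y = [float(x * x) for x in range(10)]
model = fit_gbr(X, y)
print(round(gbr_predict(model, [9.0]), 2))
```

Each stump on its own is a poor predictor, but because every round is fitted to what the previous rounds got wrong, the training error shrinks as rounds are added.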
By establishing an evaluation index system covering the full life cycle of the runoff forecasting model, the method allows every link of the forecasting life cycle, such as the comprehensive quality of the selected data, the integrity of the samples, the integrity of the forecasting factors, the comprehensive performance of the runoff forecasting model and the comprehensive effect of the forecasting flow, to be evaluated comprehensively and quantitatively when a runoff forecasting model is used for runoff forecasting. The weak links of the runoff forecasting model in the forecasting process can thus be identified and improved, which improves the application effect of the runoff forecasting model.
While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (2)

1. A runoff forecasting model evaluation method based on a full life cycle, characterized by comprising the following steps:
S1, acquiring historical runoff data, determining forecasting factors, screening the forecasting factors, and constructing a sample population for a runoff forecasting model;
S2, dividing the sample population into a training set and a test set, training the runoff forecasting model with the training set, and forecasting with the test set;
S3, establishing a runoff forecasting model evaluation index system, wherein the runoff forecasting model evaluation index system comprises a data comprehensive quality index P1, a forecasting factor comprehensive quality index P2, a sample representative index P3, a model generalization index P4 and a result comprehensive quality index P5;
S4, evaluating the comprehensive performance of the runoff forecasting model through the model generalization index P4 and the result comprehensive quality index P5;
S5, evaluating the comprehensive effect of the runoff forecasting flow through the data comprehensive quality index P1, the forecasting factor comprehensive quality index P2, the sample representative index P3, the model generalization index P4 and the result comprehensive quality index P5;
the data comprehensive quality index P1 is calculated by the following formula:
$$P1 = 1 - (W + Q_a)/2$$
wherein W represents the missing rate of the historical runoff data and $Q_a$ represents the anomaly rate of the historical runoff data; W and $Q_a$ are calculated as follows:
wherein n is the total number of observation sites in the river basin, $A_i$ is the missing data amount of site i, $B_i$ is the theoretical data amount of site i, $C_i$ is the abnormal data amount of site i, and $D_i$ is the actual data amount of site i;
the predictor comprehensive quality index P2 is calculated by the following formula:
$$P2 = (Q_q + Q_s)/2$$
wherein $Q_q$ represents the quality index of the forecasting factors and $Q_s$ represents the quantity index of the forecasting factors; $Q_q$ and $Q_s$ are calculated as follows:
$$Q_q = \max\{|ra_1|, \ldots, |ra_j|, \ldots, |ra_N|\},$$
wherein $|ra_j|$ represents the absolute value of the correlation coefficient between the j-th predictor and the runoff prediction target sequence, N represents the number of features, namely the number of predictors, M represents the number of predictors retained, namely the first M predictors with the largest absolute correlation coefficients with the runoff prediction target sequence, and $|ra_{j_a}|$ represents the absolute value of the correlation coefficient between the $j_a$-th of the first M predictors and the runoff prediction target sequence;
the sample representative index P3 is calculated by the following formula:
$$P3 = 1 - (R_{mean} + R_{std})/2$$
wherein $R_{mean}$ represents the sample mean shift rate and $R_{std}$ represents the sample standard deviation shift rate; $R_{mean}$ and $R_{std}$ are calculated from the following formula,
wherein $\mu$ and $\sigma$ represent the mean and the standard deviation of the overall output variable, and the corresponding sample quantities are the mean of the sample output variable and its standard deviation $\sigma_y$;
the model generalization index P4 comprehensively considers the relative root mean square error and the goodness of fit of the model test set and the model training set, and also takes into account the number of samples and the number of forecasting factors; the specific calculation is as follows:
wherein $GP_{RMSE}$ represents the root mean square generalization rate index, which is combined with a goodness-of-fit generalization rate index;
$GP_{RMSE}$ is calculated from the following formula:
wherein $RMSE_{test}$ represents the root mean square error of the model test set and $RMSE_{train}$ represents the root mean square error of the model training set; $RMSE_{test}$ and $RMSE_{train}$ are obtained by applying the RMSE calculation formula to the model test set and the model training set respectively, and the RMSE calculation formula is as follows:
wherein $Q_k$ represents the k-th measured value in the calculated sequence, its simulated counterpart is the k-th value calculated by the runoff forecasting model, and H' is the length of the calculated sequence; if $GP_{RMSE} > 1$, it is set to 1;
the goodness-of-fit generalization rate index is calculated from the following formula:
wherein the goodness of fit of the model training set and the goodness of fit of the model test set are each used in corrected form, $l_1$ and $l_2$ represent the numbers of samples in the training set and the test set respectively, and N represents the number of features, namely the number of predictors; the goodness-of-fit values are obtained by applying the goodness-of-fit $R^2$ calculation formula to the model test set and the model training set, and the $R^2$ calculation formula is as follows:
wherein $Q_k$ represents the k-th measured value of the calculated sequence, its simulated counterpart is the k-th value calculated by the runoff forecasting model, the mean term is the average of the measured values of the calculated sequence, and H' is the length of the calculated sequence;
the result comprehensive quality index P5 is calculated by the following formula:
wherein rb is the correlation coefficient between the simulated sequence and the runoff prediction target sequence, $\mu_o$ and $\sigma_o$ are respectively the mean and the standard deviation of the measured values of the runoff prediction target sequence, and $\mu_m$ and $\sigma_m$ are respectively the mean and the standard deviation of the simulated sequence;
in step S4, evaluating the comprehensive performance of the runoff forecasting model specifically comprises:
S41, calculating the model Euclidean distance Dm:
S42, normalizing the model Euclidean distance to obtain the model comprehensive performance index NDm, and judging the comprehensive performance of the runoff forecasting model according to whether the value of NDm is close to 1, wherein NDm is calculated as follows:
in step S5, evaluating the comprehensive effect of the runoff forecasting flow specifically comprises:
S51, calculating the forecasting flow Euclidean distance DF:
S52, normalizing the forecasting flow Euclidean distance to obtain the forecasting flow comprehensive effect index NDF, and judging the comprehensive effect of the forecasting flow according to whether the value of NDF is close to 1, wherein NDF is calculated as follows:
2. The runoff forecasting model evaluation method based on a full life cycle according to claim 1, characterized in that:
data are judged to be abnormal in the data anomaly rate calculation as follows:
a confidence interval is given by means of a box plot, with its upper and lower limits denoted $Z_{up}$ and $Z_{down}$, the lower quartile denoted $Z_1$ and the upper quartile denoted $Z_3$; the upper and lower limits of the confidence interval are calculated as $Z_{up} = Z_3 + 1.5(Z_3 - Z_1)$ and $Z_{down} = Z_1 - 1.5(Z_3 - Z_1)$; data lying outside the upper boundary $Z_{up}$ and the lower boundary $Z_{down}$ of the box are regarded as statistically abnormal data.
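The box-plot rule of claim 2 can be sketched as follows. The claim does not say how the quartiles are computed; the sketch assumes the "inclusive" method of Python's `statistics.quantiles`, and the function names and sample values are hypothetical.

```python
import statistics

def box_plot_limits(values):
    """Return (Z_down, Z_up) computed from the lower and upper quartiles."""
    z1, _, z3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = z3 - z1
    return z1 - 1.5 * spread, z3 + 1.5 * spread

def abnormal_values(values):
    """Values lying outside [Z_down, Z_up] are treated as statistically abnormal."""
    z_down, z_up = box_plot_limits(values)
    return [v for v in values if v < z_down or v > z_up]

# Illustrative runoff-like readings with one obvious outlier.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(box_plot_limits(readings))
print(abnormal_values(readings))
```

The count of values returned by `abnormal_values` for a site would correspond to the abnormal data amount used in the anomaly rate.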
CN202211405142.7A 2022-11-10 2022-11-10 Runoff forecasting model evaluation method based on full life cycle Active CN115689368B (en)

Publications (2)

Publication Number Publication Date
CN115689368A CN115689368A (en) 2023-02-03
CN115689368B true CN115689368B (en) 2023-08-01

Family

ID=85052114

Country Status (1)

Country Link
CN (1) CN115689368B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant