CN113869600A - Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction - Google Patents
- Publication number
- CN113869600A CN202111210827.1A
- Authority
- CN
- China
- Prior art keywords
- peak
- valley difference
- valley
- load
- difference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a medium-and-long-term peak-valley difference measurement model based on random forest and secondary correction, used for evaluating the implementation effect of medium-and-long-term demand response. The method collects historical load data of a plurality of residential users, calculates the historical peak-valley differences, and analyzes the multi-source factors that influence the load peak-valley difference of demand-side users; extracts features of the multi-source influence factors and selects the optimal feature combination by binary feature engineering as the input of a random forest model; constructs a random-forest-based peak-valley difference measurement model and outputs measurement results for the monthly and quarterly peak-valley differences; and, based on historically collected load peak-valley difference data of demand-side users, takes the screened correction factors one by one as input, constructs a Bayesian regression model to fit the users' load peak-valley difference, and corrects the primary medium-and-long-term prediction of the quarterly peak-valley difference according to the fitting result. The invention is of significance for promoting the development of demand-side response and relieving the contradiction between power supply and demand.
Description
Technical Field
The invention belongs to the technical field of power systems, and relates to a peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction.
Background
Against the background of 'carbon peaking and carbon neutrality', demand response oriented to flexible, interactive, intelligent power use has become a development trend. Residential load is an important component of demand-response users; it can effectively realize peak clipping and valley filling and promote the reliable and stable operation of the power system. However, the load characteristics of demand-side users are affected by many factors such as weather conditions, population growth, and economic development, so effective medium-and-long-term evaluation of the demand side is difficult, which undermines the reliability of evaluating medium-and-long-term demand-side response implementation. Research on how to accurately predict the medium-and-long-term load peak-valley difference is therefore of significance for promoting the development of demand-side response and relieving the contradiction between power supply and demand.
Work that takes peak-valley difference prediction as its research focus is very limited. Existing prediction methods can be classified into deep learning models, statistical models, and machine learning models. Traditional statistical methods are easy to implement and require no extra inputs, but their accuracy is often limited because they consider only historical data. Deep learning methods have good prediction performance and have attracted wide attention in recent years; however, they are better suited to continuous time-series prediction, whereas the monthly and quarterly peak-valley differences are periodic and discontinuous. Traditional machine learning methods, which include the support vector machine and the random forest, compute quickly and generalize well.
The support vector machine improves the generalization capability of the learner as far as possible and computes quickly. When a genetic algorithm is used for binary feature combination optimization, a support vector machine can be used for peak-valley difference prediction, with the fitness function set to the loss between the predictions of the trained support vector machine and the actual values. The random forest model is essentially a bagging method: it is robust to overfitting and improves on the performance of its weak learners (decision trees) by combining their predictions through voting or averaging. It is resistant to overfitting and noise, computes quickly, and predicts accurately.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction. The secondary correction improves the prediction precision, the load-side peak-valley difference characteristics are used to improve the accuracy of the load peak-valley difference prediction, and more reliable guidance is provided for the operation and scheduling of the power system.
The invention adopts the following technical scheme. The invention provides a medium-and-long-term peak-valley difference measurement method based on random forest and secondary correction, which comprises the following steps:
step 1, collecting historical electricity load data of a plurality of residential user areas within a set number of years, calculating the historical peak-valley differences, collecting data on the factors that influence the load peak-valley difference, and taking the related influence factors as candidate features;
step 2, screening the candidate features according to their correlation with the peak-valley difference, and selecting the optimal feature combination by a binary feature combination method as the input of the random forest model;
step 3, training on the data of the optimal feature combination selected in step 2 with a random forest algorithm to obtain a load peak-valley difference measurement model of the demand-side users, and outputting the primary medium-and-long-term prediction results of the monthly and quarterly peak-valley differences;
step 4, based on the historically collected load peak-valley difference data of the demand-side users, taking the screened correction factors one by one as input, constructing a Bayesian regression model to fit the users' load peak-valley difference, and correcting the primary medium-and-long-term prediction result of the quarterly peak-valley difference according to the fitting result.
Preferably, the historical electricity load data in step 1 include daily maximum load data and daily minimum load data; the historical peak-valley differences include the daily, monthly, and quarterly peak-valley differences; and the influence factor features include: daily maximum air temperature, daily minimum air temperature, daily average air temperature, air pressure, humidity, rainfall, wind speed, and daily average load.
Preferably, step 2 specifically comprises:
step 2.1, calculating the correlation between each candidate feature in step 1 and the peak-valley difference, and screening n candidate features in descending order of correlation;
step 2.2, screening the optimal feature combination as the input of the measurement by a binary feature combination method.
Preferably, step 2.2 specifically comprises:
step 2.2.1, using binary coding to mark the use state of each candidate feature, i.e. used or discarded, and screening out the binary feature data set;
step 2.2.2, taking the screened binary feature data set as the input of the genetic algorithm, and searching for the optimal feature combination based on the genetic algorithm;
wherein n is the number of candidate features, and
the binary code corresponding to the i-th feature $x_i$ is $w_i$, which has two states, 0 and 1: when $w_i = 0$, the feature is not used; when $w_i = 1$, the feature is used.
Preferably, step 3 specifically includes:
step 3.1, taking the random forest as the basis of the primary medium-and-long-term measurement;
step 3.2, for the historical load data of the demand-side users, calculating the natural load growth rate monthly and quarterly, according to the time scale of the peak-valley difference measurement, based on a trend extrapolation method;
step 3.3, forming data-driven training samples from the historical average peak-valley differences collected month by month and quarter by quarter, the natural load growth rate, and the screened influence factors of the peak-valley difference measurement; constructing and training a peak-valley difference measurement model based on random forest; the trained model outputs the medium-and-long-term measurement result of the load peak-valley difference of the demand-side users.
Preferably, step 3.1 specifically comprises:
step 3.1.1, taking the data set with the optimal feature combination over the previous N years as the original sample, and sampling the original sample by the bootstrap method to generate K data sets as the training sets of the decision trees, wherein N is a positive integer smaller than the set number of years in step 1;
step 3.1.2, if there are originally M input variables, randomly selecting m variables at each node and determining the optimal split point from these m variables, wherein m is less than M;
step 3.1.3, growing each decision tree to its maximum extent without pruning;
step 3.1.4, taking the average of the outputs of all decision trees as the predicted value.
Preferably, step 4 specifically includes:
step 4.1, constructing a Bayesian ridge regression model based on the historical load peak-valley difference of the demand-side users and the screened correction factors;
step 4.2, based on the Bayesian ridge regression model, establishing the fitting relations between the quarterly peak-valley difference and the population correction factor and between the quarterly peak-valley difference and the resident consumption level correction factor, respectively, to obtain a population correction fitting curve and a resident consumption level fitting curve;
step 4.3, calculating a correction coefficient from each of the two fitting curves, and applying the corrections in sequence to the load peak-valley difference measurement result of the demand-side users obtained in step 3, to obtain the corrected load peak-valley difference of the demand-side users.
Preferably, the specific calculation process in the Bayesian ridge regression model is as follows:
wherein,
$p(w \mid a, b)$ is the probability distribution of the parameter w conditioned on a and b;
the loss function is that of ridge regression, which solves for the family of parameters $\{\beta_j\}$ that minimizes the fitting error between the i-th output $y_i$ and the inputs $x_{ij}$.
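For orientation, the textbook ridge-regression loss can be written with the symbols defined above. This is a sketch of the standard form, not necessarily the exact expression used in the patent; the regularization weight $\lambda$ is an assumed symbol that does not appear in the text.

```latex
L(\beta) = \sum_{i} \Bigl( y_i - \sum_{j} x_{ij}\,\beta_j \Bigr)^{2} + \lambda \sum_{j} \beta_j^{2}
```

The first term is the fitting error between each output $y_i$ and its inputs $x_{ij}$; the second term penalizes large coefficients.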
Preferably, the corrected value $D^*$ in step 4.3 is calculated as follows:
in the formula,
$D$ is the measured peak-valley difference of a given quarter of the n-th year,
$D^*$ is the corrected peak-valley difference of that quarter of the n-th year,
$d_{n-1}$ is the quarterly peak-valley difference on the fitted curve for that quarter of year n-1,
$d_n$ is the quarterly peak-valley difference on the fitted curve for that quarter of the n-th year.
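The correction formula itself is not reproduced in this text, so the following is only a sketch under a stated assumption: that the measured value D is scaled by the year-over-year ratio $d_n / d_{n-1}$ read from a fitted curve, and that the population and consumption-level corrections are applied one after the other. The function names and the sample numbers are hypothetical.

```python
def correct_quarterly(measured, fitted_prev, fitted_curr):
    """Scale a measured value by the ratio d_n / d_(n-1) from one fitted curve.

    ASSUMPTION: the ratio form is a guess; the patent's formula is not
    available in this text.
    """
    return measured * fitted_curr / fitted_prev


def secondary_correction(measured, curves):
    """Apply the correction for each fitted curve in sequence.

    `curves` is a list of (d_(n-1), d_n) pairs, e.g. one pair from the
    population curve and one from the consumption-level curve.
    """
    corrected = measured
    for fitted_prev, fitted_curr in curves:
        corrected = correct_quarterly(corrected, fitted_prev, fitted_curr)
    return corrected


# Hypothetical numbers: population curve (50 -> 55), consumption curve (80 -> 84).
example = secondary_correction(100.0, [(50.0, 55.0), (80.0, 84.0)])
```

Under this assumption each curve contributes one multiplicative factor, so the order of the two corrections does not change the result.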
Compared with the prior art, the invention has the following advantage. The load of demand-side users changes flexibly from day to day, the implementation effect of demand-side response is difficult to evaluate accurately, and the reliability of evaluating medium-and-long-term demand-side response implementation depends on accurate load peak-valley difference prediction. To address these problems, the invention provides a medium-and-long-term peak-valley difference prediction method based on random forest and secondary correction, in which the secondary correction improves the prediction precision.
Drawings
FIG. 1 is a schematic flow chart of a peak-to-valley difference prediction method based on random forest and secondary correction according to the present invention;
FIG. 2 is a schematic diagram of the binary feature combination optimization process of the present invention;
FIG. 3 is a flow chart of one-time medium-long term prediction based on random forests according to the present invention;
FIG. 4 is a flow chart of a quadratic correction based on Bayesian ridge regression according to the present invention;
FIG. 5 is a graph of the medium-and-long-term monthly peak-valley difference prediction results obtained with the proposed method in an embodiment of the present invention;
FIG. 6 is a diagram of the primary medium-and-long-term peak-valley difference prediction results and the secondary-corrected peak-valley difference prediction results in an embodiment of the present invention;
FIG. 7 is a population correction fit curve in an embodiment of the present invention;
FIG. 8 is a graph of a fitted residential consumption level curve according to the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
In the invention, medium-and-long-term prediction refers to prediction with a horizon between one month and one year. According to the prediction time scale, prediction methods can be divided into ultra-short-term (0-6 h), short-term (6 h-1 d), and medium-and-long-term (1 month-1 year) methods.
As shown in FIG. 1, the invention provides a peak-to-valley difference medium-and-long term estimation method based on random forests and secondary correction, which comprises the following steps:
step 1, collecting historical load data of a plurality of residential users within a set number of years, calculating the historical peak-valley differences, and collecting data on the factors that influence the load peak-valley difference as candidate features;
In the embodiment, the monthly and quarterly peak-valley differences of three residential areas of high, medium, and low grade in a city in Jiangsu Province, China, are taken as the research objects. Historical electricity load data for the six years from 2013 to 2018 are collected at a time resolution of 15 min, i.e. 96 data points per day. The historical data include the daily maximum load and the daily minimum load, from which the daily peak-valley difference is calculated; the monthly and quarterly peak-valley differences are then calculated from the daily values, and the daily average load is calculated from the 96 load values collected each day. Further, influence factors such as daily maximum air temperature, daily minimum air temperature, daily average air temperature, air pressure, humidity, rainfall, wind speed, and daily average load are considered as candidate features. Population and annual GDP factors are used as the secondary correction factors.
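The data preparation described above can be sketched as follows. This is a minimal illustration, assuming that a period value is the average of the daily peak-valley differences falling in that period (the text speaks of average peak-valley differences collected month by month but does not give the formula); the function names and sample data are hypothetical.

```python
from statistics import mean


def daily_peak_valley(loads):
    """Daily peak-valley difference: daily maximum load minus daily minimum load."""
    return max(loads) - min(loads)


def aggregate_by_period(daily_values, period_keys):
    """Average the daily values that share a period key (a month or a quarter).

    ASSUMPTION: averaging is one plausible aggregation, not confirmed by the text.
    """
    groups = {}
    for key, value in zip(period_keys, daily_values):
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical data: three days of 96 samples (15-min resolution) each.
day1 = [float(i % 10) for i in range(96)]   # values cycle through 0.0 .. 9.0
day2 = [2.0] * 95 + [3.0]
day3 = [0.5] * 48 + [1.5] * 48
daily = [daily_peak_valley(day) for day in (day1, day2, day3)]
monthly = aggregate_by_period(daily, ["2013-01", "2013-01", "2013-02"])
```

Passing quarter labels such as "2013-Q1" as the period keys gives the quarterly values in the same way.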
The first five years of data are used as training data, and the final year of data is used as test data to verify the validity of the prediction. The prediction error evaluation indexes are the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean square error (RMSE).
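The three error indexes named above follow their standard definitions, sketched here in plain Python. The function names and the sample values are illustrative only.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical test-year values and predictions.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
errors = (mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free, which makes it convenient for comparing residential areas with different load levels, while MAE and RMSE stay in the units of the load data.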
Step 2 specifically comprises the following steps:
step 2.1, calculating the correlation between each candidate feature in step 1 and the peak-valley difference, and screening n candidate features in descending order of correlation.
To improve the prediction accuracy, the multiple influence factors are considered and the feature factors significantly correlated with the residential-side load peak-valley difference are carefully screened out. In a preferred but non-limiting implementation, the correlation between the candidate features and the peak-valley difference is quantified by the Pearson coefficient, calculated as follows:
$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$
wherein,
x is the peak-valley difference,
y is any influence factor, and $\bar{x}$ and $\bar{y}$ are their sample means.
Table 1 lists the correlation between the peak-valley difference and each candidate feature used in the binary feature combination.
Table 1: Degree of correlation between peak-valley difference and each influence factor
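The correlation screening of step 2.1 can be sketched as below. It is an illustration under one assumption: features are ranked by the absolute value of the Pearson coefficient, so that strong negative correlations also rank highly (the text only says "from high to low"). The feature names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def top_n_features(target, candidates, n):
    """Return the n candidate names with the largest |r| against the target.

    ASSUMPTION: ranking by absolute value is a choice made for this sketch.
    """
    scores = {name: abs(pearson(target, values)) for name, values in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical peak-valley differences and candidate features.
peak_valley = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "max_temp": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    "humidity": [5.0, 3.0, 4.0, 1.0, 2.0],    # r = -0.8
    "wind": [1.0, 1.0, 2.0, 1.0, 1.0],        # uncorrelated
}
selected = top_n_features(peak_valley, candidates, 2)
```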
step 2.2, screening the optimal feature combination as the input of the measurement by a binary feature combination method.
Because the influence factors act on the peak-valley difference in combination, the effect of a single factor cannot be analyzed in isolation. A binary feature combination method is therefore used to select, from the candidate features screened in step 2.1, the optimal feature combination as the input of the random forest. Step 2.2 specifically comprises the following steps:
step 2.2.1, using binary coding to mark the use state of each candidate feature, i.e. used or discarded, and screening out the binary feature data set.
Suppose there are n candidate features to be screened in total. The feature data set and its corresponding binary code are expressed respectively as:
$X = [x_1, x_2, x_3, \ldots, x_n]$
$W = [w_1, w_2, w_3, \ldots, w_n]$
wherein n is the number of candidate features, and
the binary code corresponding to the i-th feature $x_i$ is $w_i$, which has two states, 0 and 1. When $w_i = 0$, the feature is not used; when $w_i = 1$, the feature is used. Thus, the screened binary feature data set can be expressed as the element-wise product of X and W:
$X \odot W = [w_1 x_1, w_2 x_2, w_3 x_3, \ldots, w_n x_n]$
step 2.2.2, taking the screened binary feature data set as the input of the genetic algorithm, and searching for the optimal feature combination based on the genetic algorithm.
The optimization flow of the genetic algorithm is shown in FIG. 2. When the genetic algorithm is used to search for the optimal feature combination, the screened binary feature data set is taken as its input. The genetic algorithm is initialized randomly, generating multiple individuals of the form $[w_1, w_2, w_3, \ldots, w_n]$; these individuals constitute the initial population P(0). Each binary string sequence l, in the binary-coded form $[w_1, w_2, w_3, \ldots, w_n]$, is called a chromosome in the genetic algorithm and represents one individual. The coded chromosome is then decoded to obtain the weight parameters to be optimized carried by the individual, and these parameters are input into the fitness function, which is taken to be the loss function value after SVR training. Whether the iteration limit has been reached is then judged. If not, two individuals are selected from the population according to their fitness values and copied, and the crossover probability of the genetic algorithm determines whether the two selected individuals undergo the crossover operation. After the crossover operation, a mutation operation is performed: whether the two current temporary individuals mutate is judged according to the set mutation probability. In general, in genetic algorithms, the mutation probability is calculated by the following formula:
$p_m = 0.8/d$
in the formula,
d represents the number of bits in the binary code of an individual.
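The search loop described above can be sketched in plain Python. Several details are assumptions made for this sketch, not statements from the text: roulette-wheel selection with weights inversely proportional to the loss, single-point crossover with probability 0.8, per-bit mutation, and keeping the best individual from one generation to the next. The fitness here is a toy stand-in; in the described method it would be the loss of an SVR trained on the features selected by the mask.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=40,
                           crossover_prob=0.8, seed=0):
    """Search binary masks [w_1..w_n] that minimise a non-negative `fitness` loss."""
    rng = random.Random(seed)
    mutation_prob = 0.8 / n_features            # p_m = 0.8 / d, as in the text
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        losses = [fitness(ind) for ind in population]
        # ASSUMPTION: lower loss gives a larger roulette-wheel selection weight.
        weights = [1.0 / (1e-9 + loss) for loss in losses]
        next_population = [best[:]]             # ASSUMPTION: elitism
        while len(next_population) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            if rng.random() < crossover_prob:   # single-point crossover
                cut = rng.randint(1, n_features - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(n_features):
                    if rng.random() < mutation_prob:
                        child[i] = 1 - child[i]  # bit-flip mutation
                next_population.append(child)
        population = next_population[:pop_size]
        best = min(population + [best], key=fitness)
    return best, fitness(best)


# Toy fitness (hypothetical): number of bits that differ from a known mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_loss(mask):
    return sum(a != b for a, b in zip(mask, target))


best_mask, best_loss = genetic_feature_search(len(target), toy_loss)
```

Passing the fitness as a callable keeps the search independent of the predictor, so the toy loss can be swapped for a trained-model loss without changing the loop.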
The feature combination for peak-valley difference prediction is optimized by the binary feature combination method for each type of residential area. The table below lists the top three feature combinations for the three residential-area data sets. Top3 indicates that the three best-ranked feature combination schemes are selected; case1, case2, and case3 denote the three best-ranked combinations of load peak-valley difference and influence factors for the residential area of each grade. case1 is the top-ranked (Top1) feature combination, and the Top1 combination was chosen as the optimal feature combination input in the experiment. A check mark indicates that a feature is selected, and x indicates that it is discarded. The feature combinations of peak-valley difference and influence factors are shown in Table 2.
Table 2: Feature combination schemes of peak-valley difference and influence factors
Step 3, training on the data of the optimal feature combination selected in step 2 with a random forest algorithm to obtain a load peak-valley difference measurement model of the demand-side users, and outputting the primary medium-and-long-term prediction results of the monthly and quarterly peak-valley differences, as shown in FIG. 3.
Step 3 specifically comprises the following steps:
step 3.1, constructing a random forest model and taking the random forest as the basis of the primary medium-and-long-term measurement.
The steps of constructing the measurement model with the random forest algorithm are as follows:
step 3.1.1, taking the data set with the optimal feature combination over the previous N years as the original sample, and sampling the original sample by the bootstrap method to generate K data sets as the training sets of the decision trees, wherein N is a positive integer smaller than the set number of years in step 1; in the embodiment, the data set of the first five years is used as the original sample;
step 3.1.2, if there are originally M input variables, randomly selecting m variables at each node and determining the optimal split point from these m variables, wherein m is less than M;
step 3.1.3, growing each decision tree to its maximum extent without pruning;
step 3.1.4, taking the average of the outputs of all decision trees as the predicted value.
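Steps 3.1.1 to 3.1.4 can be sketched as a small random forest regressor in plain Python. This is an illustration, not the patent's implementation: the split criterion (sum of squared errors), the default value of m, and the toy data are assumptions made for the sketch.

```python
import random
from statistics import mean


def build_tree(rows, targets, m, rng):
    """Grow an unpruned regression tree; each node tries m randomly chosen features."""
    if len(rows) < 2 or len(set(targets)) == 1:
        return mean(targets)                          # leaf: average target
    best = None
    for f in rng.sample(range(len(rows[0])), m):
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            ml, mr = mean(left), mean(right)
            # ASSUMPTION: squared-error split criterion.
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:                                  # sampled features are constant
        return mean(targets)
    _, f, threshold = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] < threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] >= threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [t for _, t in left], m, rng),
            build_tree([r for r, _ in right], [t for _, t in right], m, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node


def fit_forest(rows, targets, n_trees=30, m=None, seed=0):
    """Train K = n_trees trees, each on a bootstrap sample of the original data."""
    rng = random.Random(seed)
    m = m or max(1, len(rows[0]) // 3)                # ASSUMPTION: default m
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees (step 3.1.4)."""
    return mean(predict_tree(tree, row) for tree in forest)


# Hypothetical toy data: M = 2 features, m = 1 feature tried per node.
rows = [[float(x), float((x * 7) % 5)] for x in range(20)]
targets = [2.0 * x for x in range(20)]
forest = fit_forest(rows, targets, n_trees=30, m=1)
prediction = predict_forest(forest, [10.0, 0.0])
```

Because every leaf holds an average of training targets and the forest averages its trees, a prediction always lies within the range of the training targets; this is one reason the method adds a separate growth-rate input and a secondary correction for year-to-year trends.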
Step 3.2, for the historical load data of the demand-side users, calculating the natural load growth rate monthly and quarterly, according to the time scale of the peak-valley difference measurement, based on a trend extrapolation method. The natural load growth rate is an influence factor that characterizes how the load changes, and it is used as one of the inputs of the load peak-valley difference measurement model of the demand-side users.
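The text does not specify which trend extrapolation is used, so the following is only one possible reading: fit a least-squares straight line to the historical values of a given month (or quarter) across years, extrapolate one step ahead, and take the relative increase as the natural growth rate. The function names and sample loads are hypothetical.

```python
def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return slope, v_mean - slope * t_mean


def natural_growth_rate(values):
    """Relative growth from the last fitted point to the extrapolated next point.

    ASSUMPTION: a linear trend and this rate definition are choices made
    for the sketch, not taken from the text.
    """
    slope, intercept = linear_trend(values)
    last_fit = intercept + slope * (len(values) - 1)
    next_fit = intercept + slope * len(values)
    return (next_fit - last_fit) / last_fit


# Hypothetical average January load over five consecutive years.
january_loads = [100.0, 110.0, 120.0, 130.0, 140.0]
rate = natural_growth_rate(january_loads)
```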
Step 3.3, forming data-driven training samples from the historical average peak-valley differences collected month by month and quarter by quarter, the natural load growth rate, and the screened influence factors of the peak-valley difference measurement; constructing and training a peak-valley difference measurement model based on random forest. The trained model outputs the medium-and-long-term measurement result of the load peak-valley difference of the demand-side users, and finally the measurement results of the monthly and quarterly peak-valley differences are output.
The prediction results of the random forest method are shown in table 3. Prediction methods based on a support vector machine (SVM), a multilayer perceptron (MLP) and Gaussian process regression (GPR) are selected as reference methods to verify that the method provided by the invention improves prediction accuracy. Comparing all error indexes of the models in tables 3 and 4, the random forest gives the best prediction of both the monthly peak-valley difference and the seasonal peak-valley difference.
Table 3: Monthly peak-valley difference prediction results of different models
Table 4: Seasonal peak-valley difference prediction results of different models
Step 4, based on the historically collected peak-valley difference data of the demand-side user load, selecting the screened related correction factors one by one as input and constructing a Bayesian regression model to fit the user load peak-valley difference, and correcting the primary medium- and long-term prediction result of the seasonal peak-valley difference according to the fitting result.
Because the seasonal peak-valley difference spans a long time, the influence of year-to-year differences in the influence factors on it must be considered. A Bayesian ridge regression is therefore built to obtain the fitted relationship between each related correction factor and the load peak-valley difference, and the primary medium- and long-term prediction result of the seasonal peak-valley difference is corrected accordingly. The secondary correction process is shown in fig. 4. The secondary peak-valley difference correction stage comprises the following steps:
step 4.1, constructing a Bayesian ridge regression model based on the historical load peak-valley difference of the user at the demand side and the screened related correction factors, wherein the specific calculation process in the Bayesian ridge regression model is as follows:
wherein,
p(w | a, b) is the probability distribution of the parameter w conditioned on the features a and b;
the loss function is that of ridge regression, which solves for the family of parameters {β_j} that minimizes the fitting error between the i-th output y_i and the inputs x_ij;
Step 4.2, based on the Bayesian ridge regression model, establishing fitting relationships between the seasonal peak-valley difference and the population correction factor and between the seasonal peak-valley difference and the resident consumption level correction factor, respectively, and obtaining the population correction fitting curve and the resident consumption level fitting curve, as shown in FIGS. 6 and 7.
Step 4.3, calculating correction coefficients based on the two fitting curves and applying them in sequence to the demand-side user load peak-valley difference measured in step 3, to obtain the corrected demand-side user load peak-valley difference;
the correction coefficient D* is calculated as follows:
where
D is the measured peak-valley difference of a certain quarter of the nth year,
d is the corrected difference of peak and valley in a certain quarter of the nth year,
d_{n-1} is the seasonal peak-valley difference on the fitted curve for that quarter of year n-1,
d_n is the seasonal peak-valley difference on the fitted curve for that quarter of the nth year.
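The curve fitting of steps 4.1 and 4.2 can be sketched for a single correction factor. The sketch below is a simplification, not the full Bayesian ridge procedure: it uses the fact that, with a Gaussian prior on the slope and Gaussian noise, the posterior mode of the slope coincides with a ridge estimate whose penalty `lam` equals the ratio of prior precision to noise precision. Here `lam` is fixed by hand, which is an assumption; a full Bayesian ridge implementation (for example scikit-learn's `BayesianRidge`) estimates those precisions from the data. The population and peak-valley figures are hypothetical.

```python
def ridge_fit_1d(xs, ys, lam):
    """Ridge fit of y = alpha + beta * x with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / (sxx + lam)          # lam > 0 shrinks the slope toward zero
    alpha = y_mean - beta * x_mean
    return alpha, beta

def fitted_value(alpha, beta, x):
    """Read the peak-valley difference off the fitted curve at factor value x."""
    return alpha + beta * x

# Hypothetical data: population (in 10,000 persons) vs. spring peak-valley difference
population = [1.0, 2.0, 3.0, 4.0]
spring_pvd = [2.0, 4.0, 6.0, 8.0]
alpha, beta = ridge_fit_1d(population, spring_pvd, lam=5.0)
print(alpha, beta, fitted_value(alpha, beta, 5.0))
```

Values read off such a curve for years n-1 and n play the role of d_{n-1} and d_n in the correction coefficient.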
Taking the population correction of the spring 2018 peak-valley difference as an example: given the population data of 2017 and 2018, the corresponding peak-valley difference values d_2017 and d_2018 are found on the fitting curve, the correction coefficient is calculated, and the spring 2018 peak-valley difference is corrected:
wherein D is the measured peak-valley difference in spring 2018,
d is the corrected peak-to-valley difference in 2018 spring,
d_2017 is the peak-valley difference value on the fitting curve for spring 2017,
d_2018 is the peak-valley difference value on the fitting curve for spring 2018.
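The correction formula itself is not reproduced in this text, so the following sketch rests on an explicit assumption: the correction coefficient is taken as the ratio of the fitted-curve values for year n and year n-1, and the corrected value is the measured value multiplied by that coefficient, with one coefficient per correction factor applied in sequence. The function names and all numbers are hypothetical.

```python
def correction_coefficient(d_prev, d_curr):
    """ASSUMPTION: D* = d_n / d_{n-1}, both read from the fitted curve."""
    if d_prev == 0:
        raise ValueError("fitted value for year n-1 must be non-zero")
    return d_curr / d_prev

def apply_corrections(measured, fitted_pairs):
    """Apply one coefficient per correction factor, in sequence."""
    corrected = measured
    for d_prev, d_curr in fitted_pairs:
        corrected *= correction_coefficient(d_prev, d_curr)
    return corrected

# Hypothetical values for spring 2018
measured_spring_2018 = 100.0            # D from the random forest stage
population_pair = (200.0, 210.0)        # (d_2017, d_2018) on the population curve
consumption_pair = (180.0, 189.0)       # same, on the consumption-level curve
corrected = apply_corrections(measured_spring_2018,
                              [population_pair, consumption_pair])
print(corrected)
```

If the actual formula differs (for instance an additive adjustment), only `correction_coefficient` and the update line in `apply_corrections` would need to change.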
The predicted value and the true value of the load at the moment to be predicted are compared, and the error indexes MAPE, MAE and RMSE are calculated according to the following formulas:
wherein,
l_e is the true value of the load at a certain moment,
n_test is the number of test samples.
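The three error indexes have standard definitions, which the sketch below implements in plain Python. The function and argument names are illustrative, MAPE is expressed in percent (an assumption about the reporting convention), and the sample values are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; true values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical true and predicted peak-valley differences
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE carry the unit of the load, while MAPE is scale-free, so the three together allow models to be compared both in absolute and in relative terms.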
Table 5 shows the seasonal peak-valley difference prediction results after secondary correction by the peak-valley difference medium- and long-term measurement method based on random forest and secondary correction.
The results show that the primary predicted value deviates somewhat from the true value, and that the prediction after secondary correction performs clearly better than the first stage. Comparing the evaluation indexes, the RMSE, MAPE and MAE of the secondary-corrected prediction are all lower than those of the primary prediction, so the prediction accuracy is effectively improved. This indicates that the two-stage peak-valley difference prediction model is, to a certain extent, more sensitive to year-to-year differences in the influence factors, which further improves the accuracy of the final prediction result.
Fig. 5 shows the primary medium- and long-term monthly peak-valley difference prediction result, and fig. 6 shows the primary medium- and long-term seasonal peak-valley difference prediction result together with the secondary-corrected seasonal prediction. The two-stage peak-valley difference prediction model considers not only the month-by-month and quarter-by-quarter influence factors of the peak-valley difference but also the year-to-year differences in those factors. Predicting the peak-valley difference in two stages, primary medium- and long-term prediction followed by secondary correction, improves the prediction accuracy substantially. The example analysis supports the advantage of the proposed method.
Table 5: Comparison of the primary and secondary-corrected peak-valley difference predictions
In conclusion, the prediction method can be used to predict the demand-side load peak-valley difference and provides important guidance for power system scheduling, energy management and demand response implementation. Compared with other prediction methods, the method provided by the invention uses year-to-year differences in the influence factors for secondary correction, which clearly improves prediction accuracy, predicts the load peak-valley difference of residential users more accurately, and is significant for promoting demand-side response and easing the contradiction between power supply and demand.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.
Claims (10)
1. The peak-valley difference medium-long term measurement and calculation method based on random forest and secondary correction is characterized by comprising the following steps of:
step 1, collecting historical electricity load data of a plurality of residential user areas within a set number of years, calculating historical peak-valley differences, collecting influence factor data influencing the load peak-valley differences, and taking related influence factors as alternative characteristics;
step 2, extracting the characteristics of the multi-source influence factors, extracting an optimal characteristic combination by adopting binary characteristic engineering, and taking the characteristic combination as the input of the random forest model in the step 3;
step 3, training on the training data of the optimal feature combination selected in step 2 using a random forest algorithm to obtain a demand-side user load peak-valley difference measurement model, and outputting the primary medium- and long-term prediction results of the monthly peak-valley difference and the seasonal peak-valley difference;
step 4, based on the historically collected peak-valley difference data of the demand-side user load, selecting the screened related correction factors one by one as input and constructing a Bayesian regression model to fit the user load peak-valley difference, and correcting the primary medium- and long-term prediction result of the seasonal peak-valley difference according to the fitting result.
2. The method for calculating the peak-to-valley difference based on random forest and quadratic correction as claimed in claim 1,
the historical load data of the electricity consumption in the step 1 comprise daily maximum load data and daily minimum load data; the historical peak-valley difference comprises a daily peak-valley difference, a monthly peak-valley difference and a seasonal peak-valley difference; the influencing factor characteristics include: daily maximum air temperature, daily minimum air temperature, daily average air temperature, air pressure, humidity, rainfall, wind speed, daily average load.
3. The method for calculating the peak-to-valley difference based on random forest and quadratic correction as claimed in claim 1,
the step 2 specifically comprises:
step 2.1, calculating the correlation degree between the candidate features and the peak-valley difference in the step 1, and screening n candidate features from high to low according to the correlation degree;
step 2.2, screening the optimal feature combination as the measurement input by adopting a binary feature combination method.
4. The method for calculating the peak-to-valley difference based on random forest and quadratic correction as claimed in claim 2,
the step 2.2 specifically comprises:
step 2.2.1, use binary coding to distinguish the use state of the alternative features, i.e. used or abandoned, and screen out the binary feature data set
5. The method for calculating the peak-to-valley difference based on random forest and quadratic correction as claimed in claim 4,
wherein n is the number of the alternative features,
the binary code corresponding to the i-th feature x_i is w_i; w_i has two states, 0 and 1: when w_i = 0, the feature is not used; when w_i = 1, the feature is used.
6. The method for calculating the peak-to-valley difference based on the random forest and the secondary correction as claimed in claim 1, wherein:
the step 3 specifically includes:
step 3.1, taking the random forest as a basis for one-time medium and long-term measurement and calculation;
step 3.2, aiming at the historical load data of the user at the demand side, calculating the load natural growth rate monthly and quarterly according to the time scale measured and calculated by the peak-valley difference based on a trend extrapolation method;
step 3.3, forming data-driven training samples from the historical average peak-valley differences collected month by month and quarter by quarter, the natural load growth rate, and the screened peak-valley difference influence factors; constructing and training a random-forest-based peak-valley difference measurement model; the trained model outputs the medium- and long-term measurement results of the demand-side user load peak-valley difference.
7. The method for calculating the peak-to-valley difference based on random forest and quadratic correction as claimed in claim 4,
step 3.1 specifically comprises:
step 3.1.1, setting a data set with optimal combination characteristics in the last N years as an original sample, sampling the original sample by using a bootstrap method, generating K data sets as a training set of a decision tree, wherein N is a positive integer and is less than the set number of years in the step 1;
step 3.1.2, given M original input variables, each node randomly selects m of them and determines the optimal split point from those m variables, wherein m is less than M;
step 3.1.3, each decision tree is grown to the maximum possible extent without pruning;
step 3.1.4, taking the average of the outputs of all decision trees as the predicted value.
8. The method for calculating the peak-to-valley difference based on random forest and quadratic correction as claimed in claim 1,
the step 4 specifically comprises the following steps:
step 4.1, constructing a Bayesian ridge regression model based on the historical load peak-valley difference of the user at the demand side and the screened related correction factors;
step 4.2, based on the Bayesian ridge regression model, establishing fitting relationships between the seasonal peak-valley difference and the population correction factor and between the seasonal peak-valley difference and the resident consumption level correction factor, respectively, and obtaining the population correction fitting curve and the resident consumption level fitting curve;
step 4.3, calculating correction coefficients based on the two fitting curves and applying them in sequence to the demand-side user load peak-valley difference measured in step 3, to obtain the corrected demand-side user load peak-valley difference.
9. The method for mid-to-long term measurement of peak-to-valley difference based on random forest and quadratic correction as claimed in claim 8,
the specific calculation process in the Bayesian ridge regression model is as follows:
wherein,
p(w | a, b) is the probability distribution of the parameter w conditioned on the features a and b;
the loss function is that of ridge regression, which solves for the family of parameters {β_j} that minimizes the fitting error between the i-th output y_i and the inputs x_ij;
10. The method for mid-to-long term measurement of peak-to-valley difference based on random forest and quadratic correction as claimed in claim 9,
the correction coefficient D* in step 4.3 is calculated as follows:
where
D is the measured peak-valley difference of a certain quarter of the nth year,
d is the corrected difference of peak and valley in a certain quarter of the nth year,
d_{n-1} is the seasonal peak-valley difference on the fitted curve for that quarter of year n-1,
d_n is the seasonal peak-valley difference on the fitted curve for that quarter of the nth year.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111210827.1A CN113869600A (en) | 2021-10-18 | 2021-10-18 | Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113869600A true CN113869600A (en) | 2021-12-31 |
Family
ID=79000065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111210827.1A Pending CN113869600A (en) | 2021-10-18 | 2021-10-18 | Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113869600A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||