CN113869600A - Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction

Publication number
CN113869600A
Authority
CN
China
Prior art keywords
peak
valley difference
valley
load
difference
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111210827.1A
Other languages
Chinese (zh)
Inventor
黄奇峰
方凯杰
左强
杨世海
赵梓舒
黄艺璇
刘恬畅
程含渺
陈铭明
李波
陆婋泉
曹晓冬
徐雨森
臧海祥
孙国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Hohai University HHU
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Hohai University HHU
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center, Hohai University HHU

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention discloses a medium-and-long-term peak-valley difference prediction method based on random forest and secondary correction, used to evaluate the implementation effect of medium-and-long-term demand response. Historical load data of a plurality of residential users are collected, historical peak-valley differences are calculated, and the multi-source factors influencing the load peak-valley difference of demand-side users are analyzed. Features are extracted from the multi-source influence factors, and the optimal feature combination is selected by binary feature engineering as the input of a random forest model. A random-forest-based peak-valley difference prediction model is constructed, which outputs predictions of the monthly and seasonal peak-valley differences. Based on historically collected load peak-valley difference data of demand-side users, the screened correction factors are selected one by one as input to a Bayesian regression model that fits the users' load peak-valley difference, and the primary medium-and-long-term prediction of the seasonal peak-valley difference is corrected according to the fitting result. The invention is of significance for promoting the development of demand-side response and easing the imbalance between power supply and demand.

Description

Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction
Technical Field
The invention belongs to the technical field of power systems, and relates to a medium-and-long-term peak-valley difference prediction method based on random forest and secondary correction.
Background
Against the background of the "carbon peaking and carbon neutrality" goals, demand response oriented to flexible, interactive, intelligent power consumption has become a development trend. Residential load is an important component of demand response: it can effectively support peak shaving and valley filling and promotes reliable, stable operation of the power system. However, the load characteristics of demand-side users are affected by factors such as weather conditions, population growth and economic development, which makes effective medium-and-long-term demand-side evaluation difficult and undermines the reliability of evaluating the implementation of medium-and-long-term demand response. Research on accurately predicting the medium-and-long-term load peak-valley difference is therefore significant for promoting the development of demand-side response and easing the imbalance between power supply and demand.
Work that takes peak-valley difference prediction as its research focus is very limited. Existing prediction methods can be classified into statistical models, deep learning models and machine learning models. Traditional statistical methods are easy to implement and need no additional inputs, but their accuracy is often limited because only historical data are considered. Deep learning methods have attracted wide attention in recent years for their good prediction performance, but they are better suited to continuous time series, whereas the monthly and seasonal peak-valley differences are periodic and discontinuous. Traditional machine learning methods, which include the support vector machine and the random forest, compute quickly and generalize well.
The support vector machine maximizes the generalization capability of the learner and computes quickly. When the binary feature combination is optimized with a genetic algorithm, a support vector machine can be used for peak-valley difference prediction, with the fitness function set to the loss between the predictions of the trained support vector machine and the actual values. The random forest model is essentially a bagging method: it is robust to overfitting and improves on the performance of its weak learners (decision trees) by combining their predictions through voting or averaging. It resists overfitting and noise, computes quickly and predicts accurately.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention aims to provide a medium-and-long-term peak-valley difference prediction method based on random forest and secondary correction. The secondary correction and the use of load-side peak-valley difference characteristics improve the accuracy of load peak-valley difference prediction and provide more reliable guidance for the operation and scheduling of the power system.
The invention adopts the following technical scheme. The invention provides a medium-and-long-term peak-valley difference estimation method based on random forest and secondary correction, comprising the following steps:
step 1, collecting historical electricity load data of a plurality of residential user areas within a set number of years, calculating historical peak-valley differences, collecting data on the factors that influence the load peak-valley difference, and taking the related influence factors as candidate features;
step 2, extracting features from the multi-source influence factors, selecting the optimal feature combination by binary feature engineering, and taking this feature combination as the input of the random forest model in step 3;
step 3, training on the data of the optimal feature combination selected in step 2 with a random forest algorithm to obtain a load peak-valley difference estimation model for demand-side users, and outputting the primary medium-and-long-term prediction results of the monthly and seasonal peak-valley differences;
step 4, based on historically collected load peak-valley difference data of demand-side users, selecting the screened correction factors one by one as input, constructing a Bayesian regression model to fit the users' load peak-valley difference, and correcting the primary medium-and-long-term prediction of the seasonal peak-valley difference according to the fitting result.
Preferably, the historical electricity load data in step 1 include daily maximum load data and daily minimum load data; the historical peak-valley differences include the daily, monthly and seasonal peak-valley differences; the influence factor features include: daily maximum air temperature, daily minimum air temperature, daily average air temperature, air pressure, humidity, rainfall, wind speed and daily average load.
Preferably, step 2 specifically comprises:
step 2.1, calculating the degree of correlation between each candidate feature of step 1 and the peak-valley difference, and selecting n candidate features in descending order of correlation;
step 2.2, selecting the optimal feature combination as the input of the estimation by a binary feature combination method.
Preferably, step 2.2 specifically comprises:
step 2.2.1, using binary coding to mark the use state of each candidate feature, i.e. used or discarded, and forming the screened binary feature data set (formula image BDA0003308761190000021);
step 2.2.2, taking the screened binary feature data set (formula image BDA0003308761190000022) as the input of a genetic algorithm, and searching for the optimal feature combination based on the genetic algorithm.
Preferably, the binary feature data set (formula image BDA0003308761190000031) can be expressed as:
[formula image BDA0003308761190000032]
wherein n is the number of candidate features,
the binary code corresponding to the i-th feature $x_i$ is $w_i$; $w_i$ has two states, 0 and 1: when $w_i = 0$ the feature is not used, and when $w_i = 1$ the feature is used.
Preferably, step 3 specifically includes:
step 3.1, taking the random forest as the basis of the primary medium-and-long-term estimation;
step 3.2, for the historical load data of demand-side users, calculating the natural load growth rate month by month and quarter by quarter, according to the time scale of the peak-valley difference estimation, based on a trend extrapolation method;
step 3.3, forming data-driven training samples from the historical average peak-valley differences acquired month by month and season by season, the natural load growth rate and the screened peak-valley difference influence factors; constructing and training a random-forest-based peak-valley difference estimation model; the trained model outputs the medium-and-long-term estimate of the load peak-valley difference of demand-side users.
Preferably, step 3.1 specifically comprises:
step 3.1.1, taking the data set with the optimal feature combination from the previous N years as the original sample, and sampling the original sample with the bootstrap method to generate K data sets as training sets for the decision trees, wherein N is a positive integer smaller than the set number of years in step 1;
step 3.1.2, given M input variables, randomly selecting m variables at each node and determining the optimal split point from these m variables, wherein m is less than M;
step 3.1.3, growing each decision tree to the maximum extent without pruning;
step 3.1.4, taking the average of the outputs of all decision trees as the predicted value.
Preferably, step 4 specifically includes:
step 4.1, constructing a Bayesian ridge regression model based on the historical load peak-valley difference of demand-side users and the screened correction factors;
step 4.2, based on the Bayesian ridge regression model, establishing the fitting relationships between the seasonal peak-valley difference and, respectively, the population correction factor and the resident consumption level correction factor, and obtaining a population correction fitting curve and a resident consumption level fitting curve;
step 4.3, calculating a correction coefficient from each of the two fitting curves, and applying the corrections in sequence to the load peak-valley difference estimate of demand-side users obtained in step 3, to obtain the corrected load peak-valley difference of demand-side users.
Preferably, the calculation in the Bayesian ridge regression model is as follows:
[formula image BDA0003308761190000041]
[formula image BDA0003308761190000042]
wherein
$p(w \mid a, b)$ is the probability distribution of the parameter $w$ conditioned on the features $a$ and $b$;
the term shown in formula image BDA0003308761190000043 is the loss function of the ridge regression, which is solved to find the family of parameters $\{\beta_j\}$ that minimizes the fitting error between the i-th output $y_i$ and the inputs $x_{ij}$;
the term shown in formula image BDA0003308761190000044 is the L2 regularization penalty.
Preferably, in step 4.3 the corrected value $D^*$ is calculated as follows:
[formula image BDA0003308761190000045]
in the formula,
$D$ is the estimated peak-valley difference of a given quarter of the n-th year,
$D^*$ is the corrected peak-valley difference of that quarter of the n-th year,
$d_{n-1}$ is the seasonal peak-valley difference on the fitted curve for that quarter of year n-1,
$d_n$ is the seasonal peak-valley difference on the fitted curve for that quarter of the n-th year.
Compared with the prior art, the invention has the following advantages. The load of demand-side users changes flexibly from day to day, the implementation effect of demand-side response is difficult to evaluate accurately, and the reliability of evaluating medium-and-long-term demand-side response depends on accurate load peak-valley difference prediction. To address these problems, the invention provides a medium-and-long-term peak-valley difference prediction method based on random forest and secondary correction, in which the secondary correction improves the prediction accuracy.
Drawings
FIG. 1 is a schematic flow chart of a peak-to-valley difference prediction method based on random forest and secondary correction according to the present invention;
FIG. 2 is a schematic diagram of the binary feature combination optimization process of the present invention;
FIG. 3 is a flow chart of one-time medium-long term prediction based on random forests according to the present invention;
FIG. 4 is a flow chart of a quadratic correction based on Bayesian ridge regression according to the present invention;
FIG. 5 is a graph of the medium-and-long-term monthly peak-valley difference prediction results obtained with the proposed method in an embodiment of the present invention;
FIG. 6 is a graph comparing the primary medium-and-long-term peak-valley difference prediction results with the prediction after secondary correction in an embodiment of the present invention;
FIG. 7 is a population correction fit curve in an embodiment of the present invention;
FIG. 8 is a graph of a fitted residential consumption level curve according to the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical solutions of the present invention more clearly and do not limit the protection scope of the present application.
In the invention, medium-and-long-term prediction refers to prediction with a horizon between one month and one year. By prediction time scale, prediction methods can be divided into ultra-short-term (0-6 h), short-term (6 h-1 d) and medium-and-long-term (1 month-1 year) methods.
As shown in FIG. 1, the invention provides a medium-and-long-term peak-valley difference estimation method based on random forest and secondary correction, comprising the following steps:
Step 1, collecting historical load data of a plurality of residential users within a set number of years, calculating historical peak-valley differences, and collecting data on the factors that influence the load peak-valley difference as candidate features.
In the embodiment, the monthly and quarterly peak-valley differences of three residential areas (high, medium and low grade) in a city in Jiangsu Province, China, are taken as the research objects. Historical electricity load data are collected for the six years from 2013 to 2018 at a time resolution of 15 min, i.e. 96 data points per day. The historical data include the daily maximum and daily minimum load; the daily peak-valley difference is calculated from these two values, and the monthly and quarterly peak-valley differences are calculated from the daily ones. The daily average load is calculated from the 96 load readings collected each day. Further, influence factors such as daily maximum air temperature, daily minimum air temperature, daily average air temperature, air pressure, humidity, rainfall, wind speed and daily average load are considered as candidate features. Population factors and annual GDP factors are used as secondary correction factors.
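The peak-valley calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not state how daily values are aggregated to months and quarters, so averaging the daily differences and using calendar quarters are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_peak_valley(loads):
    """Peak-valley difference of one day: daily maximum load minus daily minimum load."""
    return max(loads) - min(loads)


def aggregate_peak_valley(daily_records):
    """Aggregate daily peak-valley differences to months and quarters.

    daily_records: iterable of (year, month, loads), one entry per day, where
    loads holds that day's readings (96 values at 15-min resolution).
    ASSUMPTION: monthly/quarterly values are means of the daily differences.
    """
    monthly = defaultdict(list)
    quarterly = defaultdict(list)
    for year, month, loads in daily_records:
        diff = daily_peak_valley(loads)
        monthly[(year, month)].append(diff)
        quarterly[(year, (month - 1) // 3 + 1)].append(diff)
    return ({k: mean(v) for k, v in monthly.items()},
            {k: mean(v) for k, v in quarterly.items()})
```

Calling `aggregate_peak_valley` on one record per day returns two dictionaries keyed by `(year, month)` and `(year, quarter)`.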
The first five years of data are used as training data, and the final year of data is used as test data to verify the validity of the prediction. The prediction error evaluation indexes are the mean absolute percentage error (MAPE), the mean absolute error (MAE) and the root mean square error (RMSE).
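For reference, the three error indexes can be computed as in the sketch below, using their standard definitions. The function names are illustrative, and MAPE is expressed in percent here, which is a presentation choice rather than something the text specifies.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```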
Step 2, extracting features from the multi-source influence factors, and selecting the optimal feature combination by binary feature engineering as the input of the random forest model.
Step 2 specifically comprises the following steps:
Step 2.1, calculating the degree of correlation between each candidate feature of step 1 and the peak-valley difference, and selecting n candidate features in descending order of correlation.
To improve the prediction accuracy, multiple influence factors are considered and the feature factors that correlate significantly with the residential load peak-valley difference are screened out. In a preferred but non-limiting implementation, the degree of correlation between each candidate feature and the peak-valley difference is analyzed quantitatively with the Pearson correlation coefficient. The original formula is supplied as image BDA0003308761190000061; the standard definition of the Pearson coefficient is:
$$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2}\,\sqrt{\sum_{i}(y_i-\bar{y})^2}}$$
wherein
x is the peak-valley difference,
y is any influence factor.
Table 1 lists the Pearson correlation of each candidate feature with the peak-valley difference; these features are the candidates for the binary feature combination.
Table 1: Degree of correlation between the peak-valley difference and each influence factor
[Table 1 is supplied as images BDA0003308761190000062 and BDA0003308761190000071]
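Step 2.1 can be sketched as below: compute the Pearson coefficient of each candidate feature against the peak-valley difference and keep the n most strongly correlated ones. Ranking by absolute value is an assumption (the text only says "from high to low"), and the function and feature names are hypothetical.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_n_features(peak_valley, features, n):
    """Rank candidate features by |r| against the peak-valley difference.

    features: dict mapping a feature name to its list of values.
    Returns (names of the n strongest features, dict of all coefficients).
    """
    scores = {name: pearson(peak_valley, values) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:n], scores
```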
Step 2.2, selecting the optimal feature combination as the input of the estimation by a binary feature combination method.
Because the influence factors act on the peak-valley difference jointly, the effect of a single factor cannot be analyzed in isolation. A binary feature combination method is therefore used to select, from the candidate features screened in step 2.1, the optimal feature combination as the input of the random forest. Step 2.2 specifically comprises the following steps:
Step 2.2.1, using binary coding to mark the use state of each candidate feature, i.e. used or discarded, and forming the screened binary feature data set (formula image BDA0003308761190000072).
Suppose there are n candidate features to be screened. The feature data set and its corresponding binary code are expressed respectively as:
$X = [x_1, x_2, x_3, \ldots, x_n]$
$W = [w_1, w_2, w_3, \ldots, w_n]$
wherein n is the number of candidate features,
the binary code corresponding to the i-th feature $x_i$ is $w_i$; $w_i$ has two states, 0 and 1. When $w_i = 0$ the feature is not used; when $w_i = 1$ the feature is used. The screened binary feature data set (formula image BDA0003308761190000073) can thus be expressed as:
[formula image BDA0003308761190000074]
Step 2.2.2, taking the screened binary feature data set (formula image BDA0003308761190000075) as the input of a genetic algorithm, and searching for the optimal feature combination based on the genetic algorithm.
The optimization flow of the genetic algorithm is shown in FIG. 2. To search the binary feature data set (formula images BDA0003308761190000076 and BDA0003308761190000077) for the optimum, it is supplied as the input of the genetic algorithm. The genetic algorithm is initialized randomly, generating multiple individuals of the form $[w_1, w_2, w_3, \ldots, w_n]$; these individuals constitute the initial population P(0). Each binary string $l$, which takes the form $[w_1, w_2, w_3, \ldots, w_n]$, is called a chromosome in the genetic algorithm and represents one individual. The encoded chromosome is decoded to obtain the parameters to be optimized that the individual carries, and these parameters are input into the fitness function, which is taken to be the loss value after SVR training. The algorithm then checks whether the set number of iterations has been reached. If not, two individuals are selected from the population according to their fitness values and copied, and the crossover probability of the genetic algorithm decides whether the two selected individuals undergo crossover. After crossover, mutation is performed: the set mutation probability decides whether the two current temporary individuals mutate. In general, the mutation probability in the genetic algorithm is calculated as
$p_m = 0.8 / d$
in the formula,
d is the number of binary symbols in an individual's binary encoding.
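The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the fitness is passed in as a callable to be minimized, and the demo uses a toy loss in place of the SVR training loss named in the text. Tournament selection, single-point crossover, the crossover probability of 0.8, the population size and all function names are assumptions; only the binary encoding and the mutation probability $p_m = 0.8/d$ come from the text.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           p_cross=0.8, seed=0):
    """Search binary masks [w1..wn] for the one with the lowest fitness (loss)."""
    rng = random.Random(seed)
    p_mut = 0.8 / n_features                     # p_m = 0.8 / d from the text
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]      # random initial population P(0)
    best = min(population, key=fitness)
    for _ in range(generations):
        next_pop = []
        while len(next_pop) < pop_size:
            # ASSUMPTION: binary tournament selection, lower loss wins
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            c1, c2 = p1[:], p2[:]
            if n_features > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_features - 1)   # single-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(n_features):
                    if rng.random() < p_mut:           # bit-flip mutation
                        child[i] = 1 - child[i]
                next_pop.append(child)
        population = next_pop[:pop_size]
        best = min(population + [best], key=fitness)   # keep the best mask seen
    return best


# Toy stand-in for the SVR loss: Hamming distance to a hypothetical target mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_loss(mask):
    return sum(a != b for a, b in zip(mask, target))


best_mask = genetic_feature_search(len(target), toy_loss)
selected = [i for i, w in enumerate(best_mask) if w == 1]  # indices of used features
```

In a real run, `fitness` would train a predictor on the features whose bit is 1 and return its validation loss.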
For each grade of residential area, the feature combination for peak-valley difference prediction is optimized with the binary feature combination method. Table 2 lists the top three feature combinations for the three residential area data sets. Top3 indicates that the three best feature combination schemes are selected; case1, case2 and case3 denote the three best combinations of load peak-valley difference influence factors for each grade of residential area. case1 is the best combination (Top1), and the Top1 combination was chosen as the optimal feature input in the experiment. A check mark indicates that a feature is selected and × indicates that it is discarded.
Table 2: Feature combination schemes of the peak-valley difference and the influence factors
[Table 2 is supplied as images BDA0003308761190000081 and BDA0003308761190000091]
Step 3, training on the data of the optimal feature combination selected in step 2 with a random forest algorithm to obtain a load peak-valley difference estimation model for demand-side users, and outputting the primary medium-and-long-term prediction results of the monthly and seasonal peak-valley differences, as shown in FIG. 3.
Step 3 specifically comprises the following steps:
Step 3.1, constructing a random forest model and taking the random forest as the basis of the primary medium-and-long-term estimation.
The estimation model is constructed with the random forest algorithm as follows:
step 3.1.1, taking the data set with the optimal feature combination from the previous N years as the original sample, and sampling the original sample with the bootstrap method to generate K data sets as training sets for the decision trees, wherein N is a positive integer smaller than the set number of years in step 1; in the embodiment, the data set of the first five years is used as the original sample;
step 3.1.2, given M input variables, randomly selecting m variables at each node and determining the optimal split point from these m variables, wherein m is less than M;
step 3.1.3, growing each decision tree to the maximum extent without pruning;
step 3.1.4, taking the average of the outputs of all decision trees as the predicted value.
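Steps 3.1.1 to 3.1.4 can be sketched in plain Python as below: K bootstrap samples, m random candidate variables per node, unpruned trees, and averaging of the tree outputs. This is a compact illustration rather than the patented implementation. Choosing splits by squared error, the defaults for k and m, the toy data and all function names are assumptions.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean, used to score a split."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def build_tree(rows, targets, m, rng):
    """Grow an unpruned regression tree; each node tries m randomly chosen variables."""
    if len(set(targets)) == 1:
        return targets[0]                               # pure leaf
    best = None                                         # (error, variable, threshold)
    for f in rng.sample(range(len(rows[0])), m):
        for t in sorted({r[f] for r in rows})[:-1]:     # both sides stay non-empty
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:                                    # sampled variables are constant
        return mean(targets)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], m, rng),
            build_tree([r for r, _ in hi], [y for _, y in hi], m, rng))


def predict_tree(node, x):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(rows, targets, k=25, m=None, seed=0):
    """Train k trees, each on a bootstrap sample of the original data."""
    rng = random.Random(seed)
    m = m or max(1, len(rows[0]) // 3)                  # ASSUMPTION: default m
    forest = []
    for _ in range(k):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], m, rng))
    return forest


def predict(forest, x):
    """Average of all tree outputs (step 3.1.4)."""
    return mean(predict_tree(tree, x) for tree in forest)


# Toy data (hypothetical): the target depends on the first variable only.
rows = [[float(i), float((7 * i) % 10)] for i in range(10)]
targets = [2.0 if r[0] < 5 else 10.0 for r in rows]
forest = fit_forest(rows, targets, k=25, m=1, seed=1)
estimate = predict(forest, [1.0, 3.0])
```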
Step 3.2, for the historical load data of demand-side users, calculating the natural load growth rate month by month and quarter by quarter, according to the time scale of the peak-valley difference estimation, based on a trend extrapolation method. The natural load growth rate is an influence factor that characterizes how the load changes, and it is used as one of the inputs of the load peak-valley difference estimation model for demand-side users.
Step 3.3, forming data-driven training samples from the historical average peak-valley differences acquired month by month and season by season, the natural load growth rate and the screened peak-valley difference influence factors; constructing and training a random-forest-based peak-valley difference estimation model. The trained model outputs the medium-and-long-term estimate of the load peak-valley difference of demand-side users, and finally the estimates of the monthly and seasonal peak-valley differences are output.
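The text does not give the formula behind the trend extrapolation in step 3.2. One plausible reading, shown here purely as an assumption, fits a least-squares straight line to the values of the same month (or quarter) in consecutive years, extrapolates one step ahead, and takes the relative change from the last observed value as the natural growth rate. The function name is hypothetical.

```python
def natural_growth_rate(history):
    """Growth rate from a linear trend extrapolated one step ahead.

    history: load values of the same month (or quarter) in consecutive years,
    oldest first; at least two values, the last one non-zero.
    ASSUMPTION: linear trend; rate = (extrapolated - last) / last.
    """
    n = len(history)
    mx = (n - 1) / 2                        # mean of the time indices 0..n-1
    my = sum(history) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    slope = sum((i - mx) * (v - my) for i, v in enumerate(history)) / sxx
    forecast = my + slope * (n - mx)        # trend value at time index n
    return (forecast - history[-1]) / history[-1]
```

The resulting rate would then be appended to the feature vector of the corresponding month or quarter before training.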
The prediction results of the random-forest-based method are shown in Tables 3 and 4. Prediction methods based on the support vector machine (SVM), the multilayer perceptron (MLP) and Gaussian process regression (GPR) are selected as reference methods to verify that the proposed method improves the prediction accuracy. Tables 3 and 4 show the prediction accuracy of the random forest; across all error indexes, the random forest gives the best prediction of both the monthly and the seasonal peak-valley difference among the compared models.
Table 3: Monthly peak-valley difference prediction results based on different models
[Table 3 is supplied as image BDA0003308761190000101]
Table 4: Seasonal peak-valley difference prediction results based on different models
[Table 4 is supplied as images BDA0003308761190000102 and BDA0003308761190000111]
Step 4, based on the historically collected peak-valley difference data of the demand-side user load, selecting the screened relevant correction factors one by one as input and constructing a Bayesian regression model to fit the user load peak-valley difference, and correcting the primary medium-and-long-term prediction result of the seasonal peak-valley difference according to the fitting result.
Because the seasonal peak-valley difference spans a long time period, the influence of year-to-year differences in the influence factors on the seasonal peak-valley difference is taken into account: a Bayesian ridge regression model is constructed to obtain the fitting relationship between the relevant correction factors and the load peak-valley difference, and the primary medium-and-long-term prediction result of the seasonal peak-valley difference is corrected. The secondary correction process is shown in Fig. 4. The secondary peak-valley difference correction stage comprises the following steps:
step 4.1, constructing a Bayesian ridge regression model based on the historical load peak-valley difference of the user at the demand side and the screened related correction factors, wherein the specific calculation process in the Bayesian ridge regression model is as follows:
Figure BDA0003308761190000112
Figure BDA0003308761190000113
wherein,
p(w | a, b) is the probability distribution of the parameter w conditioned on the features a and b;
Figure BDA0003308761190000114
is the loss function of ridge regression, which consists in solving for the family of parameters {β_j} that minimizes the fitting error between the i-th output y_i and the inputs x_ij;
Figure BDA0003308761190000115
is the L2 regularization penalty term.
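The ridge loss described above (squared fitting error plus an L2 penalty) can be illustrated for a single correction factor. The sketch below is not the Bayesian ridge regression of the method itself: it assumes one input variable, a fixed penalty weight `lam` (a full Bayesian ridge model would estimate the corresponding hyperparameters from the data), and an unpenalized intercept. With a zero-mean Gaussian prior of fixed precision on the weight, this point estimate coincides with the posterior mode. The function names and data are hypothetical.

```python
def fit_ridge_1d(x, y, lam=1.0):
    """Minimise sum_i (y_i - b0 - beta * x_i)^2 + lam * beta^2.

    Single-feature closed form; the intercept b0 is not penalised.
    Returns (beta, b0).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx + lam == 0:
        raise ValueError("degenerate problem: constant x and lam == 0")
    beta = sxy / (sxx + lam)
    return beta, y_mean - beta * x_mean


def predict(x_new, beta, b0):
    """Evaluate the fitted curve at x_new."""
    return b0 + beta * x_new


# Hypothetical data: a correction factor (e.g. population) against the
# seasonal peak-valley difference, both in arbitrary units.
factor = [1.0, 2.0, 3.0, 4.0]
peak_valley = [2.0, 4.0, 6.0, 8.0]

beta_ols, b0_ols = fit_ridge_1d(factor, peak_valley, lam=0.0)
beta_ridge, b0_ridge = fit_ridge_1d(factor, peak_valley, lam=5.0)
print(beta_ols, b0_ols)      # unpenalised fit
print(beta_ridge, b0_ridge)  # slope shrunk towards zero by the penalty
```

A larger `lam` shrinks the slope towards zero, which is the effect of the L2 penalty term; values read off the fitted curve play the role of the fitted peak-valley differences used in the correction step.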
Step 4.2, based on the Bayesian ridge regression model, establishing fitting relationships between the seasonal peak-valley difference and the population correction factor and between the seasonal peak-valley difference and the resident consumption level correction factor, respectively, and obtaining the population correction fitting curve and the resident consumption level fitting curve, as shown in Figs. 6 and 7.
Step 4.3, calculating correction coefficients from the two fitting curves in turn, and correcting the demand-side user load peak-valley difference in the calculation result obtained in step 3, to obtain the corrected demand-side user load peak-valley difference;
The correction coefficient D* is calculated as follows:
Figure BDA0003308761190000121
in the formula,
D is the peak-valley difference of a given quarter of year n obtained from the primary calculation,
D* is the corrected peak-valley difference of that quarter of year n,
d_{n-1} is the seasonal peak-valley difference on the fitted curve for the same quarter of year n-1,
d_n is the seasonal peak-valley difference on the fitted curve for that quarter of year n.
Taking the population correction of the spring 2018 peak-valley difference as an example: given the population data of 2017 and 2018, the corresponding peak-valley difference values d_2017 and d_2018 are found on the fitted curve, the correction coefficient is calculated, and the spring 2018 peak-valley difference is corrected:
Figure BDA0003308761190000122
wherein D is the peak-valley difference of spring 2018 obtained from the primary calculation,
D* is the corrected peak-valley difference of spring 2018,
d_2017 is the fitted-curve peak-valley difference value for spring 2017,
d_2018 is the fitted-curve peak-valley difference value for spring 2018.
The predicted and true load values at the moments to be predicted are compared, and the error indexes MAPE (mean absolute percentage error), MAE (mean absolute error), and RMSE (root mean square error) are calculated according to the following formulas:
Figure BDA0003308761190000123
Figure BDA0003308761190000124
Figure BDA0003308761190000131
wherein,
l_e is the true value of the load at a given moment,
Figure BDA0003308761190000132
is the predicted value of the load at that moment,
n_test is the number of test samples;
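For reference, the three error indexes can be computed as in the following sketch, which uses the standard textbook definitions of MAE, RMSE, and MAPE. Expressing MAPE as a percentage is an assumption, since the original formulas are only available as images; the function names and sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed convention).

    Requires every true value to be non-zero.
    """
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical true and predicted loads for three test samples.
true_load = [100.0, 200.0, 400.0]
pred_load = [110.0, 190.0, 380.0]

print(f"MAE  = {mae(true_load, pred_load):.4f}")
print(f"RMSE = {rmse(true_load, pred_load):.4f}")
print(f"MAPE = {mape(true_load, pred_load):.4f} %")
```

MAE and RMSE carry the unit of the load, while MAPE is scale-free, which is why all three are usually reported together when comparing models.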
In addition, Table 5 shows the seasonal peak-valley difference prediction results of the medium-and-long-term peak-valley difference calculation method based on random forest and secondary correction, after the secondary correction.
The results show that the primary predicted values deviate somewhat from the true values, whereas the predictions after secondary correction improve markedly on the first stage. Comparing the evaluation indexes, the RMSE, MAPE, and MAE of the secondary-corrected predictions are all lower than those of the primary prediction, so the prediction accuracy is effectively improved. This indicates that the two-stage peak-valley difference prediction model is, to a certain extent, more sensitive to year-to-year differences in the influence factors, which further improves the accuracy of the final prediction result.
Fig. 5 shows the primary medium-and-long-term monthly peak-valley difference prediction results, and Fig. 6 shows the primary medium-and-long-term seasonal peak-valley difference prediction results together with the secondary-corrected seasonal prediction results. The two-stage peak-valley difference prediction model considers not only the month-by-month and quarter-by-quarter influence factors of the peak-valley difference but also the differences in influence factors between years. Predicting the peak-valley difference in two stages, primary medium-and-long-term prediction followed by secondary correction, greatly improves the prediction accuracy. Finally, the case analysis demonstrates the superiority of the proposed method.
Table 5: Comparison of the primary and secondary-corrected peak-valley difference predictions
Figure BDA0003308761190000133
Figure BDA0003308761190000141
In conclusion, the prediction method can be used to predict the demand-side load peak-valley difference and provides important guidance for power system scheduling, energy management, and demand response implementation. Compared with other prediction methods, the method provided by the invention uses the year-to-year differences in influence factors to perform a secondary correction, which markedly improves the prediction accuracy, allows the load peak-valley difference of residential users to be predicted more accurately, and is of great significance for promoting the development of demand-side response and relieving the contradiction between power supply and demand.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.

Claims (10)

1. A medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction, characterized by comprising the following steps:
step 1, collecting historical electricity load data of a plurality of residential user areas within a set number of years, calculating historical peak-valley differences, collecting data on the influence factors that affect the load peak-valley difference, and taking the relevant influence factors as candidate features;
step 2, performing feature extraction on the multi-source influence factors, extracting an optimal feature combination by binary feature engineering, and taking the feature combination as the input of the random forest model in step 3;
step 3, training on the training data of the optimal feature combination selected in step 2 with a random forest algorithm to obtain a demand-side user load peak-valley difference calculation model, and outputting primary medium-and-long-term prediction results of the monthly peak-valley difference and the seasonal peak-valley difference;
step 4, based on the historically collected peak-valley difference data of the demand-side user load, selecting the screened relevant correction factors one by one as input and constructing a Bayesian regression model to fit the user load peak-valley difference, and correcting the primary medium-and-long-term prediction result of the seasonal peak-valley difference according to the fitting result.
2. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 1, wherein
the historical electricity load data in step 1 comprise daily maximum load data and daily minimum load data; the historical peak-valley differences comprise the daily, monthly, and seasonal peak-valley differences; the influence factor features include: daily maximum air temperature, daily minimum air temperature, daily average air temperature, air pressure, humidity, rainfall, wind speed, and daily average load.
3. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 1, wherein
step 2 specifically comprises:
step 2.1, calculating the degree of correlation between the candidate features in step 1 and the peak-valley difference, and selecting n candidate features in descending order of correlation;
step 2.2, screening out the optimal feature combination as the calculation input by a binary feature combination method.
4. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 2, wherein
step 2.2 specifically comprises:
step 2.2.1, using binary coding to distinguish the usage state of each candidate feature, i.e. used or discarded, and screening out the binary feature data set
Figure FDA0003308761180000021
step 2.2.2, taking the screened-out binary feature data set
Figure FDA0003308761180000022
as the input of a genetic algorithm, and searching for the optimal feature combination based on the genetic algorithm.
5. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 4, wherein
the binary feature data set
Figure FDA0003308761180000023
can be expressed as:
Figure FDA0003308761180000024
wherein n is the number of candidate features,
the binary code corresponding to the i-th feature x_i is w_i; w_i has two states, 0 and 1: when w_i = 0, the feature is not used; when w_i = 1, the feature is used.
6. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 1, wherein
step 3 specifically comprises:
step 3.1, taking the random forest as the basis of the primary medium-and-long-term calculation;
step 3.2, for the historical load data of demand-side users, calculating the natural load growth rate month by month and quarter by quarter, according to the time scale of the peak-valley difference calculation, based on a trend extrapolation method;
step 3.3, forming data-driven training samples from the historical average peak-valley differences collected month by month and season by season, the natural load growth rate, and the screened influence factors for peak-valley difference calculation; constructing and training a peak-valley difference calculation model based on random forest; the trained model outputs the medium-and-long-term calculation results of the demand-side user load peak-valley difference.
7. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 4, wherein
step 3.1 specifically comprises:
step 3.1.1, taking the data set with the optimal feature combination from the most recent N years as the original sample, sampling the original sample by the bootstrap method, and generating K data sets as the training sets of the decision trees, wherein N is a positive integer smaller than the set number of years in step 1;
step 3.1.2, given M original input variables, randomly selecting m variables at each node and determining the optimal split point according to these m variables, wherein m is less than M;
step 3.1.3, growing each decision tree to the maximum possible extent without pruning;
step 3.1.4, taking the average of the outputs of all decision trees as the predicted value.
8. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 1, wherein
step 4 specifically comprises the following steps:
step 4.1, constructing a Bayesian ridge regression model based on the historical load peak-valley difference of the demand-side users and the screened relevant correction factors;
step 4.2, based on the Bayesian ridge regression model, establishing fitting relationships between the seasonal peak-valley difference and the population correction factor and between the seasonal peak-valley difference and the resident consumption level correction factor, respectively, and obtaining a population correction fitting curve and a resident consumption level fitting curve;
step 4.3, calculating correction coefficients based on the two fitting curves, and correcting in turn the demand-side user load peak-valley difference in the calculation result obtained in step 3, to obtain the corrected demand-side user load peak-valley difference.
9. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 8, wherein
the specific calculation process in the Bayesian ridge regression model is as follows:
Figure FDA0003308761180000031
Figure FDA0003308761180000032
wherein,
p(w | a, b) is the probability distribution of the parameter w conditioned on the features a and b;
Figure FDA0003308761180000033
is the loss function of ridge regression, which consists in solving for the family of parameters {β_j} that minimizes the fitting error between the i-th output y_i and the inputs x_ij;
Figure FDA0003308761180000034
is the L2 regularization penalty term.
10. The medium-and-long-term peak-valley difference measurement and calculation method based on random forest and secondary correction as claimed in claim 9, wherein
the correction coefficient D* in step 4.3 is calculated as follows:
Figure FDA0003308761180000041
in the formula,
D is the peak-valley difference of a given quarter of year n obtained from the primary calculation,
D* is the corrected peak-valley difference of that quarter of year n,
d_{n-1} is the seasonal peak-valley difference on the fitted curve for the same quarter of year n-1,
d_n is the seasonal peak-valley difference on the fitted curve for that quarter of year n.
CN202111210827.1A 2021-10-18 2021-10-18 Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction Pending CN113869600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111210827.1A CN113869600A (en) 2021-10-18 2021-10-18 Peak-valley difference medium-and-long-term prediction method based on random forest and secondary correction

Publications (1)

Publication Number Publication Date
CN113869600A true CN113869600A (en) 2021-12-31

Family

ID=79000065

Country Status (1)

Country Link
CN (1) CN113869600A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination