CN115130741A

CN115130741A - Multi-model fusion based multi-factor power demand medium and short term prediction method

Info

Publication number: CN115130741A
Application number: CN202210699834.0A
Authority: CN
Inventors: 赵旭; 姬庆庆; 张世俞; 谢昕彤; 李耀伟; 白吉康; 段俏; 黄春莉; 叶子静; 冯相融
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2022-06-20
Filing date: 2022-06-20
Publication date: 2022-09-30

Abstract

The invention discloses a multi-factor medium- and short-term power demand prediction method based on multi-model fusion. By fusing multiple models, the method predicts power demand over different time horizons and substantially reduces prediction time while improving prediction accuracy. GBDT, XGBoost and LightGBM are all Boosting models built on decision trees; each solves the target problem by searching for optimal splits in its trees. A Stacking framework predicts the power data in two layers: after the three Boosting models (GBDT, XGBoost and LightGBM) complete their predictions, an LR model corrects the results and outputs the corrected prediction, which improves accuracy and reliability. Combining the three models lets them complement one another and yields a more precise prediction. Because the method is a decision-search optimization algorithm, the time it needs to produce a prediction is far shorter than that of artificial intelligence algorithms.

Description

Multi-model fusion based multi-factor power demand medium and short term prediction method

Technical Field

The invention relates to a multi-factor power demand medium and short term prediction method based on multi-model fusion, and belongs to the technical field of power demand evaluation.

Background

Electric power is one of the most important basic energy sources worldwide and underpins industrial production and daily life. At present there is no high-quality storage carrier for electric energy: storage relies on batteries or pumped-storage power stations, whose efficiency is low, so the amount of power generated must roughly equal the amount demanded; otherwise resources are wasted. Moreover, thermal power generation is the main source of electricity in most of the world, so excess generation also causes serious environmental pollution. Power supply shortages have also occurred frequently in recent years. Accurate prediction of power demand is therefore very important.

Power demand forecasting predicts the electricity demand of the power market over a future period and is generally divided into three categories by time span: short-term, medium-term and long-term prediction. Short-term prediction generally takes the day as its smallest unit; common methods include recursive linear least squares and state-space methods based on Kalman filtering. Medium-term prediction generally takes the month or quarter as its smallest unit; common methods include the seasonal index method and the ARIMA model. Long-term prediction generally takes the year as its smallest unit; common methods include the moving average method and neural networks. In terms of prediction characteristics, short-term prediction can draw on a large amount of historical data, but different holiday periods affect demand differently, so these effects must be considered together. Medium-term data show clear seasonality, which the prediction must take into account. In long-term prediction, historical data are limited and external disturbances are numerous, so the data features must be fully mined to obtain a good result. Short-term prediction provides decision guidance for real-time grid dispatching, while long-term prediction guides the planning and construction of the power system and provides data support for grid expansion and capacity increases.

In the past, power prediction relied mainly on statistical methods such as time series analysis, multiple linear regression and ARIMA. These methods are theoretically simple and computationally cheap, so they were widely used in early research on power prediction. However, they are difficult to combine with external factors, which greatly limits their accuracy and makes it hard to meet practical requirements. Compared with traditional statistical methods, artificial intelligence techniques can analyze and learn from large amounts of data in a short time and significantly improve prediction accuracy.

To address the small amount of information and the high prediction difficulty of medium- and short-term power data, the invention takes full account of the influence of meteorological factors on electricity consumption and fuses LightGBM, XGBoost and GBDT. The fused model explores the correlation between power demand and meteorological data and is trained on the temporal relationships in the data, yielding more accurate predictions.

Disclosure of Invention

The method is mainly used for power demand prediction. By fusing multiple models, it predicts power demand over different time horizons and helps power companies allocate power resources and schedule dispatching sensibly. Current power prediction suffers from low short-term accuracy and long prediction times; the disclosed method substantially reduces prediction time while improving prediction accuracy. GBDT, XGBoost and LightGBM are all Boosting models built on decision trees; each solves the target problem by searching for optimal splits in its trees. Compared with the artificial intelligence algorithms widely studied at present, the method reaches high prediction accuracy more quickly. To obtain a better prediction, a Stacking framework predicts the power data in two layers: after the three Boosting models (GBDT, XGBoost and LightGBM) complete their predictions, the LR model corrects the results and outputs the corrected prediction, improving accuracy and reliability.

To achieve this purpose, the technical scheme adopted by the invention is a multi-factor medium- and short-term power demand prediction method based on multi-model fusion, comprising the following steps.

S1, constructing a power data set;

The power data come from electricity consumption data published online, covering 13 months for a city in China. The original data set contains 5 attribute items: historical electricity consumption, temperature, humidity, wind speed and rainfall. All 5 attribute items are recorded once every 15 minutes.

To verify the effectiveness of the model, the data set is divided into a training set and a test set before training, according to the power demand prediction tasks of different time horizons. The training set accounts for 70% of the original data and the test set for 30%.
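The 70/30 division can be sketched as follows. This is only an illustration: the text does not state how the split is drawn, so the chronological (unshuffled) split is an assumption, and `chronological_split` and the sample data are hypothetical names.

```python
def chronological_split(records, train_fraction=0.7):
    """Split time-ordered records into (train, test) without shuffling.

    Assumption: the earliest 70% of readings train the model and the
    latest 30% test it. The source only gives the 70/30 proportions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be in (0, 1)")
    # round() avoids an off-by-one from floating-point products such as 960 * 0.7.
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical data: 10 days of 15-minute readings (96 per day).
samples = list(range(960))
train, test = chronological_split(samples)
```

Keeping the order intact means every test reading is later than every training reading, which suits a forecasting task.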

S1.1, data cleaning;

The electricity consumption item in the data was analyzed and a trend graph was drawn, as shown in Fig. 1. The figure shows that the data fluctuate markedly starting from Year 1, Day 41. Since the original data come from a city in China, this period is presumed to be the Chinese Lunar New Year, when many urban workers return to their hometowns for the holiday and many enterprises and public institutions suspend work and production, so electricity consumption fluctuates greatly. To reduce the influence of this abnormal fluctuation on the prediction result, the 15 days of data starting from Year 1, Day 41 are classified as abnormal and deleted from the data set.
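A minimal sketch of this cleaning step, assuming each record carries a day number and that the abnormal window covers days 41 to 55 inclusive (one reading of "15 days starting from Day 41"). The record layout and the name `drop_day_window` are hypothetical.

```python
READINGS_PER_DAY = 24 * 60 // 15  # 96 readings per day at 15-minute spacing


def drop_day_window(records, first_day=41, n_days=15):
    """Remove records whose day number lies in [first_day, first_day + n_days)."""
    last_day = first_day + n_days  # exclusive upper bound
    return [r for r in records if not (first_day <= r["day"] < last_day)]


# Hypothetical data: 60 days, 96 readings each, with a dummy load value.
records = [
    {"day": day, "slot": slot, "load": 100.0}
    for day in range(1, 61)
    for slot in range(READINGS_PER_DAY)
]
cleaned = drop_day_window(records)
```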

S1.2 data normalization

Data standardization scales the original data into a fixed interval according to a fixed rule. It removes the effect of the different dimensions (units and scales) of the 5 attributes in the original data, namely electricity consumption, temperature, humidity, wind speed and rainfall, so that model training is not influenced by the original data dimensions. The 5 attribute items are normalized to the interval [0, 1] according to formula (1) to facilitate subsequent processing.

$$x_i^{*} = \frac{x_i - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$

where $x_i^{*}$ is the normalized value of the $i$-th attribute value of the sample, $x_i$ is the $i$-th attribute value of the sample, $X_{\min}$ is the minimum attribute value of the sample, and $X_{\max}$ is the maximum attribute value of the sample.
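The min-max scaling to [0, 1] described here can be sketched in a few lines. The function name `min_max_normalize`, the sample temperatures and the handling of a constant column (mapped to 0.0) are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale one attribute column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical temperature readings in degrees Celsius.
temperature = [12.0, 18.0, 24.0, 30.0]
scaled = min_max_normalize(temperature)
```

Each of the 5 attributes would be scaled separately in this way, so that attributes with large raw ranges do not dominate the others.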

S2 Boosting and decision trees

Ensemble learning completes a learning task by constructing and combining multiple learners; combining multiple learners often achieves significantly better generalization than a single learner. There are three common ensemble learning approaches: (1) bagging, (2) boosting, and (3) stacking.

Boosting is a family of algorithms that promote weak learners to a strong learner. The algorithms in this family work similarly: a base learner is trained from the initial training set; the distribution of training samples is then adjusted according to the performance of that base learner, so that samples the previous base learner got wrong receive more attention; the next base learner is trained on the adjusted distribution. These steps repeat until the number of base learners reaches a preset value N, and the N base learners are finally combined by weighting. The algorithm flow is shown in Fig. 2.

The decision tree is an important model in ensemble learning, and its core is a tree structure, as shown in Fig. 3. The tree represents the mapping between object attributes and object values: the root node and internal nodes represent splits on features, and each branch represents the output of the corresponding region of the parent node's feature space.

Decision trees are generally divided into classification trees, used for classification problems, and regression trees, used for numerical prediction problems. Every node of a regression tree yields a predicted value as the tree grows. At each split, all candidate thresholds of every feature are enumerated, and the optimal split variable and split point are those that minimize the squared error. Splitting continues until the predicted value at the current node is unique or a preset threshold is reached. If the data in a leaf node are not unique, the mean of the node's data is taken as the predicted value.

The growth process of the regression tree generally comprises the following 5 steps:

Step 1: Input the training data set:

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}, \quad x_i \in X \subseteq \mathbb{R}^n, \; y_i \in Y \subseteq \mathbb{R}^n \tag{2}$$

$x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$ is an input instance (feature vector), namely the meteorological factor data and historical electricity consumption in the power data. $y_i = (y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(n)})$ is an output instance (label), namely the electricity consumption at the current time in the power data.

$X$: the input variable, comprising all input instances of meteorological factor data and historical electricity consumption;

$Y$: the output variable, comprising all output instances of electricity consumption at the current time;

$\mathbb{R}^n$: the real number domain;

Step 2: Traverse all feature variables of the power data and their values to find the $j$-th feature variable $x^{(j)}$ and its value $s$ to serve as the split variable and split point. These define two regions $R_1(j, s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j, s) = \{x \mid x^{(j)} > s\}$, chosen to satisfy the squared-error minimization criterion:

$$\min_{j,s}\left[\min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right] \tag{3}$$

$c_1$: the output value that minimizes the squared error in region $R_1(j, s)$, which is the mean of $Y$ within $R_1(j, s)$;

$c_2$: the output value that minimizes the squared error in region $R_2(j, s)$, which is the mean of $Y$ within $R_2(j, s)$.

This yields the split variable $x^{(j)}$ and split point $s$ with the minimum total squared-error loss.

Step 3: After the split of feature variable $x^{(j)}$ at $s$ is obtained, the two subregions whose outputs are calculated are:

$$R_1(j, s) = \{x \mid x^{(j)} \le s\} \tag{4}$$

$$R_2(j, s) = \{x \mid x^{(j)} > s\} \tag{5}$$

Step 4: Repeat Step 2 and Step 3 on the two subregions to find the optimal split feature for each branch node. The regression tree stops growing when all regions meet the threshold or when the meteorological attribute data of the power data have been exhausted.

Step 5: Divide the input space of meteorological attribute data and power demand data into $M$ regions $R_1, R_2, \ldots, R_m, \ldots, R_M$. Each region has a fixed output value $c_m$, and the final power demand decision tree is:

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m) \tag{6}$$

$c_m$: the mean of $Y$ in region $R_m$;

$R_m$: a region of the partition;

$I$: the indicator function.

The power demand can be preliminarily predicted by the above formula.
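The split search in Step 2 can be illustrated with a brute-force sketch: every feature and every observed value is tried as a split point, and the pair with the smallest summed squared error wins. The names `squared_error` and `best_split` and the toy data are hypothetical.

```python
def squared_error(ys):
    """Sum of squared deviations from the mean (the best constant output c)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(X, y):
    """Return (j, s, loss) minimising the summed squared error of R1 and R2."""
    best = None
    for j in range(len(X[0])):
        # The largest value is skipped because it would leave R2 empty.
        for s in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            loss = squared_error(left) + squared_error(right)
            if best is None or loss < best[2]:
                best = (j, s, loss)
    return best


# Hypothetical samples: [temperature, humidity] -> electricity consumption.
X = [[10.0, 0.2], [12.0, 0.9], [25.0, 0.3], [27.0, 0.8]]
y = [1.0, 1.2, 3.0, 3.2]
j, s, loss = best_split(X, y)
```

Here the temperature column separates low and high consumption cleanly, so the search selects feature 0. Applying `best_split` recursively to each subregion (Step 4) would grow the full tree.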

S3 Gradient boosting decision tree (GBDT) algorithm

The algorithm for predicting power demand with GBDT as the regression algorithm is as follows.

Suppose that in the power demand prediction problem the training set is $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ and the maximum number of iterations is $N$ (a positive integer).

$x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$ is an input power instance (feature vector), namely the meteorological factor data and historical electricity consumption in the power data.

$y_i = (y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(n)})$ is an output instance (label), namely the electricity consumption at the current time in the power data.

$m$ is the number of samples in the power data.

The loss function is the commonly used squared-error function $L(y, f(x)) = (y - f(x))^2$, and the output is the strong learner $f(x)$. The regression algorithm proceeds as follows:

Step 1: Initialize the weak learner for the power demand data:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c) \tag{7}$$

The value of $c$ may be set to the mean of the sample values $y$.

Step 2: For iterations $t = 1, 2, 3, \ldots, N$ and samples $i = 1, 2, 3, \ldots, m$, calculate the negative gradient:

$$r_{ti} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)} \tag{8}$$

$r_{ti}$: the negative gradient of the loss function at the $i$-th sample in the $t$-th iteration;

$L(y_i, f(x_i))$: the loss function value at the $i$-th sample of the power demand data;

$f(x_i)$: the output of the power demand data learner at the $i$-th sample, where the learner is the one obtained in the previous iteration;

$\partial L / \partial f(x_i)$: the partial derivative of the loss function with respect to the output of the power demand data strong learner.

Step 3: Fit a CART power demand regression tree to $(x_i, r_{ti})$ to obtain the $t$-th regression tree, whose leaf node regions are $R_{tj}$, $j = 1, 2, 3, \ldots, J$, where $J$ is the number of leaf nodes of regression tree $t$.

Step 4: For each leaf region $j = 1, 2, 3, \ldots, J$, compute the best-fit value:

$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L(y_i, f_{t-1}(x_i) + c) \tag{9}$$

$\arg\min$: the argument that minimizes the function value;

$R_{tj}$: the leaf node regions of the $t$-th regression tree;

$y_i$: the true value of the $i$-th power demand data sample;

$f_{t-1}(x_i)$: the output of the strong learner from iteration $t-1$ at sample $x_i$;

$c_{tj}$: the output value that minimizes the loss function on the $j$-th leaf node of the $t$-th regression tree;

Step 5: Update the power demand prediction strong learner:

$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj} \, I(x \in R_{tj}) \tag{10}$$

Finally, the expression of the power demand prediction strong learner $f(x)$ is obtained:

$$f(x) = f_N(x) = f_0(x) + \sum_{t=1}^{N} \sum_{j=1}^{J} c_{tj} \, I(x \in R_{tj}) \tag{11}$$

GBDT is suitable for most regression problems. For dense data such as power demand, the model can find a variety of discriminative features and feature combinations, which gives it strong generalization and expressive power and a better fit.
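The five steps can be sketched with one-split regression trees (stumps) on a single feature. This is a toy illustration, not the patent's implementation: the names `fit_stump`, `fit_gbdt` and `predict_gbdt`, the `shrinkage` parameter and the data are assumptions. For the squared-error loss the negative gradient equals the residual up to a constant factor, and the best leaf value is the mean residual in the leaf, which is what the stump computes.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump: one split point s and two leaf means.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for s in sorted(set(xs))[:-1]:  # the largest value would leave the right leaf empty
        left = [t for x, t in zip(xs, targets) if x <= s]
        right = [t for x, t in zip(xs, targets) if x > s]
        c1, c2 = sum(left) / len(left), sum(right) / len(right)
        loss = sum((t - c1) ** 2 for t in left) + sum((t - c2) ** 2 for t in right)
        if best is None or loss < best[0]:
            best = (loss, s, c1, c2)
    return best[1:]  # (s, c1, c2)


def fit_gbdt(xs, ys, n_rounds=20, shrinkage=1.0):
    """Gradient boosting with squared-error loss and stump weak learners."""
    f0 = sum(ys) / len(ys)                   # Step 1: initialise with the mean of y
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Step 2: residuals y - f(x), proportional to the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        # Steps 3 and 4: fit a tree to the residuals; leaf means are the leaf values.
        s, c1, c2 = fit_stump(xs, residuals)
        stumps.append((s, c1, c2))
        # Step 5: update the strong learner.
        preds = [p + shrinkage * (c1 if x <= s else c2) for x, p in zip(xs, preds)]
    return f0, stumps, shrinkage


def predict_gbdt(model, x):
    f0, stumps, shrinkage = model
    return f0 + sum(shrinkage * (c1 if x <= s else c2) for s, c1, c2 in stumps)


# Hypothetical data: one normalised feature and a step-like consumption curve.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
model = fit_gbdt(xs, ys)
train_mse = sum((predict_gbdt(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

With `shrinkage=1.0` each tree is added at full weight as in the update step; values below 1 would damp each tree's contribution.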

To improve the training efficiency of the power demand prediction model and reduce memory consumption during training, the Light Gradient Boosting Machine (LightGBM) algorithm is used, which builds on the traditional GBDT algorithm. The traditional algorithm can find exact split points for the power demand prediction problem, but its memory footprint and computational cost are large. The method therefore uses the Histogram algorithm in LightGBM to speed up processing of the power demand training samples. Before training, the Histogram algorithm constructs a piecewise function that converts the continuous feature values in the power demand data into K discrete bin values, and then builds a histogram with K entries. The training samples are traversed to fill the histogram: LightGBM accumulates statistics in the histogram for each of the K discrete values and finally searches for the optimal split point among the discrete values. This markedly reduces memory use and computational cost and markedly increases computation speed.
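The histogram idea can be sketched as below. This is not LightGBM's actual binning: equal-width bins, the function names `build_histogram` and `best_bin_split`, and the toy data are assumptions. The score used is the squared-error reduction term G_L^2/n_L + G_R^2/n_R, where G is the sum of residuals on each side of the boundary.

```python
def build_histogram(values, residuals, n_bins):
    """Accumulate residual sums and sample counts into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant feature: everything falls into bin 0
    res_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, r in zip(values, residuals):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        res_sum[b] += r
        count[b] += 1
    return res_sum, count


def best_bin_split(res_sum, count):
    """Scan the K-1 bin boundaries instead of every distinct feature value."""
    total_r, total_n = sum(res_sum), sum(count)
    best_bin, best_score = None, float("-inf")
    r_left, n_left = 0.0, 0
    for b in range(len(count) - 1):
        r_left += res_sum[b]
        n_left += count[b]
        n_right = total_n - n_left
        if n_left == 0 or n_right == 0:
            continue
        r_right = total_r - r_left
        score = r_left ** 2 / n_left + r_right ** 2 / n_right
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin, best_score


# Hypothetical feature values and residuals.
values = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
residuals = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
res_sum, count = build_histogram(values, residuals, n_bins=5)
split_bin, score = best_bin_split(res_sum, count)
```

The scan touches only K histogram entries per feature, which is where the memory and time savings come from; the price is that split points are restricted to bin boundaries.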

In addition, the traditional GBDT algorithm grows trees level-wise. This strategy treats all leaves in the same level alike, although many of them yield little gain when split, which wastes computation and memory. To address this, LightGBM adopts the more efficient leaf-wise growth strategy: at each step it finds the leaf with the largest split gain, splits it, and repeats, so that it achieves higher accuracy for the same number of splits. The depth of the tree can also be limited to avoid overfitting when the sample size is small.

Building on the core idea of GBDT, LightGBM improves the feature splitting process and the tree growth strategy, so that the power demand prediction model is simpler, costs less to compute and gives more accurate predictions.

S4 XGBoost algorithm

XGBoost is an optimized Boosting model based on decision trees that iteratively combines power demand prediction weak learners into a strong learner. XGBoost uses CART regression trees as weak learners. It first determines the optimal structure of a tree, such as the number of leaf nodes and the depth of the tree. It then uses a stagewise forward additive model: each time a single tree is generated, more weight is placed on the data that the previous round fitted poorly, and these data are used for the current tree. Adding trees one by one gradually reduces the overall error of the model until training finishes.

In XGBoost, each tree is modeled as follows:

$$f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^{M}, \quad q: \mathbb{R}^{d} \to \{1, 2, \ldots, M\} \tag{12}$$

where $w$ is the vector of leaf node scores, $x$ is the input sample data, namely the meteorological factor data and historical electricity consumption in the power data, $q(x)$ is the leaf node to which sample $x$ is assigned, and $M$ is the number of leaf nodes of the tree. After the $m$-th tree is added, the model is:

$$\hat{y}_i^{(m)} = \sum_{k=1}^{m} f_k(x_i) = \hat{y}_i^{(m-1)} + f_m(x_i) \tag{13}$$

Training a single CART tree first requires the objective function of the power demand prediction:

$$Obj = \sum_{i} L(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k) \tag{14}$$

The objective function has two parts: the loss function $L$ and the regularization term $\Omega$. For regression problems, the fit of the model is generally evaluated with the L2 loss, the square of the residual between the predicted and true values; for the power data this is the squared difference between the predicted and actual electricity consumption. The regularization term acts as a penalty on the model to prevent overfitting. It is defined as follows:

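$$\Omega(f) = \gamma M + \frac{1}{2} \lambda \sum_{j=1}^{M} w_j^2 \tag{15}$$

where $M$ is the number of leaf nodes and $\sum_j w_j^2$ is the L2 regularization of the leaf node scores $w_j$; $\gamma$ and $\lambda$ control the complexity of the tree. Substituting formulas (12), (13) and (15) into the objective function and expanding with a second-order Taylor formula gives the objective for the leaf nodes of the $m$-th tree:

$$Obj^{(m)} \approx \sum_{j=1}^{M} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma M \tag{16}$$

where $I_j = \{i \mid q(x_i) = j\}$ is the set of samples assigned to leaf $j$, and $g_i$ and $h_i$ are the first and second derivatives of the loss $L(y_i, \hat{y}_i^{(m-1)})$ with respect to $\hat{y}_i^{(m-1)}$. Let

$$G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i$$

Substituting these into formula (16), taking the partial derivative of the objective function with respect to $w_j$ and setting it to 0 gives:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{17}$$

Substituting this back into the objective function gives:

$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{M} \frac{G_j^2}{H_j + \lambda} + \gamma M \tag{18}$$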

$Obj^{*}$ is used to evaluate the quality of a single CART regression tree structure. Starting from a tree of depth 0, XGBoost enumerates the candidate splits of all features and calculates the objective value of each to determine the optimal structure of the tree. Tree construction stops when the tree reaches the maximum depth or when the sum of sample weights falls below a set threshold. The sampling proportion of each tree is controlled by preset parameters, and the optimal structure of a tree is finally obtained through parameter tuning.

After XGBoost finishes training one tree, Boosting carries out the next round of training, and continued iteration yields the optimized model structure. After each iteration, the leaf node weights are multiplied by the learning rate, which weakens the influence of each tree and leaves more room for subsequent trees to learn. Finally, the optimal number of iterations is determined and model training is complete.
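The way a candidate tree structure is scored can be sketched from the standard XGBoost quantities: for each leaf, G and H are the sums of the first and second derivatives of the loss over the samples in that leaf, the optimal leaf weight is -G / (H + lambda), and the structure score is -1/2 * sum(G^2 / (H + lambda)) + gamma * (number of leaves). The function names and toy numbers below are hypothetical, and the squared-error loss (y - y_hat)^2 is assumed.

```python
def squared_loss_derivatives(y, y_hat):
    """First and second derivative of (y - y_hat)^2 with respect to y_hat."""
    return 2.0 * (y_hat - y), 2.0


def leaf_weight(G, H, lam):
    """Optimal leaf score for a leaf with gradient sum G and Hessian sum H."""
    return -G / (H + lam)


def structure_score(leaves, lam, gamma):
    """Objective value of a tree given (G, H) per leaf; lower is better."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)


def split_gain(left, right, lam, gamma):
    """Decrease in the objective when one leaf is split into `left` and `right`."""
    parent = (left[0] + right[0], left[1] + right[1])
    return structure_score([parent], lam, gamma) - structure_score([left, right], lam, gamma)


# Hypothetical samples: the current prediction is 2.0 everywhere.
targets = [1.0, 1.0, 3.0, 3.0]
derivs = [squared_loss_derivatives(y, 2.0) for y in targets]
left = (sum(g for g, _ in derivs[:2]), sum(h for _, h in derivs[:2]))    # (4.0, 4.0)
right = (sum(g for g, _ in derivs[2:]), sum(h for _, h in derivs[2:]))   # (-4.0, 4.0)
gain = split_gain(left, right, lam=1.0, gamma=0.5)
```

A split is worth making only when the gain is positive, so `gamma` acts as the minimum improvement required to add a leaf and `lam` shrinks the leaf weights towards zero.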

S5 LR model

The LR (logistic regression) model is a conditional probability distribution $P(Y \mid X)$ in the form of a parameterized logistic distribution. The random variable $X$ takes real values and represents the meteorological factor data and historical electricity consumption in the power data. The conditional distributions of the LR model are:

$$P(Y = 1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)} \tag{19}$$

$$P(Y = 0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)} \tag{20}$$

where $x \in \mathbb{R}^n$ is the input, $Y \in \{0, 1\}$ is the output, $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ are the training parameters of the power demand prediction model, $w$ is the weight vector, $b$ is the bias, and $w \cdot x$ is the inner product of $w$ and $x$.

For a given power demand input instance $x$, $P(Y = 1 \mid x)$ and $P(Y = 0 \mid x)$ can be computed from formulas (19) and (20). Logistic regression compares the two conditional probabilities and assigns instance $x$ to the class with the larger probability.

Extend the weight vector $w$ and the input power demand data vector $x$ to $w = (w^{(1)}, w^{(2)}, \ldots, w^{(n)}, b)^T$ and $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)}, 1)^T$. The LR model then becomes:

$$P(Y = 1 \mid x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)} \tag{21}$$

$$P(Y = 0 \mid x) = \frac{1}{1 + \exp(w \cdot x)} \tag{22}$$

The odds of an event are the ratio of the probability that the event occurs to the probability that it does not occur. If the probability that the event occurs is $p$, the probability that it does not occur is $1 - p$, and the odds of the event are

$$\frac{p}{1 - p}$$

The log odds of the event, also called the logit function, are

$$\mathrm{logit}(p) = \log \frac{p}{1 - p}$$

This function is used to determine the probability that the power demand belongs to a category.

For logistic regression, formulas (21) and (22) give

$$\log \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = w \cdot x \tag{23}$$

The above shows that in the LR model the log odds of the output $Y = 1$ are a linear function of the input $x$. The linear function $w \cdot x$ ranges over the real numbers, and the input $x$ can be classified by this linear function.

Here $x \in \mathbb{R}^{n+1}$ and $w \in \mathbb{R}^{n+1}$. The linear function $w \cdot x$ can be converted to a probability using formula (19), written in the extended notation:

$$P(Y = 1 \mid x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)}$$

As the linear function $w \cdot x$ approaches positive infinity, the conditional probability approaches 1; as it approaches negative infinity, the conditional probability approaches 0.

Given a training data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{0, 1\}$, the LR model parameters are estimated by maximum likelihood. Let

$$P(Y = 1 \mid x) = \pi(x) \tag{24}$$

$$P(Y = 0 \mid x) = 1 - \pi(x) \tag{25}$$

The likelihood function is

$$\prod_{i=1}^{m} [\pi(x_i)]^{y_i} [1 - \pi(x_i)]^{1 - y_i}$$

The log-likelihood function is

$$L(w) = \sum_{i=1}^{m} \left[ y_i \log \pi(x_i) + (1 - y_i) \log(1 - \pi(x_i)) \right] = \sum_{i=1}^{m} \left[ y_i (w \cdot x_i) - \log(1 + \exp(w \cdot x_i)) \right] \tag{26}$$

The estimate of $w$ is obtained by maximizing formula (26).

The objective function to optimize in the power demand prediction problem is thus the log-likelihood function. In logistic regression, gradient descent and quasi-Newton methods are commonly used to solve it. Suppose $\hat{w}$ is the maximum likelihood estimate of $w$; the resulting LR model is

$$P(Y = 1 \mid x) = \frac{\exp(\hat{w} \cdot x)}{1 + \exp(\hat{w} \cdot x)}$$

$$P(Y = 0 \mid x) = \frac{1}{1 + \exp(\hat{w} \cdot x)}$$

Because the learning capacity of the LR model is limited, it is often combined with other models: the other models learn the relevant feature combinations through training, and the LR model then gives the corresponding predicted value, namely the predicted electricity consumption.
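Maximum likelihood estimation for logistic regression can be sketched with plain gradient ascent on the log-likelihood sum_i [ y_i (w . x_i) - log(1 + exp(w . x_i)) ]. The text names gradient descent and quasi-Newton methods without fixing one, so the optimizer, the step size, the function names and the toy data are all assumptions. Each input carries a trailing 1 so that the bias is the last component of `w`.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def log_likelihood(w, X, y):
    """sum_i [ y_i * (w . x_i) - log(1 + exp(w . x_i)) ]"""
    return sum(yi * dot(w, xi) - math.log1p(math.exp(dot(w, xi))) for xi, yi in zip(X, y))


def fit_logistic(X, y, step=0.05, n_iter=2000):
    """Gradient ascent; the gradient is sum_i (y_i - pi(x_i)) * x_i."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(dot(w, xi))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + step * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical one-feature data; the second component is the constant 1 for the bias.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]]
y = [0, 0, 1, 0, 1, 1]
w_hat = fit_logistic(X, y)
p_low = sigmoid(dot(w_hat, [0.0, 1.0]))
p_high = sigmoid(dot(w_hat, [5.0, 1.0]))
```

The classes in the toy data overlap on purpose: with perfectly separable data the maximum likelihood weights grow without bound.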

Electric power demand prediction model based on Stacking fusion

Because a single model cannot fully meet the requirements of training performance and stability, the method uses a Stacking fusion model. It integrates the strengths of several Boosting models and combines them with an LR regression model, so that the fused model trained on the power demand data is more discriminative and more stable and achieves good results without excessively frequent iteration.

The overall process of model training and testing is shown in Fig. 4. The original power demand data are cleaned and normalized, the Stacking-based power demand prediction model is then trained, and the test data are used to predict the power demand over a future period.

The detailed training process is as follows. The preceding analysis shows that, once the data for special holidays are removed, the electricity demand data exhibit strong temporal regularity when divided by day, month and season. The amount of data available is also limited, so decision-tree-based models are well suited to the problem. GBDT, XGBoost and LightGBM each have shortcomings in different prediction scenarios, and fusing the three lets them reinforce one another. Stacking is an ensemble framework of hierarchical models. The first layer consists of several different base learners, here GBDT, XGBoost and LightGBM; once each has been tuned to perform well, their predictions are integrated, which reduces the bias of the model and gives better results. The second layer uses an LR regression model, which further guards against overfitting and effectively reduces the variance of the model, making it more stable.

The specific implementation steps of the power demand prediction model based on Stacking fusion are as follows:

Step 1: Divide the whole data set, consisting of meteorological factor data and electricity demand, into training data (training set) and prediction data (test set), and then divide the training samples into k groups of equal size.

Step 2: Train each base learner on the training data set k times. Each time, k-1 groups are used as training samples and the remaining group as the validation set. The trained learner predicts the power demand from the meteorological factor data in the validation set, which yields k sets of validation predictions; it also predicts the test samples each time, which yields k sets of test predictions. Note that only the training set undergoes the operation of this step; the validation set and the test set do not.

Step 3: Combine the k sets of validation predictions to form the new training sample data, and average the k sets of test predictions to form the new power demand prediction data. The flow of this step is shown in Fig. 4.

Step 4: Input the power demand prediction data obtained in Step 3 into the second-layer model to obtain the final prediction result, as shown in Fig. 5.
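Steps 1 to 3 can be sketched as the construction of out-of-fold features for the second layer. To stay self-contained, the sketch uses two trivial stand-in base learners in place of GBDT, XGBoost and LightGBM; the function names, the contiguous fold layout and the toy data are all assumptions.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_features(make_learners, X_train, y_train, X_test, k=5):
    """Return (meta_train, meta_test) with one column per base learner."""
    n_models = len(make_learners)
    meta_train = [[0.0] * n_models for _ in X_train]
    meta_test = [[0.0] * n_models for _ in X_test]
    for m, make in enumerate(make_learners):
        for fold in k_fold_indices(len(X_train), k):
            held_out = set(fold)
            fit_X = [x for i, x in enumerate(X_train) if i not in held_out]
            fit_y = [t for i, t in enumerate(y_train) if i not in held_out]
            predict = make(fit_X, fit_y)              # train on the other k-1 folds
            for i in fold:
                meta_train[i][m] = predict(X_train[i])  # validation prediction
            for t, x in enumerate(X_test):
                meta_test[t][m] += predict(x) / k       # average of k test predictions
    return meta_train, meta_test


# Hypothetical stand-ins for the three Boosting models.
def mean_learner(X, y):
    mu = sum(y) / len(y)
    return lambda x: mu


def nearest_neighbour_learner(X, y):
    def predict(x):
        i = min(range(len(X)), key=lambda j: abs(X[j] - x))
        return y[i]
    return predict


X_train = [float(i) for i in range(10)]
y_train = [2.0 * x for x in X_train]
X_test = [10.0, 11.0]
meta_train, meta_test = out_of_fold_features(
    [mean_learner, nearest_neighbour_learner], X_train, y_train, X_test, k=5
)
```

Step 4 would then fit the second-layer model on `meta_train` against `y_train` and apply it to `meta_test`. Because every row of `meta_train` comes from a learner that never saw that row, the second layer is trained on predictions of the same quality it will receive at test time.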

The power demand model constructed by the invention uses the three Boosting models GBDT, XGBoost and LightGBM in the first layer of the Stacking framework, and the LR model in the second layer outputs the prediction result directly. The overall framework of the model is shown in Fig. 5.

Existing approaches mostly predict power demand data with a single model such as GBDT, XGBoost or LightGBM, each of which has its own shortcomings. The present model combines the three to predict power demand jointly, so that they complement one another and give a more precise prediction. To improve the prediction further, the Stacking framework predicts the power data in two layers: after the three Boosting models (GBDT, XGBoost and LightGBM) complete their predictions, the LR model corrects the results and outputs the corrected prediction, improving accuracy and reliability. The feature extraction capability of a single model can hardly meet the requirements of power demand prediction over different time horizons, whereas the combined model achieves a better prediction through complementarity among the models. In addition, compared with artificial intelligence algorithms represented by neural networks, the method is a decision-search optimization algorithm, so the time it needs to produce a prediction is far shorter.

Drawings

FIG. 1 is a graph of the raw power consumption data.

FIG. 2 is the algorithm flow chart.

FIG. 3 is a decision tree.

FIG. 4 is the overall process of model training and prediction.

FIG. 5 is the fusion model framework proposed by the present invention.

FIG. 6 shows the four-season power demand prediction results of different models; (a) spring; (b) summer; (c) autumn; (d) winter.

FIG. 7 shows the prediction results of power demand in weeks; (a) XGB model; (b) LGB model; (c) GBDT model; (d) XLG-LR model.

FIG. 8 shows the prediction results of power demand in months; (a) XGB model; (b) LGB model; (c) GBDT model; (d) XLG-LR model.

FIG. 9 is a comparison of the four-season power demand prediction results of different models; (a) spring; (b) summer; (c) autumn; (d) winter.

FIG. 10 is a comparison of the power demand prediction results in weeks for different models; (a) TCN model; (b) LSTM model; (c) GRU model.

FIG. 11 is a comparison of the power demand prediction results in months for different models; (a) TCN model; (b) LSTM model; (c) GRU model.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and examples.

Power demand prediction estimates the power consumption demand for a future period based on the internal relationship between historical power consumption data and the corresponding weather information. The estimated demand always deviates somewhat from the actual demand: the smaller the error, the higher the accuracy of the model, and the closer the estimated demand curve is to the actual demand curve, the better the model performs. Objective evaluation of the model is therefore important for analyzing its quality.

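The present invention employs the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and goodness-of-fit coefficient ($R^2$), a total of 4 common model evaluation indexes, to evaluate a single deterministic model. The specific formulas are as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{n=1}^{N}\left|Q_{sim,n} - Q_{obs,n}\right| \quad (29)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(Q_{sim,n} - Q_{obs,n}\right)^2} \quad (30)$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{n=1}^{N}\left|\frac{Q_{sim,n} - Q_{obs,n}}{Q_{obs,n}}\right| \quad (31)$$

$$R^2 = \frac{\left[\sum_{n=1}^{N}\left(Q_{obs,n} - \bar{Q}_{obs}\right)\left(Q_{sim,n} - \bar{Q}_{sim}\right)\right]^2}{\sum_{n=1}^{N}\left(Q_{obs,n} - \bar{Q}_{obs}\right)^2 \sum_{n=1}^{N}\left(Q_{sim,n} - \bar{Q}_{sim}\right)^2} \quad (32)$$

where $Q_{sim,n}$ is the predicted value, $Q_{obs,n}$ is the actual value, $\bar{Q}_{sim}$ is the mean of the predicted values, $\bar{Q}_{obs}$ is the mean of the true values, and $N$ is the number of samples.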

The smaller the index value in equations (29), (30), and (31), the smaller the error of the prediction model, and the closer the index value in equation (32) is to 1, the higher the accuracy of the prediction model [35], and the better the degree of fitting between the measured value and the predicted value.
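As a minimal sketch, the four evaluation indexes can be computed as follows. The function name `evaluate` and the sample numbers are illustrative only. Because the text defines the means of both the predicted and the true values, $R^2$ is computed here as the squared correlation between the two series, which is an assumption about the exact form used.

```python
import math

def evaluate(obs, sim):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for two equal-length series."""
    n = len(obs)
    mae = sum(abs(s - o) for o, s in zip(obs, sim)) / n
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    # MAPE divides by the true value, so true values must be non-zero.
    mape = 100.0 * sum(abs((s - o) / o) for o, s in zip(obs, sim)) / n
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    r2 = cov ** 2 / (var_obs * var_sim)   # assumed squared-correlation form
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Illustrative numbers only, not data from the experiments.
metrics = evaluate(obs=[100, 200, 300, 400], sim=[110, 190, 310, 390])
print(metrics)
```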

The short-term power demand prediction results in different seasons are as follows:

To verify seasonal power demand prediction, 1 day of data is randomly extracted from the test set, and the accuracy of the power demand prediction at different time periods within one day of a given season is examined. The prediction results of the multi-model fusion XLG-LR model for power demand in different seasons are shown in FIG. 6. The graph shows that the XLG-LR model is inferior to the other models in some time periods, but in most time periods its predictions are the closest to the true values among all models.

The results of the inventive method are compared with those of the other methods in Table 1. The table shows that, compared with the XGB, LGB and GBDT models, the XLG-LR model of the invention obtains the best result in power demand prediction in different seasons under all 4 evaluation indexes. Across seasons, the XLG-LR model obtains its best prediction result in summer, while its winter prediction is poorer than that of the other three seasons. When the summer power demand is predicted, R² reaches 0.9901, which shows that the predicted power demand curve is close to a perfect fit with the real power demand curve.

TABLE 1 four seasons electric power demand prediction and evaluation index for each model

5.3 Prediction of power demand in weeks

In actual production, the power department often needs to plan the production and scheduling work of the next week at the end of each week, so predicting power demand in weeks has practical significance for guiding the power department in arranging production scheduling. To evaluate power demand prediction in weeks, 1 consecutive week of data is randomly extracted from the test set, and the accuracy of the prediction at different time periods within the 7 days of the week is examined. The weekly power demand prediction results of the multi-model fusion model XLG-LR constructed by the invention are shown in FIG. 7. The graph shows that, except for the XGB model, the LGB, GBDT and XLG-LR models can all predict the one-week power demand trend well, and the prediction of the XLG-LR model is the closest to the true value.

The results of the XLG-LR model of the invention and of the other methods are shown in Table 2. The table shows that the R² index of all 4 models approaches 1, which indicates that the prediction results of each model are close to the true values. On the MAE index, the XLG-LR model improves on the XGB, LGB and GBDT models by 35.42%, 2.97% and 4.03% respectively. The XLG-LR model provided by the invention therefore obtains a more accurate prediction result in power demand prediction in weeks.

TABLE 2 prediction and evaluation index of power demand in weeks for each model

The prediction results of the power demand in months are as follows:

Power demand prediction in months can help power enterprises carry out monthly planning and reasonably arrange production scheduling and other work. To evaluate power demand prediction in months, 1 consecutive month of data is randomly extracted from the test set, and the accuracy of the prediction at different time periods within the 30 days of the month is examined. The monthly power demand prediction results of the multi-model fusion model XLG-LR are shown in FIG. 8. The graph shows that, similar to the prediction of power demand in weeks, except for the XGB model, the LGB, GBDT and XLG-LR models can all predict the one-month power demand trend well, and the prediction of the XLG-LR model is the closest to the true value.

The results of the XLG-LR model of the invention and of the other methods are shown in Table 3. The table shows that the R² index of all 4 models approaches 1, which indicates that the prediction results of each model are close to the true values. However, on the three indexes MAE, RMSE and MAPE, the XLG-LR model obtains the minimum value compared with the XGB, LGB and GBDT models, which shows that the XLG-LR model provided by the invention obtains a more accurate prediction result when the power demand is predicted in months.

TABLE 3 predicted evaluation index of power demand in month unit for each model

The above experiments show that, although the GBDT, XGBoost and LightGBM models can predict the power demand well, the prediction results of the different algorithms are not stable across scenarios. There may be two reasons. First, the data features differ between scenarios, which may affect the training and learning process of the model. Second, the data volume of the data set used by the invention is limited, which affects the quality of the data to a certain extent. As data-driven approaches, the GBDT, XGBoost and LightGBM models are greatly affected in predictive performance by the amount and quality of the training data. Therefore, to solve these problems effectively, the invention provides the XLG-LR power demand prediction model based on Stacking fusion, which effectively resolves the problems caused by using the GBDT, XGBoost and LightGBM models individually.

In recent years, with the development of neural networks, more and more researchers have tried to predict power demand using neural networks; the models often used include Gated Recurrent Units (GRU), Long Short-Term Memory networks (LSTM) [38], and temporal convolutional networks (TCN). To verify the advancement and effectiveness of the XLG-LR model, the GRU, LSTM and TCN models and the XLG-LR model are trained on the power demand data of the invention, and the training results are tested on the test set.

LSTM is a long short-term memory neural network widely used for learning and predicting relationships in sequence data. The recurrent neural network (RNN) suffers from the vanishing gradient problem, which hinders the network from learning long-term dependencies; the LSTM alleviates this problem by introducing a forget gate, an input gate and an output gate, and thus achieves a better effect. Wang et al. predicted short-term photovoltaic power based on this method, and the invention carries out a comparison experiment on the basis of this method. The temporal convolutional network (TCN) is a simple one-dimensional convolutional network that can be applied to time series data. The layers in the network have temporal attributes for learning global and local features in the data. Convolutional layers also help reduce model latency, because predictions can be processed in parallel. Wang et al. predicted the short-term power consumption of industrial users based on this method, and the invention carries out a comparative experiment based on it. The GRU model focuses more on the role of gating; in particular, feature weights are introduced into the GRU formulas to enhance the extraction of data features. Based on this method, Gao et al. carried out short-term power load prediction, predicting the power load for the next 48 hours in hourly units.

During comparison, relevant parameters in the GRU, LSTM and TCN models need to be set, and in the stage of developing a comparison experiment, the parameter setting of each model is shown in table 4.

Table 4 model parameter settings

When the Stacking-fusion power demand prediction model designed by the invention is trained and tested, the input data take the form (number of samples) × (number of features). In contrast, the GRU, LSTM and TCN models take input data in the form (number of samples) × (number of features) × (time window length) during training and testing. The size of the time window needs to be adjusted according to the prediction requirements of different durations.
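The difference between the two input forms can be illustrated with the following minimal sketch, which builds sliding windows from a table of records. The helper names `make_windows` and `shape`, the toy values and the window length of 3 are hypothetical; the actual window lengths used in the comparison experiments are not restated here.

```python
def make_windows(rows, window):
    """Turn a (samples x features) table into (samples x features x window) blocks."""
    n_features = len(rows[0])
    blocks = []
    for start in range(len(rows) - window + 1):
        chunk = rows[start:start + window]        # `window` consecutive records
        # Transpose so that each feature holds its own short time series.
        blocks.append([[r[f] for r in chunk] for f in range(n_features)])
    return blocks

def shape(nested):
    """Report the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# 8 toy records with 5 attributes: consumption, temperature, humidity, wind, rain.
rows = [[100.0 + t, 20.0, 0.5, 3.0, 0.0] for t in range(8)]
print(shape(rows))                    # (8, 5): input form of the tree-based models
print(shape(make_windows(rows, 3)))   # (6, 5, 3): input form of GRU / LSTM / TCN
```

With data sampled every 15 minutes, one day corresponds to 96 records, so a one-day window would have length 96 under this layout.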

First, the four methods are compared on seasonal power demand prediction. The experiments use the same training set and test set as Section 5.2 and examine the prediction accuracy at different time periods within one day of a given season. The prediction results of the four models for power demand in different seasons are shown in FIG. 9; the graph shows that the XLG-LR model of the present invention is the closest to the true value most of the time.

The results of the XLG-LR model of the invention are shown in Table 5 together with the results of the other three neural network methods. As can be seen from the table, compared with three models, namely GRU, LSTM and TCN, the XLG-LR model disclosed by the invention has remarkable advantages in prediction of power demands in different seasons under 4 evaluation indexes.

TABLE 5 four seasons electric power demand prediction results of different models

Second, the four methods are compared on power demand prediction in weeks. The experiments use the same training set and test set as Section 5.3 and examine the prediction accuracy at different time periods within the 7 days of the week. The prediction results of the four models for the one-week power demand trend are shown in FIG. 10. The graph shows that the GRU, LSTM and TCN models have obvious prediction deviations in the periods of high and low power demand, while the XLG-LR model of the present invention accurately predicts the trend of power demand in most periods.

The results of the XLG-LR model of the invention are shown in Table 6 together with the results of the other three neural network methods. As can be seen from the table, compared with three models of GRU, LSTM and TCN, the XLG-LR model of the invention has obvious advantages in power demand prediction in week units under 4 evaluation indexes, and each index is superior to other models.

TABLE 6 prediction of power demand in weeks for different models

Third, the four methods are compared on power demand prediction in months. The experiments use the same training set and test set as Section 5.3 and examine the prediction accuracy at different time periods within the 30 days of the month. The prediction results of the four models for the one-month power demand trend are shown in FIG. 11. The graph shows that the GRU, LSTM and TCN models have obvious prediction deviations in the periods of low power demand, while the XLG-LR model of the present invention basically coincides with the real demand in most periods.

The results of the XLG-LR model of the invention are shown in Table 7 together with the results of the other three neural network methods. As can be seen from the table, compared with three models, namely GRU, LSTM and TCN, the XLG-LR model disclosed by the invention has remarkable advantages in power demand prediction in monthly units under 4 evaluation indexes, the curve fitting effect is best, and the power demand prediction error is minimum.

TABLE 7 prediction of power demand in months for different models

The prediction time of a model determines how convenient the model is in actual use. The same training data are used to compare the time consumed in the prediction stage by the XLG-LR model and the three neural network methods; the specific results are shown in Table 8.

TABLE 8 time required for power demand prediction phase under different scenarios for different models

As can be seen from the table, the XLG-LR model can consume the shortest time to complete the prediction under each prediction scene. Compared with other three neural network methods, the difference of the required time is at least one order of magnitude, and the XLG-LR model has absolute advantage in prediction time.

The comparison experiments show that the XLG-LR model has obvious advantages over the classical neural network algorithms in both prediction accuracy and prediction time. The XLG-LR model is mainly constructed on the decision tree principle, and obtains a global optimal solution by continuously optimizing local optimal solutions during solving. A neural network needs to compare the data features extracted from the test data with the trained model in order to give an optimal solution. Because the data features in the trained model are numerous and similar to one another, the neural network falls short of the XLG-LR model in both accuracy and time consumption. Therefore, the XLG-LR model can obtain a more ideal prediction result for power demand prediction problems in different scenarios.

Although the method provided by the invention obtains a more ideal power demand prediction result, some problems to be solved in the future still exist.

(1) The data set is relatively small and contains a limited amount of information. The data set currently used by the present invention is only 13 months of data, which may reduce the generalization and reliability of the model to some extent. The GBDT, XGboost and LightGBM algorithms used by the invention can obtain better prediction results on small data sets, but if the data volume is more abundant, better prediction results can be obtained. Thus, in future research and exploration, the current data set can be supplemented by collecting more months of data, thereby creating a larger, more informative data set for power demand forecasting.

(2) Indicators other than meteorological factors may also influence the prediction. The magnitude of the power demand can be affected by a number of factors, including the local level of economic development, the industrial structure, and the like. Although the indexes adopted in this research do influence the power consumption demand to a certain extent, other indexes may influence it as well. Therefore, in the future, experts in related disciplines can be consulted to identify other factors that influence power demand, and more index data capable of influencing the power demand can be collected and added to the current data set.

Example verification

An XLG-LR power demand prediction model based on Stacking fusion is constructed from 13 months of electric power and meteorological data using the three models GBDT, XGBoost and LightGBM. After the data are divided into a training set and a test set, the 4 models are trained, and the feasibility of the models is verified on the test set. The experiments of the invention are carried out under the following software and hardware conditions. Software environment: Python 3.7, TensorFlow 2.8.0, sklearn, seaborn, numpy, matplotlib and pandas development kits. Hardware environment: the graphics card is an AMD Radeon(TM) Vega 8 Graphics, and the memory is 8 GB.

The verification covers different durations, namely seasonal, weekly and monthly prediction. Except for the XGBoost model, whose performance is slightly inferior at every duration, the GBDT, LightGBM and XLG-LR models all obtain relatively satisfactory prediction results, among which the XLG-LR model provided by the invention performs best. In terms of model prediction accuracy, the overall ranking is XLG-LR > GBDT > LightGBM > XGBoost. In addition, the power demand prediction results of the XLG-LR model are compared with those of three mainstream neural network models, TCN, GRU and LSTM, and the results show that the XLG-LR model obtains the best experimental results on this data set. The above discussion verifies the reliability and effectiveness of the XLG-LR model of the invention for power demand prediction. When the power demand data or the meteorological data change, the model only needs to be retrained on a new data set to form a new prediction model that copes with the change and carries out the corresponding prediction work. The method is also applicable to power demand prediction in other regions, for which a new prediction model needs to be trained on a new data set. In addition, the method is packaged into corresponding software with good interactive operation, and can be put into wider practical application in the future.

In the future, a larger power demand database can be constructed by collecting more power demand data, so that the accuracy and the advancement of the algorithm in power demand prediction are verified. Meanwhile, on the premise that the data volume is sufficient enough, long-term electricity demand prediction, such as prediction of electricity demand in the next year, can be tried to be carried out by using the method. Furthermore, the electricity utilization requirement is also closely related to other factors besides meteorological factors, such as economic development level, regional industry layout and the like, and the prediction accuracy of the method can be improved by supplementing the data in the future. In addition, the method can also be applied to other fields, such as water resource demand, prediction of coal resource demand and the like.

Claims

1. A multi-factor medium- and short-term power demand prediction method based on multi-model fusion, characterized by comprising the following steps:

S1: constructing a power data set;

the electric power data are derived from power utilization data disclosed by a network, an original data set comprises 5 attribute items of historical power utilization quantity, temperature, humidity, wind speed and rainfall, and all data are collected once every 15 minutes, namely the data of the 5 attribute items are recorded once every 15 minutes; according to the power demand prediction tasks with different durations, dividing a data set into a training set and a testing set before model training; wherein the training set accounts for 70% of the original data, and the testing set accounts for 30% of the original data;

S1.1: data cleaning;

the power consumption data item in the data used is analyzed and a data trend graph is drawn; the 15 days of data starting from Day 41 of Year 1 are classified as abnormal data and deleted from the data set;

S1.2: data normalization;

data normalization aims to scale the original data into a fixed interval according to a certain rule, eliminating the influence of the different dimensions of the 5 attributes of electricity consumption, temperature, humidity, wind speed and rainfall in the original data, and ensuring that the model training result is not affected by the dimensions of the original data; the 5 attribute items of electricity consumption, temperature, humidity, wind speed and rainfall in the original data are normalized to the [0,1] interval according to formula (1) to facilitate subsequent processing:

$$x_i^{*} = \frac{x_i - X_{min}}{X_{max} - X_{min}} \quad (1)$$

where $x_i^{*}$ is the normalized value of the ith attribute value of the sample, $x_i$ is the ith attribute value of the sample, $X_{min}$ is the minimum attribute value of the sample, and $X_{max}$ is the maximum attribute value of the sample;
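A minimal sketch of the normalization to the [0,1] interval and of the 70%/30% division is given below. The function names are hypothetical, the handling of a constant column is an added safeguard, and splitting in chronological order is an assumption, since the text does not state how the 70% and 30% portions are selected.

```python
def min_max_normalize(column):
    """Scale one attribute column to [0, 1] using its minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def split_70_30(rows):
    """Assumed chronological split: first 70% for training, the rest for testing."""
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]

temperature = [10.0, 20.0, 30.0]      # illustrative values only
print(min_max_normalize(temperature))  # [0.0, 0.5, 1.0]

train, test = split_70_30(list(range(10)))
print(len(train), len(test))           # 7 3
```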

S2: Boosting and decision tree;

a base learner is trained from the initial training set, and the distribution of the training samples is adjusted according to the performance of the base learner, so that the training samples mispredicted by the previous base learner receive more attention subsequently; the next base learner is then trained on the adjusted sample distribution; this is repeated until the number of base learners reaches the pre-specified value N, and the N base learners are finally combined by weighting;

decision trees are divided into classification trees and regression trees; when splitting, the thresholds of every feature value are enumerated exhaustively, and the optimal splitting variable and optimal splitting point, that is, the most reliable splitting basis, are found by minimizing the squared error; splitting continues until the predicted value of the current branch node is unique or a manually set threshold is reached; if the data at a leaf node are not unique, the average value of the node data is taken as the predicted value;
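The exhaustive search for the optimal splitting point of a regression tree can be sketched as follows for a single feature. Using the midpoints between consecutive distinct feature values as candidate thresholds is a common convention adopted here as an assumption; the function names and sample values are illustrative.

```python
def sse(ys):
    """Sum of squared errors around the mean (the mean is the leaf prediction)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Try every candidate threshold on one feature; return (threshold, error)."""
    best_thr, best_err = None, float("inf")
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2                       # assumed candidate threshold
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        err = sse(left) + sse(right)              # squared error after the split
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr, best_err

# Illustrative data: demand jumps once the feature exceeds about 6.5.
print(best_split([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20]))   # (6.5, 0.0)
```

For several features, the same search would be repeated per feature and the feature with the lowest error retained as the optimal splitting variable.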

S3: gradient boosting decision tree (GBDT) algorithm;

the GBDT, used as a regression algorithm, predicts the power demand as follows:

suppose that in the power demand prediction problem the training set is $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ and the maximum number of iterations is $N^{*}$ ($N^{*} \in R^{+}$);

$x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$ is the input power instance, i.e. the feature vector, namely the meteorological factor data and historical electricity consumption in the power data;

$y_i = (y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(n)})$ is the output instance, i.e. the label, namely the electricity consumption at the current moment in the power data; m is the number of samples in the power data;

the loss function is the common mean square error function $L(y, f(x)) = (y - f(x))^2$, and the output is a strong learner f(x); the regression algorithm proceeds as follows:

Step 1: initialize the power demand data weak learner, where c can be set to the mean value of the sample labels y:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c)$$

Step 2: for iterations $t = 1, 2, 3, \ldots, N^{*}$ and samples $i = 1, 2, 3, \ldots, m$, calculate the negative gradient:

$$r_{ti} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$

$r_{ti}$: the negative gradient value of the loss function on the ith sample in the tth iteration;

$L(y_i, f(x_i))$: the loss function value of the ith sample of the power demand data;

$f(x_i)$: the output value of the power demand data strong learner at the ith sample, where the strong learner is the one obtained in the previous iteration;

Step 3: fit a CART power demand prediction regression tree to $(x_i, r_{ti})$, $i = 1, 2, 3, \ldots, m$, to obtain the tth regression tree, whose leaf node regions are $R_{tj}$, $j = 1, 2, 3, \ldots, J$, where J is the number of leaf nodes of regression tree t;

Step 4: for each leaf region $j = 1, 2, 3, \ldots, J$, calculate the best fit value:

$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L\left(y_i, f_{t-1}(x_i) + c\right)$$

argmin: the argument that minimizes the function value;

$R_{tj}$: the leaf node regions of the tth regression tree;

$y_i$: the actual value of the ith power demand data sample;

$f_{t-1}(x_i)$: the output value of the strong learner of the (t-1)th iteration at sample $x_i$;

$c_{tj}$: the value that minimizes the loss function for the jth leaf node of the tth regression tree;

Step 5: update the power demand prediction strong learner:

$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$

where $I(\cdot)$ is the indicator function;
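Steps 1 to 5 can be sketched in Python as follows, under simplifying assumptions that are not part of the method itself: a single input feature, regression trees of depth 1 (stumps) in place of full CART trees, and an added `learning_rate` shrinkage factor. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree on one feature: (threshold, left_value, right_value).

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        # For squared loss the best leaf value (Step 4) is the mean residual.
        cl, cr = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - cl) ** 2 for r in left) + sum((r - cr) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, cl, cr)
    return best[1:]

def gbdt_fit(xs, ys, n_rounds=100, learning_rate=0.1):
    f0 = sum(ys) / len(ys)                     # Step 1: constant initial learner
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Step 2: negative gradient of the squared loss, up to a factor of 2.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, cl, cr = fit_stump(xs, residuals)  # Steps 3 and 4
        stumps.append((thr, cl, cr))
        # Step 5: update the strong learner (shrinkage is an added assumption).
        preds = [p + learning_rate * (cl if x <= thr else cr)
                 for x, p in zip(xs, preds)]
    return f0, stumps, learning_rate

def gbdt_predict(model, x):
    f0, stumps, lr = model
    return f0 + sum(lr * (cl if x <= thr else cr) for thr, cl, cr in stumps)

model = gbdt_fit([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 5, 5, 5, 5])
print(round(gbdt_predict(model, 2), 3), round(gbdt_predict(model, 7), 3))
```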

the Histogram algorithm in the LightGBM algorithm is used to speed up the processing of the power demand training samples; before training, the Histogram algorithm converts the continuous feature values in the power demand data into K discrete bin values by constructing a piecewise function, and then builds a histogram containing K bins; the training samples are traversed using the constructed histogram, during which the LightGBM algorithm accumulates statistics in the histogram according to the K discrete values, and finally finds the optimal split point among the discrete values;

in the Level-wise growth mode of the GBDT algorithm, the leaves of a given layer are searched for the one with the maximum splitting gain, which is then split, and this is repeated continuously;

the LightGBM algorithm is based on the core idea of the GBDT algorithm and introduces new methods to improve the feature splitting process and the tree growth mode;
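The histogram idea can be illustrated with the following simplified sketch. It is not the LightGBM implementation: equal-width bins are assumed as the piecewise function, the statistics accumulated per bin are plain target sums and counts rather than gradient statistics, and the gain is the reduction in squared error. The function names are hypothetical.

```python
def build_bins(values, k):
    """Map each continuous value to a bin index 0..k-1 (equal-width bins assumed)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0              # guard against a constant feature
    return [min(int((v - lo) / width), k - 1) for v in values]

def histogram_best_split(values, targets, k=4):
    """Accumulate per-bin statistics once, then scan only the k-1 bin boundaries."""
    bins = build_bins(values, k)
    count, total = [0] * k, [0.0] * k
    for b, t in zip(bins, targets):           # single pass over the samples
        count[b] += 1
        total[b] += t
    n, s = len(targets), sum(targets)
    best_bin, best_gain = None, 0.0
    n_left, s_left = 0, 0.0
    for b in range(k - 1):                    # split after bin b
        n_left += count[b]
        s_left += total[b]
        n_right, s_right = n - n_left, s - s_left
        if n_left == 0 or n_right == 0:
            continue
        # Reduction in squared error obtained by splitting at this boundary.
        gain = s_left ** 2 / n_left + s_right ** 2 / n_right - s ** 2 / n
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain

values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
targets = [1, 1, 1, 1, 9, 9, 9, 9]
print(histogram_best_split(values, targets, k=4))   # (1, 128.0)
```

The scan over bin boundaries costs K-1 evaluations per feature regardless of the number of samples, which is the source of the speed-up described above.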

S4: XGBoost algorithm;

the XGBoost algorithm is a Boosting optimization model based on decision trees, which combines weak power demand prediction learners into a strong power demand prediction learner through iteration; in the XGBoost algorithm, the CART regression tree is used as the weak learner; the optimal structure of the tree is determined first, and a stepwise forward additive model is then adopted: each time a single tree is generated, the weight of the data mispredicted in the previous round is increased before the data are used for the current tree, and the overall error of the model is gradually reduced by continually adding trees until training is finished;

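in the XGBoost algorithm, the model of each tree is as follows:

$$f_t(x) = w_{q(x)}, \quad w \in R^{M}, \quad q: R^{d} \rightarrow \{1, 2, \ldots, M\} \quad (12)$$

where w is the leaf node score; x is the input sample data, namely the meteorological factor data and historical electricity consumption in the power data; q(x) is the leaf node corresponding to sample x, and M is the number of leaf nodes of the tree; the formula for adding the mth tree is as follows:

$$\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + f_m(x_i) \quad (13)$$

the objective function consists of a loss function L and a regularization term $\Omega$:

$$Obj^{(m)} = \sum_{i} L\left(y_i, \hat{y}_i^{(m)}\right) + \sum_{k=1}^{m} \Omega(f_k)$$

for the power data, the squared loss between the predicted and the real power consumption is used; the regularization term acts as a model penalty term to prevent overfitting, and is defined as follows:

$$\Omega(f) = \gamma M + \frac{1}{2}\lambda \sum_{j=1}^{M} w_j^2 \quad (15)$$

where M is the number of leaf nodes, $w_j$ is the score of the jth leaf node, so that the sum is the squared L2 norm of the leaf scores, and $\gamma$ and $\lambda$ control the complexity of the tree; after the regularization term is calculated, formulas (12), (13) and (15) are substituted into the objective function, which is expanded using the second-order Taylor formula to obtain the following form in terms of the leaf nodes of the mth tree:

$$Obj^{(m)} \approx \sum_{j=1}^{M}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right] + \gamma M \quad (16)$$

where $I_j = \{i \mid q(x_i) = j\}$ is the set of samples falling in leaf j, and $g_i$ and $h_i$ are the first and second derivatives of the loss function with respect to $\hat{y}_i^{(m-1)}$; let

$$G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i$$

substituting into equation (16), taking the partial derivative of the objective function with respect to $w_j$ and setting the derivative to 0 gives:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}$$

substituting this back into the objective function gives:

$$Obj^{*} = -\frac{1}{2}\sum_{j=1}^{M}\frac{G_j^2}{H_j + \lambda} + \gamma M$$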

$Obj^{*}$ is used to evaluate the quality of a single CART regression tree structure; starting from a tree of depth 0, XGBoost enumerates the splitting schemes of all features and calculates their objective function values to determine the optimal structure of the tree; building of the decision tree stops when the tree reaches the maximum depth or when the sum of the sample weights is smaller than a set threshold; the sample sampling proportion of each tree is controlled by the set parameters, and the optimal structure of one tree is finally obtained through parameter adjustment;
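How a candidate split is scored can be sketched as follows, using the standard XGBoost closed forms for the optimal leaf weight, $-G/(H+\lambda)$, and the structure score, $-\frac{1}{2}\sum_j G_j^2/(H_j+\lambda) + \gamma M$. The function names are hypothetical, and the gradient statistics in the example assume the squared loss $(y - \hat{y})^2$ with all current predictions equal to 0, so that $g = 2(\hat{y} - y)$ and $h = 2$.

```python
def leaf_weight(g_sum, h_sum, lam):
    """Optimal leaf score: -G / (H + lambda)."""
    return -g_sum / (h_sum + lam)

def structure_score(leaves, lam, gamma):
    """Obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * M for (G_j, H_j) leaves."""
    return -0.5 * sum(g * g / (h + lam) for g, h in leaves) + gamma * len(leaves)

def split_gain(left, right, lam, gamma):
    """Decrease of Obj* when one leaf is replaced by its two children."""
    parent = (left[0] + right[0], left[1] + right[1])
    return (structure_score([parent], lam, gamma)
            - structure_score([left, right], lam, gamma))

# Illustrative targets y = [1, 1, 5, 5] with current predictions of 0.
left = (-4.0, 4.0)      # G and H summed over the samples with y = 1
right = (-20.0, 4.0)    # G and H summed over the samples with y = 5
print(leaf_weight(*left, lam=1.0), leaf_weight(*right, lam=1.0))   # 0.8 4.0
print(split_gain(left, right, lam=1.0, gamma=0.0))
```

A split is worthwhile only when its gain is positive; a larger $\gamma$ therefore prunes more aggressively, and a larger $\lambda$ shrinks the leaf scores toward 0.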

after the XGboost finishes one tree, Boosting is used for carrying out next round of training, and an optimized training model structure is finally obtained through continuous iteration; after the XGboost is iterated once, the weight of the leaf node is multiplied by the learning rate, so that the influence of each tree is weakened, and a larger learning space is provided for subsequent trees; finally, determining the optimal iteration times of the model, and finishing the training of the model;

S5: LR model;

the LR model is mainly represented by the conditional probability distribution P(Y|X) in the form of a parameterized logistic distribution, where the random variable X takes real values and is the meteorological factor data and historical power consumption in the power data; the conditional distributions of the LR model are as follows:

$$P(Y = 1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)} \quad (19)$$

$$P(Y = 0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)} \quad (20)$$

where $x \in R^{n}$ is the input, $Y \in \{0, 1\}$ is the output, $w \in R^{n}$ and $b \in R$ are the training parameters of the power demand prediction model, w is the weight vector, b is the bias, and $w \cdot x$ is the inner product of w and x;

for a given power demand data input instance x, P (Y ═ 1| x) and P (Y ═ 0| x) can be solved according to equations (19) and (20); the logistic regression compares the two conditional probability values, and finds a class with a larger probability value, so that the example x is distributed to the class;

expanding the weight vector w and the input power demand data vector x to obtain w ═ w ⁽¹⁾ ,w ⁽²⁾ ,…w ⁽ⁿ⁾ ,b) ^T ，x＝(x ⁽¹⁾ ,x ⁽²⁾ ,…x ⁽ⁿ⁾ ,1) ^T (ii) a In this case, the LR model is as follows:

Log probability of the event is given byAlso called logic function;

the function is used to determine the probability of the category to which the power demand belongs;

As can be seen from the above equation, in the LR model, the logit function with the output Y ═ 1 has a linear relationship with the input x; the value domain of the linear function w.x is a real number domain, and the input x can be split by the linear function;

since x is equal to R ⁿ⁺¹ ，w∈R ⁿ⁺¹ (ii) a The linear function w · x can be converted to a probability using equation (19):

as the linear function w·x approaches positive infinity, the conditional probability approaches 1; as w·x approaches negative infinity, the conditional probability approaches 0;
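The relationship between the linear function w·x, the conditional probability and the log odds can be checked numerically with a short sketch. The weights and the input below are hypothetical values chosen only for illustration, and the helper names `sigmoid`, `prob_y1`, `logit` and `classify` are not taken from the original text.

```python
import math

def sigmoid(z):
    """Logistic function exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob_y1(w, x):
    """P(Y=1|x) for extended vectors w = (w1..wn, b) and x = (x1..xn, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def logit(p):
    """Log odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def classify(w, x):
    """Assign x to the class with the larger conditional probability."""
    p1 = prob_y1(w, x)
    return 1 if p1 >= 1.0 - p1 else 0  # a tie goes to class 1 (arbitrary choice)

# Hypothetical weights and one input: two features plus the constant 1.
w = [0.8, -0.5, 0.1]
x = [2.0, 1.0, 1.0]
p1 = prob_y1(w, x)      # probability obtained from w.x = 1.2
label = classify(w, x)  # class with the larger probability
```

Applying `logit` to `p1` recovers w·x, which is the linear relationship stated above, and very large positive or negative values of w·x drive the probability to 1 or 0.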

given a training data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i \in \mathbb{R}^m$ and $y_i \in \{0,1\}$, the LR model parameters are estimated by the maximum likelihood estimation method; let

P(Y＝1|x)＝π(x) (24)

P(Y＝0|x)＝1-π(x) (25)

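then the likelihood function is

$$\prod_{i=1}^{m} [\pi(x_i)]^{y_i} [1 - \pi(x_i)]^{1 - y_i}$$

The log-likelihood function is

$$L(w) = \sum_{i=1}^{m} \left[ y_i \log \pi(x_i) + (1 - y_i) \log(1 - \pi(x_i)) \right] = \sum_{i=1}^{m} \left[ y_i (w \cdot x_i) - \log(1 + \exp(w \cdot x_i)) \right] \quad (26)$$

The estimated value of w is obtained by maximizing formula (26);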

the power demand prediction problem thus becomes an optimization problem whose objective function is the log-likelihood function; the logistic regression problem is solved by using a gradient descent method or a quasi-Newton method; suppose that $\hat{w}$ is the maximum likelihood estimate of w, then the resulting LR model is

$$P(Y=1 \mid x) = \frac{\exp(\hat{w} \cdot x)}{1 + \exp(\hat{w} \cdot x)}$$

$$P(Y=0 \mid x) = \frac{1}{1 + \exp(\hat{w} \cdot x)}$$

Because the learning capacity of the LR model is limited, it is often combined with other models: the other models learn the corresponding feature combinations through training, and the LR model then gives the corresponding predicted value, namely the predicted power consumption;
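The maximum likelihood estimation of the LR parameters described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it uses plain gradient ascent on the log-likelihood with a fixed step size (a quasi-Newton method would be an alternative), the data are a hypothetical one-feature example in extended form, and the names `fit_lr`, `log_likelihood`, `step` and `n_iter` are introduced here only for illustration.

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    """pi(x) as a function of z = w.x, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(w, X, y):
    """L(w) = sum_i [ y_i * (w.x_i) - log(1 + exp(w.x_i)) ]."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(w, xi)
        log1p_exp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + e^z)
        total += yi * z - log1p_exp
    return total

def fit_lr(X, y, step=0.5, n_iter=2000):
    """Maximise L(w) by gradient ascent; dL/dw = sum_i (y_i - pi(x_i)) * x_i."""
    w = [0.0] * len(X[0])
    m = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(dot(w, xi))
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        # Ascent step on the averaged gradient.
        w = [wj + step * gj / m for wj, gj in zip(w, grad)]
    return w

# Hypothetical data in extended form (feature, 1); the labels overlap,
# so the maximum likelihood estimate is finite.
X = [[0.5, 1.0], [1.0, 1.0], [1.5, 1.0], [2.0, 1.0],
     [2.5, 1.0], [3.0, 1.0], [3.5, 1.0], [4.0, 1.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w_hat = fit_lr(X, y)
p_high = sigmoid(dot(w_hat, [4.0, 1.0]))  # P(Y=1|x) at the largest feature value
p_low = sigmoid(dot(w_hat, [0.5, 1.0]))   # P(Y=1|x) at the smallest feature value
```

The fixed step size is chosen small enough for this toy data set that the log-likelihood increases at every iteration; for other data it would need to be tuned or replaced by a line search.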

Electric power demand prediction model based on Stacking fusion

The Stacking fusion model integrates the advantages of the various Boosting models and combines them with an LR regression model, so that the fusion model trained on the power demand data has stronger discrimination and stability;

the model training and testing process is as follows: first, the original power demand data are cleaned and normalized; then the power demand prediction model based on Stacking fusion is trained to obtain the corresponding power demand prediction model; finally, the power demand in a future period of time is predicted by using the test data;

the three models are fused so that they reinforce one another; Stacking is an ensemble framework of layered models; the first layer is composed of several different base learners, for which the three models GBDT, XGboost and LightGBM are selected, and ensemble prediction is carried out once each model has been tuned to perform well; the LR regression model is selected for the second layer, which further helps to avoid overfitting and effectively reduces the variance of the model, so that the model is more stable.

2. The multi-model fusion-based multi-factor power demand medium-short term prediction method according to claim 1, characterized in that:

Step 1: input the training data set, as follows:

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}, \quad x_i \in X \subseteq \mathbb{R}^n, \; y_i \in Y \subseteq \mathbb{R}^n \quad (2)$$

$x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$: the input instance, namely the feature vector, consisting of the meteorological factor data and the historical electricity consumption in the power data;

$y_i = (y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(n)})$: the output instance, namely the label, which is the electricity consumption at the current moment in the power data;

$\mathbb{R}^n$: the real number domain;

Step 2: traverse all the feature variables of the power data and their values to find the j-th feature variable $x^{(j)}$ and its value s as the splitting variable and the split point, defining the two regions $R_1(j,s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j,s) = \{x \mid x^{(j)} > s\}$, such that the squared error minimization criterion is satisfied, as in the following formula:

$$\min_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$$

$c_1$: the output value with the smallest squared error in the region $R_1(j,s)$, namely the mean value of Y within $R_1(j,s)$;

$c_2$: the output value with the smallest squared error in the region $R_2(j,s)$, namely the mean value of Y within $R_2(j,s)$;

This yields the optimal splitting variable $x^{(j)}$ and split point s with the minimum total squared error loss;

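Step 3: after the optimal split of the variable $x^{(j)}$ at the point s has been obtained, the two sub-regions are determined and the output value of each sub-region is calculated as the mean value of Y within it:

$$R_1(j,s) = \{x \mid x^{(j)} \le s\} \quad (4)$$

$$R_2(j,s) = \{x \mid x^{(j)} > s\} \quad (5)$$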

Step 4: recursively apply Step 2 and Step 3 to the two sub-regions to find the optimal splitting variable at each branch node; the growth of the regression tree ends when all the regions meet the threshold or when all the meteorological attribute data in the power data used for growing the regression tree are exhausted;

Step 5: the space of meteorological attribute data and power demand data in the input power data is divided into M regions $R_1, R_2, \ldots, R_m, \ldots, R_M$, each divided region has a fixed output value $c_m$, and the final power demand decision tree is generated as follows:

$$f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$$

$c_m$: the mean value of Y in each divided region;

$R_m$: a divided region;

I: the indicator function, with $I(x \in R_m) = 1$ if $x \in R_m$ and 0 otherwise;

This realizes the preliminary prediction of the power demand.
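Steps 2 to 5 above can be sketched as a small least-squares regression tree. This is a minimal illustration, not the implementation of the invention: the stopping threshold is assumed to be a maximum depth together with a minimum sample count, the two features (a temperature and a previous load) and their values are hypothetical, and the names `best_split`, `build_tree` and `predict` are introduced here only for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean, the best constant output."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Return (j, s, loss) minimising SSE(R1) + SSE(R2), or None if no split exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for s in values[:-1]:  # splitting at the largest value would leave R2 empty
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            loss = sse(left) + sse(right)
            if best is None or loss < best[2]:
                best = (j, s, loss)
    return best

def build_tree(X, y, max_depth=3, min_samples=2, depth=0):
    """Recursively split until the assumed stopping thresholds are met."""
    leaf = {"value": mean(y)}  # c_m: mean of Y in the region
    if depth >= max_depth or len(y) < min_samples:
        return leaf
    split = best_split(X, y)
    if split is None:
        return leaf
    j, s, _ = split
    left = [i for i, row in enumerate(X) if row[j] <= s]
    right = [i for i, row in enumerate(X) if row[j] > s]
    return {
        "feature": j,
        "threshold": s,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth, min_samples, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth, min_samples, depth + 1),
    }

def predict(tree, x):
    """Return c_m of the region R_m that contains x."""
    while "value" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

# Hypothetical samples: (temperature, previous load) -> current load.
X = [[20.0, 300.0], [22.0, 310.0], [30.0, 400.0], [32.0, 420.0]]
y = [305.0, 315.0, 410.0, 430.0]
split = best_split(X, y)
tree = build_tree(X, y, max_depth=1)
cool_day = predict(tree, [21.0, 305.0])
hot_day = predict(tree, [31.0, 410.0])
```

Walking from the root to a leaf is equivalent to evaluating the sum of $c_m I(x \in R_m)$ over all regions, because exactly one indicator is non-zero for any input.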

3. The multi-model fusion-based multi-factor power demand medium-short term prediction method according to claim 1, characterized in that:

Step 1: divide the whole data set, consisting of meteorological factor data and electricity demand, into training data (the training set) and prediction data (the test set), and then divide the training samples into k groups containing the same amount of data;

Step 2: train each base learner on the training data set k times, each time using k-1 groups of data as training samples and the remaining group as the validation set; predict the power demand from the meteorological factor data in the validation set, so that k sets of validation predictions are obtained, and in each training round also predict the prediction samples, so that k sets of predictions for the prediction samples are obtained; note that only the training set undergoes the operation of this step, while the validation set and the test set do not;

Step 3: combine the k sets of predictions obtained on the validation sets formed from the power demand data to obtain new training sample data, and average the k sets of predictions for the prediction samples to obtain new power demand prediction data;

Step 4: input the power demand prediction data obtained in Step 3 into the second-layer model to obtain the final prediction result;

the constructed power demand model uses three Boosting models including GBDT, XGboost and LightGBM at the first layer of the Stacking framework, and the LR model at the second layer of the Stacking framework directly outputs the prediction result.
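The data flow of Steps 1 to 4 can be sketched as follows. This is a minimal illustration under loud assumptions: the first-layer learners are simple stand-ins (two least-squares lines and a nearest-neighbour rule) in place of GBDT, XGboost and LightGBM, the second layer is a ridge-stabilised least-squares combiner standing in for the LR model, folds are contiguous, and all names and data values are hypothetical.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of (nearly) equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def stack_features(fitters, X_train, y_train, X_test, k):
    """Build second-layer inputs: one column per base learner."""
    n = len(X_train)
    meta_train = [[0.0] * len(fitters) for _ in range(n)]
    meta_test = [[0.0] * len(fitters) for _ in X_test]
    for col, fit in enumerate(fitters):
        for val_idx in kfold_indices(n, k):
            held_out = set(val_idx)
            tr = [i for i in range(n) if i not in held_out]
            model = fit([X_train[i] for i in tr], [y_train[i] for i in tr])
            for i in val_idx:                   # prediction on the validation fold
                meta_train[i][col] = model(X_train[i])
            for t, x in enumerate(X_test):      # test predictions averaged over k rounds
                meta_test[t][col] += model(x) / k
    return meta_train, meta_test

def line_on(j):
    """Fitter for a least-squares line on feature j (stand-in base learner)."""
    def fit(X, y):
        xs = [row[j] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((a - mx) ** 2 for a in xs)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sxx if sxx else 0.0
        return lambda x: my + slope * (x[j] - mx)
    return fit

def fit_nearest(X, y):
    """1-nearest-neighbour on squared Euclidean distance (stand-in base learner)."""
    pairs = list(zip(X, y))
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: min(pairs, key=lambda p: dist(p[0], x))[1]

def fit_linear(Z, y, ridge=1e-6):
    """Second-layer stand-in: ridge-stabilised least squares with intercept."""
    rows = [list(r) + [1.0] for r in Z]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for c in range(p):                          # Gauss-Jordan with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    w = [b[i] / A[i][i] for i in range(p)]
    return lambda z: sum(wi * zi for wi, zi in zip(w, list(z) + [1.0]))

# Hypothetical samples: (temperature, previous load) -> current load.
X_train = [[18.0, 300.0], [20.0, 310.0], [23.0, 330.0],
           [26.0, 355.0], [29.0, 380.0], [33.0, 420.0]]
y_train = [305.0, 318.0, 336.0, 360.0, 388.0, 425.0]
X_test = [[21.0, 320.0], [30.0, 390.0]]
fitters = [line_on(0), line_on(1), fit_nearest]
meta_train, meta_test = stack_features(fitters, X_train, y_train, X_test, k=3)
second_layer = fit_linear(meta_train, y_train)
final_prediction = [second_layer(z) for z in meta_test]
```

Each training sample is predicted only by a model that did not see it during fitting, which is what keeps the second layer from simply learning the first layer's training error.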