CN117933470A

CN117933470A - Regression tree-based electricity consumption prediction method and system

Info

Publication number: CN117933470A
Application number: CN202410099074.9A
Authority: CN
Inventors: 夏永年; 陈新生
Original assignee: Individual
Current assignee: Individual
Priority date: 2024-01-24
Filing date: 2024-01-24
Publication date: 2024-04-26

Abstract

The invention provides a regression tree-based electricity consumption prediction method, which comprises the following steps: step 1: acquiring historical electricity consumption data; step 2: preprocessing the historical electricity consumption data to obtain preprocessed historical electricity consumption data; step 3: constructing an electricity consumption prediction model and training it with the preprocessed historical electricity consumption data to obtain a trained electricity consumption prediction model; step 4: obtaining an electricity consumption prediction result from the trained electricity consumption prediction model. The invention improves the training efficiency of the electricity consumption prediction model, combines the PCA (principal component analysis) algorithm with the GBDT (gradient boosting decision tree) algorithm when constructing the model, improves the accuracy of electricity consumption prediction, and preserves the integrity of the information.

Description

Regression tree-based electricity consumption prediction method and system

Technical Field

The invention belongs to the technical field of electricity consumption prediction, and particularly relates to an electricity consumption prediction method and system based on a regression tree.

Background

Smart grids are a major direction of the future transformation of power systems. As their construction expands, smart grids generate big data on electric power, including production data, marketing data and related socioeconomic data. Building a more intelligent, economical and environmentally friendly power grid that promotes low-carbon development is also a common global goal. The development of the power industry must keep pace with smart-grid construction, and accurately judging and predicting future trends in electricity consumption is important for planning power enterprises accurately, scientifically and reasonably, and for improving the operating stability and economy of the power system.

In the prior art, electricity consumption prediction methods have poor real-time performance: the lead period is a predetermined, static period rather than one that is dynamically updated, and it is used directly when predicting electricity consumption, so after some time the lead period becomes noticeably inaccurate and introduces errors into the predicted consumption. In addition, predicting electricity consumption directly leads to large errors in the prediction result: once seasonal influence is eliminated, the change in electricity consumption between time periods is not obvious, so the error cannot meet the accuracy requirement.

Disclosure of Invention

The invention aims to provide a regression tree-based electricity consumption prediction method and system, so as to solve the technical problems of low prediction accuracy and poor real-time performance in prior-art electricity consumption prediction.

To achieve the above purpose, the invention adopts the following technical scheme: a regression tree-based electricity consumption prediction method is provided, comprising the following steps:

Step 1: acquiring historical electricity consumption data;

Step 2: preprocessing the historical electricity consumption data to obtain preprocessed historical electricity consumption data;

Step 3: constructing a power consumption prediction model, inputting the preprocessed historical power consumption data to train the power consumption prediction model, and obtaining a trained power consumption prediction model;

Step 4: obtaining a power consumption prediction result based on the trained power consumption prediction model.

Preferably, step 2 comprises:

performing missing-value filling, outlier correction and data standardization on the historical electricity consumption data to obtain the preprocessed historical electricity consumption data.

Preferably, filling the missing values of the historical electricity consumption data includes:

acquiring, from the historical electricity consumption data, the data of each time period that contains a missing value and of the time periods adjacent to it;

filling each missing value with the average of the data in its adjacent time periods using a linear interpolation formula, to obtain historical electricity consumption data with the missing values filled;

the linear interpolation formula is:

$$x_i = \frac{x_{i-1} + x_{i+1}}{2}$$

wherein i is the time period of the missing value, $x_i$ is the electricity consumption after the missing value is filled, $x_{i-1}$ is the electricity consumption of time period i-1, $x_{i+1}$ is the electricity consumption of time period i+1, and I is the number of missing values.
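A minimal Python sketch of this fill, assuming the series is a list in time order with `None` marking missing periods; `fill_missing` is a hypothetical name. For an isolated gap it reproduces the adjacent average $(x_{i-1} + x_{i+1})/2$. Runs of consecutive gaps are interpolated along a straight line between the nearest observed neighbours, and gaps at the start or end copy the nearest observed value; both behaviours are assumptions, since the formula only covers a gap with two observed neighbours.

```python
from typing import List, Optional


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; an isolated gap becomes (x[i-1] + x[i+1]) / 2."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series contains no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        prev = max((j for j in observed if j < i), default=None)
        nxt = min((j for j in observed if j > i), default=None)
        if prev is None:    # leading gap: copy the first observed value (assumption)
            filled.append(float(series[nxt]))
        elif nxt is None:   # trailing gap: copy the last observed value (assumption)
            filled.append(float(series[prev]))
        else:               # interior gap: point on the straight line between neighbours
            frac = (i - prev) / (nxt - prev)
            filled.append(series[prev] + frac * (series[nxt] - series[prev]))
    return filled


print(fill_missing([120.0, None, 130.0, 128.0]))  # [120.0, 125.0, 130.0, 128.0]
```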

Preferably, performing outlier correction on the historical electricity consumption data includes:

acquiring the average value of the historical electricity consumption data of each time period based on the historical electricity consumption data with the missing values filled;

acquiring the sample standard deviation based on the average value of the historical electricity consumption data of each time period;

if the historical electricity consumption data in a preset time period is larger than the sample standard deviation, it is treated as an outlier; if it is smaller than the sample standard deviation, it is treated as a normal value;

correcting each outlier with the outlier correction formula to obtain historical electricity consumption data after outlier correction;

The calculation formula of the standard deviation of the sample is as follows:

wherein $\sigma^{m}$ is the sample standard deviation of the historical electricity consumption data, $\bar{x}$ is the average value of the historical electricity consumption data of each time period, $x_n$ is the historical electricity consumption data of the n-th time period, and m is the number of historical electricity consumption data points;

The outlier correction formula is:

wherein the quantities involved are the historical electricity consumption data after outlier correction, the mean value of the historical electricity consumption data of time period m, the sample standard deviation $\sigma^{m}$, and the outlier being corrected.
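The comparison rule above can be sketched as follows. Because the outlier correction formula itself is not reproduced here, the sketch only detects outliers. It assumes (i) the comparison is made on the deviation $|x_n - \bar{x}|$ rather than on the raw value, (ii) the sample standard deviation divides by $m - 1$, and (iii) a threshold multiplier `k` (a hypothetical parameter) defaulting to 1, matching the text; `flag_outliers` is likewise a hypothetical name.

```python
import statistics
from typing import List


def flag_outliers(values: List[float], k: float = 1.0) -> List[int]:
    """Return indices whose deviation from the mean exceeds k sample standard deviations.

    Assumptions (not fixed by the text): the test uses |x_n - mean|, the standard
    deviation is Bessel-corrected (divides by m - 1), and k defaults to 1.
    """
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation, m - 1 denominator
    return [n for n, x in enumerate(values) if abs(x - mean) > k * sigma]


readings = [100.0, 102.0, 98.0, 101.0, 99.0, 160.0]
print(flag_outliers(readings))  # [5]
```

With `k = 1`, roughly a third of normally distributed readings would exceed the threshold, so larger values such as `k = 3` are a common choice in practice.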

Preferably, the data standardization of the historical electricity consumption data includes:

calculating the mean and standard deviation of the historical electricity consumption data after outlier correction;

standardizing the outlier-corrected historical electricity consumption data based on that mean and standard deviation;

the data standardization formula is:

$$X' = \frac{X - \mu}{\sigma}$$

wherein X' is the standardized historical electricity consumption data, X is the outlier-corrected historical electricity consumption data, μ is the mean of the outlier-corrected historical electricity consumption data, and σ is its standard deviation.
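A sketch of the standardization formula in plain Python; `standardize` is a hypothetical name, and using the population standard deviation is an assumption, because the text does not say whether σ divides by the number of values or by that number minus one.

```python
import statistics
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Z-score standardization X' = (X - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std (assumed denominator)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mu) / sigma for x in values]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```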

Preferably, step 3 comprises:

Step 3.1: performing dimension reduction on the preprocessed historical power consumption data based on a principal component analysis method;

step 3.2: the historical power consumption data after dimension reduction is used as training data to be input into a power consumption prediction model;

step 3.3: initializing a power consumption prediction model, and obtaining an initial regression tree in the power consumption prediction model;

Step 3.4: performing iterative operation on the initial regression tree to obtain a regression tree with the iterative operation completed;

Step 3.5: acquiring a power consumption prediction model based on the regression tree completed by the iterative operation;

The initial regression tree is calculated as:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

where $f_0(x)$ is the initial regression tree, c is the constant value that minimizes the regression tree loss function L, $y_i$ is the observed electricity consumption of the i-th training sample, and N is the number of training samples.
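The text does not specify the loss function L. Assuming the common squared-error loss $L(y, c) = (y - c)^2$, setting the derivative with respect to c to zero gives a closed form for the initial tree:

```latex
\frac{\partial}{\partial c}\sum_{i=1}^{N}(y_i - c)^2 = -2\sum_{i=1}^{N}(y_i - c) = 0
\quad\Longrightarrow\quad
f_0(x) = c^{*} = \frac{1}{N}\sum_{i=1}^{N} y_i
```

Under this assumption the initial regression tree is a single leaf that predicts the mean of the training targets.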

Preferably, step 3.1 comprises:

Step 3.1.1: acquiring the N-dimensional features of the preprocessed historical power consumption data and the value of each feature;

Step 3.1.2: centering the N-dimensional feature values of the preprocessed historical power consumption data to obtain the centered N-dimensional features of the historical power consumption data;

Step 3.1.3: acquiring the covariance matrix of the historical power consumption data based on the centered N-dimensional features;

Step 3.1.4: acquiring the eigenvalues and corresponding eigenvectors of the covariance matrix;

Step 3.1.5: projecting the N-dimensional features of the historical power consumption data onto the eigenvectors to obtain the dimension-reduced historical power consumption data;

The centering formula is:

$$\tilde{x}_i^{(j)} = x_i^{(j)} - \frac{1}{M}\sum_{k=1}^{M} x_i^{(k)}$$

wherein M is the number of samples of the historical electricity consumption data, $\tilde{x}_i^{(j)}$ is the value of the i-th feature of the j-th sample after centering, and $x_i^{(j)}$ is the value of the i-th feature of the j-th sample;

The covariance matrix is calculated as follows (written out for two features $x_1$ and $x_2$):

$$C = \begin{pmatrix} \operatorname{cov}(x_1, x_1) & \operatorname{cov}(x_1, x_2) \\ \operatorname{cov}(x_2, x_1) & \operatorname{cov}(x_2, x_2) \end{pmatrix}$$

where C is the covariance matrix, cov(x₁, x₁) is the variance of feature x₁, cov(x₂, x₂) is the variance of feature x₂, cov(x₂, x₁) and cov(x₁, x₂) are the covariance between features x₁ and x₂, $x_1^{(j)}$ is the first of the N-dimensional features of the historical electricity consumption data, $\tilde{x}_1^{(j)}$ is that feature value after centering, and M is the number of samples of the historical electricity consumption data;
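Steps 3.1.1 to 3.1.5 can be sketched with NumPy as below. The sketch assumes the data is an M × N array (samples by features), scales the covariance by 1/M, and keeps the k eigenvectors with the largest eigenvalues; the value of k and the name `pca_reduce` are assumptions, since the text does not say how many components are retained. The input data is random toy data.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project M samples x N features onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)               # step 3.1.2: subtract each feature's mean
    C = (X_centered.T @ X_centered) / X.shape[0]  # step 3.1.3: N x N covariance, 1/M scaling (assumed)
    eigvals, eigvecs = np.linalg.eigh(C)          # step 3.1.4: eigenpairs of the symmetric matrix
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest eigenvalues
    return X_centered @ eigvecs[:, order]         # step 3.1.5: projection onto those eigenvectors


rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=200), rng.normal(size=200)])
Z = pca_reduce(X, k=2)
print(Z.shape)  # (200, 2)
```

`np.linalg.eigh` is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the reversal before selecting components.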

Preferably, step 3.4 comprises:

Step 3.4.1: carrying out iterative operations on the initial regression tree to obtain the residual fitting values of the regression tree at each iteration;

Step 3.4.2: obtaining, by line search, the loss-minimizing value of each node region of the regression tree after each iteration;

Step 3.4.3: obtaining the power consumption prediction model completed by the iterative operations from the residual fitting values and the node-region values of the regression trees;

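The residual fitting value of the regression tree is calculated as:

$$r_{mi} = -\left[\frac{\partial L\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}$$

wherein $r_{mi}$ is the residual fitting value of the regression tree (the negative gradient of the loss), the bracketed partial derivative is the gradient term, $y_i$ is the observed electricity consumption of the i-th training sample, $L(y_i, f(x_i))$ is the loss function, and $f(x_i)$ is evaluated at $f_{m-1}$, the model obtained after the (m-1)-th iteration;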
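Assuming the squared-error loss $L(y, f) = \tfrac{1}{2}(y - f)^2$ (the factor ½ only simplifies the derivative), the negative gradient reduces to the ordinary residual, which is why this quantity is called a residual fitting value:

```latex
r_{mi} = -\left[\frac{\partial\,\tfrac{1}{2}\bigl(y_i - f(x_i)\bigr)^2}{\partial f(x_i)}\right]_{f = f_{m-1}}
       = y_i - f_{m-1}(x_i)
```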

The loss-minimizing value of each regression tree node region is calculated as:

$$c_{mj} = \arg\min_{c} \sum_{x_i \in R_{mj}} L\bigl(y_i, f_{m-1}(x_i) + c\bigr)$$

wherein $c_{mj}$ is the loss-minimizing value of node region $R_{mj}$, $R_{mj}$ is the j-th node region of the regression tree at iteration m, c is the constant being optimized, $f_{m-1}(x_i)$ is the result obtained after the (m-1)-th iteration, and $y_i$ is the observed electricity consumption of the i-th training sample;
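Under the same squared-error assumption, the line search in each region has a closed form: setting the derivative with respect to c to zero shows that the optimal region value is the mean residual $r_{mi} = y_i - f_{m-1}(x_i)$ of the samples falling in that region.

```latex
\frac{\partial}{\partial c}\sum_{x_i \in R_{mj}} \tfrac{1}{2}\bigl(y_i - f_{m-1}(x_i) - c\bigr)^2
  = -\sum_{x_i \in R_{mj}} \bigl(y_i - f_{m-1}(x_i) - c\bigr) = 0
\;\Longrightarrow\;
c_{mj} = \frac{1}{\lvert R_{mj} \rvert}\sum_{x_i \in R_{mj}} r_{mi}
```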

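The power consumption prediction model obtained after the iterative operations is:

$$\hat{f}(x) = f_W(x) = f_0(x) + \sum_{m=1}^{W} \sum_{j=1}^{J} c_{mj}\, I\left(x \in R_{mj}\right)$$

wherein $\hat{f}(x)$ is the power consumption prediction model completed by the iterative operations, $f_0(x)$ is the initial regression tree, W is the maximum number of iterations, J is the number of node regions of each regression tree, $c_{mj}$ is the loss-minimizing value of node region $R_{mj}$, and I is the indicator function.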
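The following pure-Python sketch puts steps 3.3 to 3.5 together under stated assumptions: squared-error loss (so $f_0$ is the mean target, residuals are $y_i - f_{m-1}(x_i)$ and each $c_{mj}$ is the mean residual of its region), regression trees restricted to a single split (J = 2 regions, often called stumps), and no shrinkage. All function names are hypothetical and the training data is toy data.

```python
from typing import List, Optional, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, c_m1 for x <= thr, c_m2 for x > thr)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Fit a two-region regression tree to residuals r by exhaustive split search."""
    best: Optional[Tuple[float, Stump]] = None
    for f in range(len(X[0])):
        # candidate thresholds: every distinct value except the largest, so both regions are non-empty
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            c1, c2 = sum(left) / len(left), sum(right) / len(right)  # c_mj = mean residual
            sse = sum((v - c1) ** 2 for v in left) + sum((v - c2) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, (f, thr, c1, c2))
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1]


def predict_stump(s: Stump, row: List[float]) -> float:
    """Evaluate sum_j c_mj * I(x in R_mj) for a two-region tree."""
    f, thr, c1, c2 = s
    return c1 if row[f] <= thr else c2


def fit_gbdt(X: List[List[float]], y: List[float], W: int) -> Tuple[float, List[Stump]]:
    f0 = sum(y) / len(y)                              # initial tree: mean target
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(W):                                # W boosting iterations
        r = [yi - pi for yi, pi in zip(y, pred)]      # negative gradient = residual
        s = fit_stump(X, r)
        stumps.append(s)
        pred = [pi + predict_stump(s, row) for pi, row in zip(pred, X)]  # f_m = f_{m-1} + tree_m
    return f0, stumps


def predict_gbdt(model: Tuple[float, List[Stump]], row: List[float]) -> float:
    f0, stumps = model
    return f0 + sum(predict_stump(s, row) for s in stumps)


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [10.0, 12.0, 11.0, 30.0, 31.0, 29.0]
model = fit_gbdt(X, y, W=3)
print(predict_gbdt(model, [2.0]), predict_gbdt(model, [5.0]))
```

Production GBDT implementations typically add deeper trees, a learning rate and regularization on top of this basic loop.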

Preferably, step 4 comprises:

Calculating the mean absolute error, mean square error, root mean square error and mean absolute percentage error between the electricity consumption prediction results and the real results, and outputting the electricity consumption prediction result if all four values are within a preset range;

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i - y_i\rvert, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert\frac{x_i - y_i}{y_i}\right\rvert$$

wherein MAE is the mean absolute error, MSE is the mean square error, RMSE is the root mean square error, MAPE is the mean absolute percentage error, $x_i$ is the electricity consumption prediction result, $y_i$ is the real electricity consumption, and n is the number of predictions evaluated.
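A sketch of the step-4 check. The four metrics follow their standard definitions; reading the "preset range" as an upper limit per metric is an assumption, and `error_metrics`, `within_limits` and the limit values used below are hypothetical.

```python
import math
from typing import Dict, List


def error_metrics(pred: List[float], actual: List[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and MAPE (in percent) between predictions x_i and real values y_i."""
    if len(pred) != len(actual) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(pred)
    errors = [x - y for x, y in zip(pred, actual)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / n  # undefined if any y_i == 0
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


def within_limits(metrics: Dict[str, float], limits: Dict[str, float]) -> bool:
    """Hypothetical acceptance rule: every listed metric must not exceed its upper limit."""
    return all(metrics[name] <= bound for name, bound in limits.items())


m = error_metrics([110.0, 95.0, 100.0], [100.0, 100.0, 100.0])
print(m)
print(within_limits(m, {"MAE": 6.0, "MAPE": 10.0}))  # True
```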

A big data based electricity usage prediction system, comprising:

the data acquisition module is used for acquiring historical electricity consumption data;

the data preprocessing module is used for preprocessing the historical power consumption data;

the electricity consumption prediction model module is used for constructing an electricity consumption prediction model and training the electricity consumption prediction model by using the preprocessed historical electricity consumption data;

and the prediction result output module is used for outputting a power consumption prediction result.

The regression tree-based electricity consumption prediction method and system provided by the invention have the following beneficial effects: compared with the prior art, the method has strong real-time performance; the training data is preprocessed, and repeated feature values are removed and the data dimensionality is reduced while constructing the power consumption prediction model, which improves the model's training efficiency; and combining the PCA algorithm with the GBDT algorithm in the model improves the accuracy of power consumption prediction while preserving the integrity of the information.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a structural framework diagram of a regression tree-based electricity consumption prediction method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a big data-based power consumption prediction system according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical schemes and beneficial effects to be solved more clear, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to Fig. 1, a regression tree-based power consumption prediction method according to the present invention will now be described.

The power consumption prediction method based on the regression tree comprises the following steps:

Step 1: acquiring historical electricity consumption data;

Because of incomplete data storage or human factors, part of the data is usually missing from the original dataset. Missing data not only increases the complexity of electricity consumption prediction but also reduces its accuracy to some extent, so once the original dataset is obtained, the missing data must be handled accordingly to improve the quality of the dataset.

Further, step 2 includes:

Further, performing missing value filling on the historical electricity consumption data includes:

Acquiring data of a time period in which a missing value occurs and a time period adjacent to the corresponding missing value time period based on the historical electricity consumption data;

the linear interpolation formula is:

Further, performing outlier correction on the historical electricity consumption data includes:

The calculation formula of the standard deviation of the sample is as follows:

The outlier correction formula is:

In machine learning, different features usually have inconsistent units and orders of magnitude, and some may take large values. If the variance of one feature is several orders of magnitude higher than that of the other features, it can easily dominate the target result and cause the algorithm model to ignore the other features during learning.

Further, data normalization of the historical power usage data includes:

The data standardization calculation formula is as follows:

Linear-function normalization (min-max normalization) scales the raw data into the range [0, 1] by a linear transformation, whereas standardization transforms the raw data to zero mean and unit standard deviation. For normalization, if outliers in the dataset are not processed, they can distort the maximum and minimum, and the normalized result is then clearly no longer accurate. For standardization, even if some outliers remain in the dataset, the mean is not greatly affected by a few outliers when the dataset is large, so the variance changes little relative to the accurate result. Standardization is therefore stable when the dataset is large enough and is suited to modern noisy big-data scenarios.

Constructing an electricity consumption prediction model generally involves considering many factors, from which feature information is extracted for model learning so as to improve the accuracy of the prediction result. However, if too many unnecessary features are considered, the model may overfit, which weakens its effectiveness; too many features also make learning more complex and slow down modeling and prediction. Deleting unimportant features is therefore an important step before constructing the model, and feature selection can be performed by calculating a correlation coefficient matrix.

The Pearson correlation coefficient can be used to judge the linear correlation between each feature and the electricity consumption, as well as between different feature variables, and hence whether a feature is redundant; its value lies in the interval [-1, 1]. If the correlation coefficient between two feature variables equals -1, the two are perfectly negatively correlated and a redundancy relationship may exist, so one of them can be selected for deletion; if it equals +1, the two variables are perfectly positively correlated. If the correlation coefficient between the electricity consumption and a feature variable equals or is close to 0, the electricity consumption is linearly uncorrelated with that feature, and deleting the feature can also be tried. The Pearson correlation coefficient of two variables X and Y is calculated as:

$$r = \frac{\sum_{k=1}^{n}(X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_{k=1}^{n}(X_k - \bar{X})^2}\,\sqrt{\sum_{k=1}^{n}(Y_k - \bar{Y})^2}}$$

wherein r is the Pearson correlation coefficient, X is the first feature variable, Y is the second feature variable, $\bar{X}$ is the mean of the first feature variable, $\bar{Y}$ is the mean of the second feature variable, and n is the number of observations.

Calculating the Pearson correlation coefficients between the electricity consumption and its influencing factors yields a correlation coefficient matrix. The linear correlation between the electricity consumption and the three features humidity, rainy days and bad weather is low, with only small correlation coefficients, while the linear correlation coefficient between the two features holidays and working days is -1, indicating perfect negative correlation. These two features are therefore contradictory (mutually exclusive), so one of them can be deleted.
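A sketch of correlation-based feature screening in plain Python. The holiday and working-day columns are toy data, built so that a working day is exactly a non-holiday, which reproduces the r = -1 case described above; the humidity values, the helper names and the rule of keeping the first feature of each perfectly correlated pair are hypothetical.

```python
import math
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient r between two equal-length variables."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_features(features: Dict[str, List[float]], tol: float = 1e-9) -> Set[str]:
    """For every pair with |r| = 1 (within tol), mark the later feature for deletion."""
    names = list(features)
    drop: Set[str] = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in drop or b in drop:
                continue
            if abs(abs(pearson(features[a], features[b])) - 1.0) < tol:
                drop.add(b)
    return drop


holiday = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
workday = [1.0 - h for h in holiday]  # toy data: a working day is exactly a non-holiday
humidity = [0.62, 0.70, 0.55, 0.80, 0.66, 0.58, 0.73]
print(round(pearson(holiday, workday), 6))  # -1.0
print(redundant_features({"holiday": holiday, "workday": workday, "humidity": humidity}))  # {'workday'}
```

The same `pearson` helper can also screen out features whose coefficient with electricity consumption is close to 0.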

Further, step 3 includes:

Further, step 3.1 includes:

Principal component analysis (PCA) is a dimension reduction method that converts multiple indicators into a few composite indicators, the principal components. Together, the principal components reflect most of the information in the original variables, and the information contained in different components does not overlap. PCA can take in many variables and condense the complex factors into a few principal components, which simplifies the problem while yielding more scientific and effective data information.

The preprocessed historical power consumption data has m samples $\{X^{(1)}, X^{(2)}, \ldots, X^{(m)}\}$, wherein $X^{(1)}$ is the first data sample, $X^{(2)}$ is the second data sample and $X^{(m)}$ is the m-th data sample. Each sample has N-dimensional features $X^{(i)} = \bigl(x_1^{(i)}, x_2^{(i)}, \ldots, x_N^{(i)}\bigr)^{T}$, wherein $X^{(i)}$ is the N-dimensional feature vector of the i-th historical electricity consumption sample and m is the number of historical electricity consumption samples.

Step 3.1.3: acquiring the covariance matrix of the historical power consumption data based on its centered N-dimensional features;

The centering formula is:

The covariance matrix is calculated by the formula:

In the covariance matrix, a covariance greater than 0 indicates that when one of x₁ and x₂ increases, the other tends to increase as well; a covariance less than 0 indicates that when one increases, the other tends to decrease; and a covariance of 0 indicates that the two are linearly uncorrelated. The larger the absolute value of the covariance, the stronger the mutual influence between the two features; the smaller the absolute value, the weaker the influence.

The eigenvalue equation of the covariance matrix is $A^{T}Ax = \lambda x$. A spatial coordinate system is established and the spatial coordinates of the predicted value of each iteration are obtained, wherein Ax is:

wherein x₁, x₂, …, x_m are the x-coordinates of the regression tree predicted values of the first, second, …, m-th iterations, y₁, y₂, …, y_m are the corresponding y-coordinates, z₁, z₂, …, z_m are the corresponding z-coordinates, and (x, y, z) is the center point of all predicted values.

Starting from $A^{T}Ax = \lambda x$ and multiplying both sides on the left by $x^{T}$ gives

$$x^{T}A^{T}Ax = \lambda x^{T}x \quad\Longleftrightarrow\quad (Ax)^{T}(Ax) = \lambda x^{T}x.$$

Since the eigenvector can be taken as a unit vector ($x^{T}x = 1$), this gives $\lambda = (Ax)^{T}(Ax)$, where each element of Ax is the projection of the corresponding point vector onto the eigenvector.
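A quick numerical check of $\lambda = (Ax)^{T}(Ax)$, assuming A is the matrix whose rows are the predicted-value points minus their center point; the random points are toy data. `np.linalg.eigh` returns unit-norm eigenvectors, so the identity should hold for every eigenpair.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(50, 3))           # toy (x, y, z) predicted values, one row per iteration
A = points - points.mean(axis=0)            # subtract the center point of all values
eigvals, eigvecs = np.linalg.eigh(A.T @ A)  # solves A^T A x = lambda x; columns are unit eigenvectors
x = eigvecs[:, -1]                          # eigenvector of the largest eigenvalue
projections = A @ x                         # each entry: projection of one point onto x
print(np.isclose(eigvals[-1], projections @ projections))  # True: lambda == (Ax)^T (Ax)
```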

Further, step 3.4 includes:

The power consumption prediction model is a boosting tree model, i.e. a boosting method that uses decision trees as its base functions; it is an additive model composed of multiple decision trees.

The initial regression tree calculation formula is:

Further, step 4 includes:

The power consumption prediction method has strong real-time performance: the training data is preprocessed, and repeated feature values are removed and the data dimensionality is reduced while constructing the power consumption prediction model, which improves the model's training efficiency; and combining the PCA algorithm with the GBDT algorithm in the model improves the accuracy of power consumption prediction while preserving the integrity of the information.

Having introduced the electricity consumption prediction method, the application further provides, from the perspective of functional modules, a big data-based electricity consumption prediction system, so that the method can be better implemented.

As shown in Fig. 2, a big data-based power consumption prediction system includes:

Compared with the prior art, the big data-based power consumption prediction system has the same beneficial effects as the regression tree-based power consumption prediction method described in the above technical scheme, which are not repeated here.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. A regression tree-based power consumption prediction method, characterized by comprising the following steps:

Step 1: acquiring historical electricity consumption data;

2. The regression tree-based power consumption prediction method according to claim 1, wherein the step 2 comprises:

3. The regression tree-based power consumption prediction method according to claim 2, wherein the missing value filling of the historical power consumption data comprises:

the linear interpolation formula is:

4. The regression tree-based power consumption prediction method according to claim 2, wherein the performing the outlier correction on the historical power consumption data includes:

The calculation formula of the standard deviation of the sample is as follows:

The outlier correction formula is:

5. The regression tree-based power usage prediction method of claim 2, wherein the data normalization of the historical power usage data comprises:

The data standardization calculation formula is as follows:

6. The regression tree-based power consumption prediction method according to claim 1, wherein the step 3 comprises:

Step 3.1: performing dimension reduction on the preprocessed historical power consumption data based on a principal component analysis method;

The initial regression tree calculation formula is:

7. The regression tree-based power consumption prediction method according to claim 6, wherein the step 3.1 comprises:

the centering formula is:

The covariance matrix is calculated by the formula:

where C is the covariance matrix, cov(x₁, x₁) is the variance of feature x₁, cov(x₂, x₂) is the variance of feature x₂, cov(x₂, x₁) and cov(x₁, x₂) are the covariance between features x₁ and x₂, $x_1^{(j)}$ is the first of the N-dimensional features of the historical electricity consumption data, $\tilde{x}_1^{(j)}$ is that feature value after centering, and M is the number of samples of the historical power consumption data.

8. The regression tree-based power consumption prediction method according to claim 6, wherein the step 3.4 comprises:

9. The regression tree-based power consumption prediction method according to claim 1, wherein the step 4 comprises:

10. A big data based electricity consumption prediction system, comprising:

the data acquisition module is used for acquiring the historical electricity consumption data;

The data preprocessing module is used for preprocessing the historical electricity consumption data;

the electricity consumption prediction model module is used for constructing the electricity consumption prediction model and training the electricity consumption prediction model by using the preprocessed historical electricity consumption data;

And the prediction result output module is used for outputting the power consumption prediction result.