CN111985719A

CN111985719A - Power load prediction method based on improved long-term and short-term memory network

Info

Publication number: CN111985719A
Application number: CN202010878240.7A
Authority: CN
Inventors: 覃晖; 裴少乾; 卢桂源; 吕昊; 谢伟; 曲昱华; 付佳龙
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-11-24
Anticipated expiration: 2040-08-27
Also published as: CN111985719B

Abstract

The invention discloses a power load prediction method based on an improved long short-term memory network. The maximum information coefficient is used to screen the historical load features initially; the historical load is then screened further with a maximum correlation minimum redundancy algorithm, taking into account the influence of load-related factors. The screened feature set serves as the model input, the improved long short-term memory network predicts the power load, and the prediction results are verified against the actual grid load to demonstrate the practicability of the model. The prediction method (H-ILSTM) takes into account both the power load and the relevant factors that influence it, improves the precision of power load forecasting, and improves the safety and economy of power grid operation to a certain extent.

Description

Power load prediction method based on improved long-term and short-term memory network

Technical Field

The invention relates to the technical field of power load prediction, in particular to a power load prediction method based on an improved long-term and short-term memory network.

Background

Power load prediction plays a key role in power grid operation, and accurate short-term load forecasts can greatly improve the safety and economy of that operation. Load prediction is also important for optimal unit commitment, economic dispatch, optimal power flow and electricity market trading. The more accurate the load prediction, the higher the utilization of the generation equipment and the better the economic dispatch. However, the electrical load is sensitive to external factors such as climate change, date type and social activity, and these uncertainties increase the randomness of the load sequence. Therefore, how to identify the factors most strongly related to the load among the many influencing factors, and thereby predict the short-term load accurately, is a problem that remains to be solved.

Among deep learning methods, the long short-term memory network (LSTM) is a recurrent neural network built from complex units, and it represents sequence information well. It has achieved good results on time series prediction problems. However, the inputs of an LSTM are typically chosen by hand, which gives the model little physical meaning and poor interpretability. Using feature engineering to analyze the relationships among the factors that influence the power load is therefore important for improving prediction precision, and it also strengthens the physical meaning and interpretability of the model. How to analyze the relationship between the power load and its influencing factors accurately, and incorporate that relationship into a prediction model to improve prediction accuracy, is thus a theoretical and practical engineering problem in urgent need of a solution.

Feature engineering selects a representative subset of features from a feature set. The selected features are highly correlated with the output variable, and feature selection is the most common way to extract valid features. Common feature selection methods include autocorrelation (AC), mutual information (MI), ReliefF (RF), correlation-based feature selection (CFS), and the like.

In the present invention, the maximum information coefficient is used to preliminarily screen the historical power load features. The features are then screened a second time by combining the maximum correlation minimum redundancy method with the prediction model.

In summary, the problems of the prior art are as follows:

(1) There are many factors that affect the power load, and they cannot all be given reasonable consideration in the load prediction problem.

(2) Most existing feature selection for power load prediction considers the influence of a single feature on the power load and ignores the effect of combinations of features.

(3) Because the input of the prediction model contains many factors, traditional machine learning methods cannot predict the power load with high precision. The traditional LSTM achieves better prediction results, but its structure is complicated and its training time is long, which is inconvenient in practical application.

The difficulties in solving these technical problems are as follows. First, the related factors include both continuous and discrete features, and handling them properly so that their influence on the power load can be considered is one technical difficulty. Second, the interactions among the factors are complicated: a single factor may not improve the prediction much, while a combination of factors can improve the precision greatly, so screening combinations of factors and feeding them into the model for training is the second difficulty. Third, changing the structure of the LSTM so as to improve prediction accuracy while reducing the required training time is a further difficulty in the load prediction problem.

The significance of solving these technical problems is as follows. To account for the mutual influence between factors, the invention provides a two-stage hybrid feature extraction method and a novel network (ILSTM) applied to power load prediction. The network structure differs from the traditional LSTM, and deriving its forward propagation and back propagation formulas correctly is the main difficulty in implementing it.

Most existing power load prediction based on deep learning predicts the load for only the next time period. The present method can accurately predict the power load over several future time periods, which is of practical significance and favors the adoption of H-ILSTM.

Disclosure of Invention

The present invention is directed to solving the above-mentioned problems, and an object of the present invention is to provide a power load prediction method based on an improved long and short term memory network, which can accurately analyze the influence between related features, screen out an optimal feature set, and perform prediction by using a novel network, so as to obtain a high-precision power load prediction result.

The invention realizes the purpose through the following technical scheme:

the invention comprises the following steps:

(1) collecting data of the power load and relevant factors thereof, and carrying out primary screening according to the influence of the historical power load on the future;

A historical power load sequence is collected, and the maximum information coefficient (MIC) is used to analyze the relationship between the load in each of the 168 periods of the preceding seven days (one period is 1 hour) and the load to be predicted for the current day. Historical loads whose calculated MIC is greater than 0.6 form the preselected set, and the remaining loads form the candidate set.
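The split into a preselected set and a candidate set can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the default score is the absolute Pearson correlation, used only as a dependency-free stand-in. A real run would pass a function that computes the MIC as the `score` argument.

```python
from math import sqrt

def pearson_abs(x, y):
    """Absolute Pearson correlation; a stand-in score, NOT the MIC."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)

def primary_screen(load, max_lag=168, threshold=0.6, score=pearson_abs):
    """Split lags 1..max_lag into (preselected, candidate) by score vs. threshold."""
    target = load[max_lag:]                      # loads to be predicted
    preselected, candidate = [], []
    for lag in range(1, max_lag + 1):
        # Load observed `lag` periods before each target value.
        lagged = load[max_lag - lag:len(load) - lag]
        if score(lagged, target) > threshold:
            preselected.append(lag)
        else:
            candidate.append(lag)
    return preselected, candidate
```

Each lag from 1 to 168 is treated as one historical-load feature, so the two returned lists together always contain all 168 lags.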

(2) Performing primary processing on the power load related factor;

Factors related to the power load are collected. They include continuous features and discrete features, and the discrete features must be encoded before the prediction model can use them. The discrete features are the date category and the time category, and the invention encodes both with LabelEncoder. The date category is divided into working days, rest days and legal holidays; because legal holidays in China last at most seven days, the legal holidays are further divided into 7 types. For the time category, every three periods are grouped into one domain according to the power load characteristics.

(3) Performing secondary screening on the historical power load by combining the power load related factor and the prediction model;

The relationship between the loads in the candidate set and the load currently to be predicted is analyzed, and the features of the candidate set are evaluated by the maximum correlation minimum redundancy method, according to the following formula:

in the formula, S_{m-1} denotes the preselected set, X - S_{m-1} denotes the candidate set, c is the category variable, m is the number of features, and x_j is the j-th feature. I(x_j, c) denotes the mutual information between the j-th feature and the category variable.
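For reference, a standard incremental form of the maximum correlation minimum redundancy (mRMR) criterion, written with the symbols defined above, is given below. That the method uses this difference form rather than a quotient form is an assumption; the second term is the mean mutual information between the candidate feature and the features already selected.

```latex
\max_{x_j \in X - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right]
```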

Given the features already selected, this method searches the remaining feature space for the feature that maximizes the value of the above formula. The feature with the largest calculated coefficient λ is placed into the preselected feature subset; all features, together with the other related factors influencing the power load, are normalized and fed into the prediction model for prediction. The root mean square error (RMSE) is used as the criterion: as long as the RMSE decreases, the process continues, and it stops when the RMSE becomes higher than it was before the feature was added.
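The wrapper-style loop described above can be sketched as follows, under stated assumptions. The names `mrmr_score`, `secondary_screen` and `evaluate_rmse` are hypothetical; `evaluate_rmse` stands for whatever routine normalizes the data, trains the prediction model on a feature list and returns its RMSE. The score uses the difference form of the criterion (relevance minus mean redundancy), and stopping when the RMSE is unchanged, not only when it rises, is also an assumption.

```python
def mrmr_score(j, selected, relevance, redundancy):
    """Relevance of feature j minus its mean mutual information with selected ones."""
    if not selected:
        return relevance[j]
    return relevance[j] - sum(redundancy[j][i] for i in selected) / len(selected)

def secondary_screen(preselected, candidate, relevance, redundancy, evaluate_rmse):
    """Greedy wrapper: add the best-scoring candidate while the RMSE keeps falling."""
    selected, remaining = list(preselected), list(candidate)
    best_rmse = evaluate_rmse(selected)
    while remaining:
        j = max(remaining, key=lambda f: mrmr_score(f, selected, relevance, redundancy))
        rmse = evaluate_rmse(selected + [j])
        if rmse >= best_rmse:      # the error did not fall: stop and discard j
            break
        selected.append(j)
        remaining.remove(j)
        best_rmse = rmse
    return selected, best_rmse
```

Here `relevance[j]` holds I(x_j, c) and `redundancy[j][i]` holds the mutual information between features j and i; both would be precomputed from the data.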

(4) Putting the screened final feature set into the improved long short-term memory network ILSTM for prediction, first setting the parameters of the ILSTM, including the number of input layer nodes, the number of hidden layer nodes, the number of output layer nodes, the learning rate, the batch size and the number of training epochs;

(5) training the individual weight parameters of the ILSTM on the training set using an adaptive moment estimation (Adam) optimizer in conjunction with a mini-batch mechanism;

(6) Inputting the test set into the trained ILSTM for prediction to obtain the power load prediction result.

Further, in step (5), the steps and calculation formulas for forward propagation of information in the improved long short-term memory network ILSTM in the t-th period are as follows:

u_t = σ(net_ut) (3)

o_t = σ(net_ot) (8)

h_t = o_t * tanh(c_t) (9)

z_t = W_y·h_t + b_y (10)

y_t = σ(z_t) (11)

In the above formulas, net_ut, net_ct, net_ot and z_t are the states of the current stage in period t; W_u, W_c and W_o are their respective weight matrices; b_u, b_c and b_o are their respective bias vectors; u_t, c̃_t, c_t, o_t, h_t and y_t are, respectively, the update gate, information state, cell state, output gate, hidden layer output and predicted value in period t; tanh and σ are the tanh and sigmoid activation functions, respectively; the symbols · and * denote matrix multiplication and element-wise multiplication, respectively.
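A single forward step can be sketched in plain Python as follows. Only equations (3) and (8) to (11) are taken from the text. The lines marked "assumed" fill the steps that are not shown here with a common coupled-gate form (net inputs computed from the concatenation of h_{t-1} and x_t, a tanh information state, and a cell state blended by the update gate); they are illustrative guesses, not the patented formulas. The function and dictionary key names are hypothetical.

```python
import math

def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, x, b):
    """Return W·x + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, x)) + bi for row, bi in zip(W, b)]

def ilstm_step(x_t, h_prev, c_prev, p):
    """One forward step; p holds W_u, b_u, W_c, b_c, W_o, b_o, W_y, b_y."""
    hx = h_prev + x_t                                   # [h_{t-1}, x_t] (assumed)
    u_t = sigmoid(affine(p["W_u"], hx, p["b_u"]))       # eq. (3)
    c_tilde = [math.tanh(a)                             # information state (assumed)
               for a in affine(p["W_c"], hx, p["b_c"])]
    c_t = [(1 - u) * c + u * ct                         # cell state update (assumed)
           for u, c, ct in zip(u_t, c_prev, c_tilde)]
    o_t = sigmoid(affine(p["W_o"], hx, p["b_o"]))       # eq. (8)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # eq. (9)
    z_t = affine(p["W_y"], h_t, p["b_y"])               # eq. (10)
    y_t = sigmoid(z_t)                                  # eq. (11)
    return y_t, h_t, c_t
```

Because the output passes through a sigmoid in equation (11), the predicted value lies in (0, 1), which is consistent with normalizing the load before training.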

Further, in the step (5), the step of information back propagation in the t-th time period and the calculation formula are as follows:

a. defining the most common square error function as the target to be optimized

b. Calculating errors of output layers

c. Calculating errors of hidden layers

W = W - η·∇W (28)

d. Use the Adam optimization algorithm with the gradients of [w_h, w_x, b] and [w_y, b_y] to update [w_h, w_x, b] and [w_y, b_y]. To better illustrate the update process, the weight is denoted by W and the gradient of the weight by ∇W; the general formulas for the Adam weight update are:

m_t = β₁·m_{t-1} + (1 - β₁)·∇W_t (29)

v_t = β₂·v_{t-1} + (1 - β₂)·(∇W_t)² (30)

where E_t is the error function, and y_t and Y_t are the predicted value and the observed value, respectively. m_t and m̂_t are the biased first-moment estimate and its bias-corrected value; v_t and v̂_t are the biased second-moment estimate and its bias-corrected value. β₁, β₂ and ε are Adam's parameters, defaulting to 0.9, 0.999 and 10^-8, respectively; η denotes the learning rate;

According to these formulas, forward propagation computes the predicted value and back propagation then updates the weights. In each training epoch the training set is processed in batches of a fixed size, and the weights are updated once per batch.
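The per-batch weight update can be illustrated with the standard Adam rule, sketched here for a flat list of weights. The moment updates follow equations (29) and (30); the bias correction and final step are the usual Adam formulas. The function name is hypothetical, and the default learning rate of 0.001 is an assumption, since no value is stated here.

```python
import math

def adam_step(w, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a flat list of weights; t is the 1-based step count."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]      # eq. (29)
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]  # eq. (30)
    m_hat = [mi / (1 - beta1 ** t) for mi in m]   # bias-corrected first moment
    v_hat = [vi / (1 - beta2 ** t) for vi in v]   # bias-corrected second moment
    w = [wi - eta * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v
```

In mini-batch training this function would be called once per batch, with `grad` set to the gradient of the batch error and `t` incremented on every call.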


The invention has the beneficial effects that:

The invention is a power load prediction method based on an improved long short-term memory network. Related factors influencing the power load are analyzed through feature engineering, and the features are screened with a two-stage feature extraction method that combines a filter stage and a wrapper stage, so that a feature set suited to the model is obtained. The ILSTM is adopted for power load prediction; compared with the traditional LSTM, it shortens the time required for model training and improves the prediction precision of the model. The prediction model provided by the invention can accurately predict the future power load, has practical application value and is suitable for wide adoption.

Drawings

FIG. 1 is a flow chart of a method for predicting power load of an improved long-short term memory network according to an embodiment of the present invention;

FIG. 2 is a diagram of an ILSTM network architecture provided by an embodiment of the present invention;

FIG. 3 is a radar chart of the maximum information coefficient of the power load over the preceding 7 days (168 periods) provided by the embodiment of the invention;

FIG. 4 is a comparison chart of the power load prediction results of the eight models provided by the embodiment of the invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings in which:

fig. 1 is a general flowchart of the method for predicting the power load of the improved long-short term memory network of the present invention, which specifically includes the following steps:

(2) Performing primary processing on the power load related factor;

Factors related to the power load are collected. They include continuous features and discrete features, and the discrete features must be encoded before the prediction model can use them. The discrete features are the date category and the time category, and the invention encodes both with LabelEncoder. The date category is divided into working days, rest days and legal holidays; because legal holidays in China last at most seven days, legal holidays are assigned seven codes. The specific coding is shown in Table 1:

TABLE 1 Date type coding table

Date type	Code
Working day	1
Saturday	2
Sunday	3
One-day holiday	4
Two-day holiday	5
Three-day holiday	6
Four-day holiday	7
Five-day holiday	8
Six-day holiday	9
Seven-day holiday	10

For the time category, every three periods are grouped into one domain according to the characteristics of the power load; the specific coding is shown in Table 2:

TABLE 2 Time type coding table

Time interval	Code
00:00-03:00	1
03:00-06:00	2
06:00-09:00	3
09:00-12:00	4
12:00-15:00	5
15:00-18:00	6
18:00-21:00	7
21:00-24:00	8
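The two coding tables can be reproduced with a small explicit mapping, sketched below. A generic label encoder such as scikit-learn's `LabelEncoder` assigns codes starting from 0 in sorted label order, so an explicit mapping is used here to obtain the 1-based codes of Tables 1 and 2. The function names are hypothetical, and reading "n-day holiday" as a legal holiday lasting n days is an assumption.

```python
DATE_TYPE_CODE = {"working day": 1, "Saturday": 2, "Sunday": 3}

def encode_date_type(day_kind, holiday_length=0):
    """Return the Table 1 code: 1-3 for ordinary days, 4-10 for 1- to 7-day holidays."""
    if holiday_length:
        if not 1 <= holiday_length <= 7:
            raise ValueError("holiday length must be between 1 and 7 days")
        return 3 + holiday_length
    return DATE_TYPE_CODE[day_kind]

def encode_time_type(hour):
    """Return the Table 2 code: each three-hour block of the day maps to 1..8."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return hour // 3 + 1
```

For example, a load sample recorded at 10:00 on a Saturday would receive date code 2 and time code 4.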

(4) Putting the screened final feature set into the improved long short-term memory network ILSTM for prediction, first setting the parameters of the ILSTM, including the number of input layer nodes, the number of hidden layer nodes, the number of output layer nodes, the learning rate, the batch size and the number of training epochs, and initializing the weights;

(5) the individual weight parameters of the ILSTM are trained on a training set using an adaptive moment estimation (Adam) optimizer in conjunction with a mini-batch mechanism.

The step and the calculation formula of the information forward propagation in the t-th time period are as follows:

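u_t = σ(net_ut) (3)

o_t = σ(net_ot) (8)

h_t = o_t * tanh(c_t) (9)

z_t = W_y·h_t + b_y (10)

y_t = σ(z_t) (11)

In the above formulas, net_ut, net_ct, net_ot and z_t are the states of the current stage in period t; W_u, W_c and W_o are their respective weight matrices; b_u, b_c and b_o are their respective bias vectors; u_t, c̃_t, c_t, o_t, h_t and y_t are, respectively, the update gate, information state, cell state, output gate, hidden layer output and predicted value in period t.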

The step and the calculation formula of information back propagation in the t-th time period are as follows:

a. defining the most common square error function as the target to be optimized

b. Calculating errors of output layers

c. Calculating errors of hidden layers

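W = W - η·∇W (28)

m_t = β₁·m_{t-1} + (1 - β₁)·∇W_t (29)

v_t = β₂·v_{t-1} + (1 - β₂)·(∇W_t)² (30)

where ∇W is the gradient of the weight W, m_t and m̂_t are the biased first-moment estimate and its bias-corrected value, and v_t and v̂_t are the biased second-moment estimate and its bias-corrected value.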

(6) And inputting the test set into the trained ILSTM for prediction to obtain a power load prediction result.

Fig. 2 is a diagram showing the structure of the ILSTM network.

Fig. 3 is a radar distribution diagram showing the maximum information coefficient of the power load for the first 7 days (168 periods).

The use of the present invention is further described below in conjunction with specific experiments.

Taking the power load of Wuhan, Central China, as the object, the method predicts the power load for the next two periods, using meteorological data covering the two months from March 22, 2015 to May 23, 2015. The data time step is 1 hour, giving 1488 periods in total; the first 1191 periods form the training set and the last 297 periods form the test set. The relevant factors affecting the electrical load include temperature, humidity, dew point, date category and time category. The maximum information coefficient (MIC) calculated for the load of the preceding 7 days is shown in Fig. 3, and the load is divided into a preselected set and a candidate set according to the MIC. The load is then screened further by the maximum correlation minimum redundancy method in combination with the model to obtain the final feature set. The feature set and the predictor set are combined into a training set to train the model, and finally the test set is input into the trained ILSTM to obtain the power load prediction result.
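The chronological split and the normalization step of this experiment can be sketched as follows. The function names are hypothetical. Min-max scaling is an assumption, since the normalization method is not specified, and fitting the scaling range on the training portion only is a common practice rather than something stated here.

```python
def chronological_split(rows, n_train):
    """Split a time-ordered sequence into (training, test) without shuffling."""
    return rows[:n_train], rows[n_train:]

def fit_min_max(train_column):
    """Return the (minimum, maximum) of a training column."""
    return min(train_column), max(train_column)

def min_max_scale(column, lo, hi):
    """Scale values so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]
```

With 1488 hourly periods and `n_train=1191`, the split yields the 1191 training periods and 297 test periods described above.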

To verify the predictive performance of the ILSTM, the following eight models were constructed to predict the power load and compared:

① H-ILSTM: uses the ILSTM, with features selected by the feature extraction method of the invention;

② ILSTM: uses the ILSTM, with features selected by the maximum information coefficient method alone;

③ H-LSTM: uses the traditional LSTM, with features selected by the feature extraction method of the invention;

④ LSTM: uses the traditional LSTM, with features selected by the maximum information coefficient method alone;

⑤ H-GRU: uses the GRU, with features selected by the feature extraction method of the invention;

⑥ GRU: uses the GRU, with features selected by the maximum information coefficient method alone;

⑦ H-SVR: uses the SVR, with features selected by the feature extraction method of the invention;

⑧ SVR: uses the SVR, with features selected by the maximum information coefficient method alone.

To avoid the effect of randomness, each model was run 10 times and the results were averaged. Table 3 lists the evaluation indices of the eight models' predictions. The evaluation indices are the root mean square error (RMSE), the mean absolute percentage error (MAPE) and the mean absolute error (MAE); the smaller these values, the higher the prediction precision. As can be seen from Table 3, the prediction accuracy of H-ILSTM is higher than that of ILSTM, which shows that the feature extraction method of the invention improves prediction accuracy. The prediction accuracy of H-ILSTM is also higher than that of H-LSTM and its training time (TT) is shorter, which shows that the ILSTM of the method outperforms the standard LSTM. The differences in prediction accuracy among the eight models can be seen more clearly in Fig. 4.
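The three evaluation indices have standard definitions, sketched below with hypothetical function names. The MAPE is expressed in percent and assumes that no observed value is zero.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be non-zero."""
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))
```

All three are computed on the test set, and for each of them a smaller value indicates a more accurate model.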

TABLE 3 Prediction indices of the eight models

The foregoing shows and describes the general principles and features of the present invention, together with the advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A power load prediction method based on an improved long-short term memory network is characterized by comprising the following steps:

step one, preliminarily screening the characteristics of historical loads according to the historical power loads as the input of a model;

secondly, considering relevant factors influencing the power load, and performing secondary screening on the historical load by adopting a maximum relevant minimum redundancy method;

and step three, taking the screened final feature set as the input of the model, and predicting by adopting an improved long-short memory network to obtain a final prediction result.

2. The method for predicting the power load based on the improved long-short term memory network as claimed in claim 1, wherein: in step one, the historical power load is analyzed using the maximum information coefficient MIC, the historical loads whose maximum information coefficient is greater than 0.6 form the preselected set [x₁, x₂, ..., x_k], and the remaining unselected historical loads form the candidate set [c₁, c₂, ..., c_k].

3. The method for predicting the power load based on the improved long-short term memory network as claimed in claim 1, wherein: in step two, the collected data on the relevant factors influencing the power load comprise temperature [TI₁, TI₂, ..., TI_k], humidity [HI₁, HI₂, ..., HI_k], dew point [DI₁, DI₂, ..., DI_k], date category and time period category; the date category and the time period category are discrete features and are encoded, that is, the discrete data are converted into numbers between 1 and n, where n is the number of distinct values of the feature; and the relevant factors influencing the power load are further analyzed by a feature extraction method to obtain a feature set.

4. The improved long-short term memory network based power load prediction method as claimed in claim 3, wherein: the feature extraction method uses a wrapper feature selection method to perform secondary screening of the historical load; the historical loads in the preselected set and the historical loads in the candidate set are analyzed by the maximum correlation minimum redundancy method to obtain the feature with the largest coefficient, which is put into the preselected set and combined with the other related factors influencing the load to form a preliminary training set TR = [x^tr, Y] and a test set TE = [x^te] consisting of predictors only, and the data are normalized; the improved long short-term memory network is trained on the preliminary training set, the prediction results obtained are compared with the test set, and the root mean square error is used as the threshold for deciding whether to continue adding features to the preselected set.

5. The improved long-short term memory network based power load prediction method as claimed in claim 4, wherein: when the root mean square error RMSE is used for the judgment, the RMSE obtained before the feature is added is recorded, the RMSE obtained after the feature is added is compared with it, and if the RMSE becomes larger after the feature is added, the addition of features is stopped.

6. The method for predicting the power load based on the improved long-short term memory network as claimed in claim 1, wherein: in the third step, the step and the calculation formula of the information forward propagation of the improved long and short memory network ILSTM in the t-th time period are as follows:

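u_t = σ(net_ut) (2)

o_t = σ(net_ot) (7)

h_t = o_t * tanh(c_t) (8)

z_t = W_y·h_t + b_y (9)

y_t = σ(z_t) (10)

in the above formulas, net_ut, net_ct, net_ot and z_t are the states of the current stage in period t; W_u, W_c and W_o are their respective weight matrices; b_u, b_c and b_o are their respective bias vectors; u_t, c̃_t, c_t, o_t, h_t and y_t are, respectively, the update gate, information state, cell state, output gate, hidden layer output and predicted value in period t.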

7. The improved long-short term memory network based power load prediction method as claimed in claim 6, wherein: in the ILSTM, the steps and calculation formula of information back propagation in the t-th period are:

a. defining the most common square error function as the target to be optimized

b. Calculating errors of output layers

c. Calculating errors of hidden layers

W = W - η·∇W (27).

8. The improved long-short term memory network based power load prediction method as claimed in claim 6, wherein: in the ILSTM, the step of updating the weights of the tth period includes:

using the Adam optimization algorithm with the gradients of [w_h, w_x, b] and [w_y, b_y] to update [w_h, w_x, b] and [w_y, b_y]; to better illustrate the update process, the weight is denoted by W and the gradient of the weight by ∇W, and the general formulas for the Adam weight update are:

m_t = β₁·m_{t-1} + (1 - β₁)·∇W_t (28)

v_t = β₂·v_{t-1} + (1 - β₂)·(∇W_t)² (29)

wherein E_t is the error function, and y_t and Y_t are the predicted value and the observed value, respectively; m_t and m̂_t are the biased first-moment estimate and its bias-corrected value; v_t and v̂_t are the biased second-moment estimate and its bias-corrected value; β₁, β₂ and ε are Adam's parameters, defaulting to 0.9, 0.999 and 10^-8, respectively; η denotes the learning rate;