CN113743668B - Household electricity-oriented short-term load prediction method - Google Patents
- Publication number
- CN113743668B (application CN202111045733.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- attention
- layer
- lstm
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a short-term load prediction method for household electricity. A residual mechanism is introduced into an LSTM network to construct a residual LSTM module, and a Scaled Dot-Product Attention mechanism is introduced into the decoding process to construct an Encoder-Decoder model. A fuzzy clustering algorithm is used to extract similar-day data, and the data are normalized, which addresses the high similarity between samples and the non-uniform dimensions of the input variables. To counter the loss of information in long input sequences and of the correlations between sequences, the data are fed into an Encoder-Decoder model combined with Scaled Dot-Product Attention, so that the elements of the intermediate code on which the Decoder relies at each moment receive different weights in the output, highlighting the influence of the key factors.
Description
Technical Field
The invention belongs to the technical field of load prediction, and particularly relates to a household electricity-oriented short-term load prediction method.
Background
Electric load prediction may be classified into short-term load prediction and medium- and long-term load prediction. Short-term load prediction refers to daily and weekly load prediction and is mainly used for arranging short-term grid operation modes, static security analysis, scheduled maintenance, and the like.
In the field of short-term load prediction, existing approaches include artificial neural networks (Artificial Neural Networks, ANN), wavelet transforms (Wavelet Transform), fuzzy logic (FL), and combined prediction methods. Load prediction has gradually shifted toward sequence prediction, and the models have evolved from a single LSTM model to Sequence-to-Sequence models with an attention mechanism. An LSTM network encodes all input features into a fixed-length vector representation, which ignores how strongly each input feature correlates with the load to be predicted, so the historical data cannot be used with the appropriate emphasis. Adding an attention mechanism to the prediction model assigns more attention to important data by computing attention weights for the different input features, thereby improving load prediction accuracy. However, neural network algorithms learn and converge slowly, tend to fall into local minima, and sometimes fail to converge; for wavelet transform methods, choosing the decomposition scale and wavelet basis is troublesome; the mapping of fuzzy logic algorithms is not fine enough and their learning ability is weak; and combined models are difficult to tune, since the weight of each constituent algorithm is hard to determine.
In short-term power load prediction, observation of the load curve shows that when the load is at an "inflection point" (for example, around 7 a.m. on a workday, when the actual load rises sharply), the accuracy of conventional prediction methods is low, sometimes only about 90%. The load curve changes little over the same time period on similar days, and the load in a given time period tends to follow the change pattern of the most recent days of the same type, which conventional methods find difficult to predict accurately.
In the conventional Encoder-Decoder structure, the Encoder encodes the entire input sequence, regardless of its length, into a fixed-length semantic feature c for decoding, which causes two problems:
(1) information is lost for long input sequences;
(2) the original structural information of the sequence is lost and the interrelation between sequence data is ignored; both losses reduce accuracy.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a short-term load prediction method for household electricity. The method is based on an LSTM network combined with a Scaled Dot-Product Attention mechanism and introduces the Attention mechanism into an Encoder-Decoder model, which effectively highlights the factors influencing the load and addresses the time-series and nonlinear-regression characteristics of power-system load data, thereby improving the prediction effect.
In order to solve the technical problems, the invention adopts the following technical scheme:
a short-term load prediction method for household electricity introduces a residual mechanism into an LSTM network to construct a residual LSTM module, and introduces a Scaled Dot-Product Attention mechanism into the decoding process to construct an Encoder-Decoder model; the specific method comprises the following steps:
step 1: acquiring historical load data;
step 2: data preprocessing: extracting similar day data through an FCM fuzzy clustering algorithm, and carrying out normalization processing on the data;
step 3: input to the residual LSTM module: the input and the output of the LSTM are added together and the sum is then activated through a Dropout function;
step 4: Attention-layer dot-product operation: data are input into an Encoder-Decoder model combined with a Scaled Dot-Product Attention mechanism, so that the elements of the intermediate code on which the Decoder relies at each moment receive different weights; the Attention is added to the intermediate result of the Decoder;
step 5: FFNN+Softmax optimization: after the Attention layer, an FFNN+Softmax layer is introduced for optimization; the result calculated by the Attention layer is passed to the FFNN+Softmax layer, and the FFNN is trained with the Levenberg-Marquardt algorithm.
Further, in step 2, the specific implementation steps are as follows:
(1) Similar-day data extraction: in FCM, a data point is not restricted to one particular cluster but can belong to several clusters with different membership degrees. Given a data set X = {x_1, x_2, x_3, …, x_n} containing n elements that is to be decomposed into c fuzzy clusters, the objective function to be minimized is:

J_m = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{m}\,\lVert x_i - c_j\rVert^{2}

wherein m is any real number greater than 1; u_{ij} is the membership of x_i to the j-th cluster; x_i is the i-th element of the set X, with dimension d; c_j is the center of the j-th cluster; and ‖·‖ denotes the distance of a data point from a cluster center;

FCM optimizes the membership u_{ij} and the cluster center c_j by iterating on the objective function; the iterative expressions are:

u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\dfrac{\lVert x_i - c_j\rVert}{\lVert x_i - c_k\rVert}\right)^{\frac{2}{m-1}}},\qquad c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m}\,x_i}{\sum_{i=1}^{n} u_{ij}^{m}}

let ε be the threshold of the iterative process; when \max_{ij}\lvert u_{ij}^{(k+1)} - u_{ij}^{(k)}\rvert < \varepsilon is satisfied, the iteration ends and the process is considered to have converged to a local minimum of J;
(2) Input data normalization
Because different dimensions of the input data have different units and scales, which affects model training and prediction, the data are normalized and mapped to the [-1, 1] interval by the formula:

x^{*} = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1

wherein x_max and x_min are the maximum and minimum values of the variable, respectively.
Further, the input data X and the LSTM output data F(X) are added together to obtain F(X)+X, which is then activated.
Further, in the Encoder-Decoder model of step 4, there are two LSTM layers in the Decoder; the Attention is added between the two LSTM hidden layers, and the Attention is fed to the second LSTM layer together with the output of the first LSTM layer.
Further, the equation of the Attention mechanism is as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V

Q denotes the query term, K denotes the term on which attention is calculated, V denotes the value of that term, and \sqrt{d_k} serves as a scaling (normalization) factor, where d_k is the dimension of K; the similarity between the query Q and K determines the weight distribution over the values V, and dividing by the scaling factor reduces the impact of the large dot-product magnitudes that arise when d_k is large;

the load sequence data have dimension d, Q has size n×d, and K and V have the same size; for the i-th load sequence v_i, the Attention of this sequence is:

\mathrm{Attention}(v_i, K, V) = \mathrm{softmax}\!\left(\frac{v_i K^{T}}{\sqrt{d}}\right)V

computing the Attention of all sequences simultaneously gives:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V
further, after the FFNN network in step 5, a residual connection and a peer layer are added to fix the mean and variance of the inputs of neurons in one layer, so as to reduce the influence of the change of the output of the peer layer on the input of the next layer.
Compared with the prior art, the invention has the advantages that:
(1) The invention extracts similar-day data with a fuzzy clustering algorithm (FCM), thereby addressing the problem that the load curve changes little over the same time period (i.e., the data are highly similar) and markedly improving the precision of the prediction result. Because the different dimensions of the input data have different scales, which would affect model training and prediction, normalization is used to map the data to the [-1, 1] interval, solving the problem of non-uniform dimensions and improving the accuracy of short-term load prediction.
(2) The invention introduces an Attention mechanism into the Encoder-Decoder model; the Attention assigns different weights to the input features of the Encoder-Decoder model, highlights the more important influencing factors, and helps the model make more accurate judgments without increasing its computation or storage cost. Introducing Scaled Dot-Product Attention into the Encoder-Decoder model highlights the influence of the key factors, captures global dependencies in one step, alleviates the long-range dependence problem, and improves the prediction effect.
(3) After the Attention layer, an FFNN+Softmax layer is introduced for optimization, so that the predicted result is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram of an LSTM residual module according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
With reference to the LSTM-based short-term load prediction flow combined with Scaled Dot-Product Attention shown in FIG. 1, the invention provides a short-term load prediction method for household electricity, which introduces a residual mechanism into an LSTM network to construct a residual LSTM module and introduces a Scaled Dot-Product Attention mechanism into the decoding process to construct an Encoder-Decoder model; the specific method comprises the following steps:
step 1: acquiring historical load data:
the present example selects a dataset from the entsu-E platform containing a sequence of actual and predicted loads per hour from 2015, 1 to 2017, 5 in switzerland, a sequence of hours of temperature (in °f) and a map of qualitative weather in one of 3 categories defined in the profile.
Step 2: data preprocessing:
and extracting similar day data through an FCM fuzzy clustering algorithm, and carrying out normalization processing on the data.
(1) Similar-day data extraction: in FCM, a data point is not restricted to one particular cluster but can belong to several clusters with different membership degrees. Thus, given a data set X = {x_1, x_2, x_3, …, x_n} containing n elements that is to be decomposed into c fuzzy clusters, FCM minimizes the objective function:

J_m = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{m}\,\lVert x_i - c_j\rVert^{2}

wherein m is any real number greater than 1; u_{ij} is the membership of x_i to the j-th cluster; x_i is the i-th element of the set X, with dimension d; c_j is the center of the j-th cluster; and ‖·‖ denotes the distance of a data point from a cluster center.

FCM optimizes the membership u_{ij} and the cluster center c_j by iterating on the objective function; the iterative expressions are:

u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\dfrac{\lVert x_i - c_j\rVert}{\lVert x_i - c_k\rVert}\right)^{\frac{2}{m-1}}},\qquad c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m}\,x_i}{\sum_{i=1}^{n} u_{ij}^{m}}

Let ε be the threshold of the iterative process; when \max_{ij}\lvert u_{ij}^{(k+1)} - u_{ij}^{(k)}\rvert < \varepsilon is satisfied, the iteration ends and the process is considered to have converged to a local minimum of J.
In this embodiment, the threshold is set to epsilon=0.5, and the category is set to c=5.
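For illustration, the FCM iteration described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the patented implementation; the fuzziness exponent m, the iteration cap, and the random initialization are assumptions made for the example, while ε = 0.5 and c = 5 follow the embodiment.

```python
import numpy as np

def fcm(X, c=5, m=2.0, eps=0.5, max_iter=100, seed=0):
    """Minimal Fuzzy C-Means sketch: alternately update cluster centers c_j and
    memberships u_ij until the largest membership change falls below eps."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                  # memberships of each sample sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]                       # c_j update
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # ||x_i - c_j||
        dist = np.fmax(dist, 1e-10)                    # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # u_ij update
        if np.abs(U_new - U).max() < eps:              # convergence test against threshold eps
            U = U_new
            break
        U = U_new
    return U, centers

# Usage sketch: cluster daily feature vectors; U[i, j] is the membership of day i in cluster j,
# from which the days most similar to the target day can be selected.
```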
(2) Input data normalization
Because different dimensions of the input data have different units and scales, which affects model training and prediction, the data are normalized and mapped to the [-1, 1] interval. The formula is as follows:

x^{*} = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1

wherein x_max and x_min are the maximum and minimum values of the variable, respectively.
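As a small sketch, the normalization can be applied column-wise to a NumPy array of input variables; the column-wise layout is an assumption made for the example.

```python
import numpy as np

def normalize_to_pm1(x):
    # Min-max scaling of each variable (column) to the [-1, 1] interval.
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0
```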
Step 3: input to the residual LSTM module: the input and output are added together and then activated by the Dropout function.
For an LSTM network, an identity mapping is not easy to fit, so the basic idea of the residual network is introduced in the invention to solve this problem. The residual unit is implemented as a skip connection, i.e., the input data X and the LSTM output data F(X) are added together to obtain Y = F(X)+X, which is then activated. The Dropout function is used as the activation function in the invention. An LSTM incorporating the residual mechanism can easily be implemented with mainstream automatic-differentiation deep learning frameworks, updating the parameters directly with the BP algorithm, as shown in FIG. 2.
Experiments show that the residual error mechanism well solves the degradation problem of the LSTM network.
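A minimal PyTorch sketch of such a residual LSTM block is given below for illustration. The layer sizes and the assumption that the hidden size equals the input feature size (so that F(X) and X can be added directly) are choices made for the example, not details fixed by the invention.

```python
import torch.nn as nn

class ResidualLSTMBlock(nn.Module):
    """Residual LSTM block: y = Dropout(LSTM(x) + x)."""
    def __init__(self, n_features, dropout=0.2):
        super().__init__()
        # hidden size equals n_features so the skip connection can be added element-wise
        self.lstm = nn.LSTM(n_features, n_features, batch_first=True)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)             # F(x)
        return self.dropout(out + x)      # F(x) + x, then Dropout as in the text
```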
Step 4: attention layer dot product operation:
The data are input into the Encoder-Decoder model combined with the Scaled Dot-Product Attention mechanism, so that the elements of the intermediate code on which the Decoder relies at each moment receive different weights in the output. The invention highlights the influence of the key factors by introducing Scaled Dot-Product Attention into the Encoder-Decoder model.
The Attention is added to the intermediate result of the Decoder; the Decoder contains two LSTM layers, the Attention is added between the two LSTM hidden layers, and the Attention is fed to the second LSTM layer together with the output of the first LSTM layer.
The equation for the Attention mechanism is as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V

Q denotes the query term, K denotes the term on which attention is calculated, V denotes the value of that term, and \sqrt{d_k} serves as a scaling (normalization) factor, where d_k is the dimension of K (64 by default).

The similarity between the query Q and K determines the weight distribution over the values V; dividing by the scaling factor reduces the impact of the large dot-product magnitudes that arise when d_k is large.
The scaling factor is used because, when d_k is large, the dot products become large in magnitude, pushing the softmax into a region where its gradient is very small; small gradients hinder back-propagation. Dividing by the scaling factor alleviates this negative effect to some extent.
In contrast to the traditional model, no hidden-layer state is used here in the encoding process; instead, the sequence data obtained after preprocessing take its place in the attention operation. The matrix Q is therefore composed of load-data sequences, as are K and V. Using one load-data sequence to "query" its degree of match with every load-data sequence, i.e., the magnitude of attention, requires n such rounds in total.
The load sequence data have dimension d, Q has size n×d, and K and V have the same size; for the i-th load sequence v_i, the Attention of this sequence is:

\mathrm{Attention}(v_i, K, V) = \mathrm{softmax}\!\left(\frac{v_i K^{T}}{\sqrt{d}}\right)V

If the Attention of all sequences is computed simultaneously, then:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V
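A short sketch of the Scaled Dot-Product Attention computation follows, with Q, K and V all built from the same preprocessed load-sequence matrix as described above; the shapes in the usage example are illustrative assumptions.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # query/key similarity, scaled
    weights = torch.softmax(scores, dim=-1)              # attention weight distribution over V
    return weights @ V

# Usage sketch: n = 8 load sequences of dimension d = 64, with Q = K = V.
load_seqs = torch.randn(8, 64)
attended = scaled_dot_product_attention(load_seqs, load_seqs, load_seqs)  # shape (8, 64)
```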
Step 5: FFNN+Softmax optimization: after the Attention layer, an FFNN+Softmax layer is introduced for optimization; the result calculated by the Attention layer is passed to the FFNN+Softmax layer, further improving the accuracy of the load prediction, and the FFNN network is trained with the Levenberg-Marquardt algorithm.
The Attention after adding FFNN may be formalized as:
e t =a(h n )
wherein a(h_n) can be regarded as a feed-forward network to be trained; the detailed formula follows the prior art and is not repeated here.
For better optimization of the deep network, a residual connection and layer normalization (Add & Norm) are added after the FFNN network.
Note that a change in one layer's output produces a highly correlated change in the next layer's input, especially when the output changes significantly. Fixing the mean and variance of the inputs to a layer's neurons reduces the effect of this covariate shift; layer normalization is therefore introduced to reduce the influence of changes in this layer's output on the input of the next layer.
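A possible PyTorch sketch of the FFNN sub-layer with the residual connection and layer normalization described above is given below; the hidden width, dropout rate and ReLU activation are assumptions for the example, and the sketch does not reproduce the Levenberg-Marquardt training of the FFNN.

```python
import torch.nn as nn

class FFNNAddNorm(nn.Module):
    """Feed-forward sub-layer followed by Add & Norm: LayerNorm(x + FFNN(x))."""
    def __init__(self, d_model, d_hidden=64, dropout=0.1):
        super().__init__()
        self.ffnn = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )
        self.norm = nn.LayerNorm(d_model)   # fixes the mean/variance of the inputs passed to the next layer
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.norm(x + self.dropout(self.ffnn(x)))  # residual connection, then layer normalization
```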
In summary, the invention provides an LSTM-based short-term load prediction method for the household electricity environment combined with Scaled Dot-Product Attention. It extracts similar-day data with a fuzzy clustering algorithm (FCM), normalizes the data, and, to address the loss of information in long input sequences and of the correlations between sequences, inputs the data into an Encoder-Decoder model combined with Scaled Dot-Product Attention, so that the elements of the intermediate code on which the Decoder relies at each moment receive different weights in the output. The results show that the main advantages of the invention are:
(1) The method extracts the data of similar days and normalizes the data, solves the problems of large similarity and non-uniform dimension between the data, and improves the accuracy of short-term load prediction.
(2) The correlation between the data is handled: introducing Scaled Dot-Product Attention into the Encoder-Decoder model highlights the influence of the key factors, captures global dependencies in one step, and alleviates the long-range dependence problem.
(3) After the Attention layer, an FFNN+Softmax layer is introduced for optimization, so that the predicted result is more accurate.
It should be understood that the above description is not intended to limit the invention to the particular embodiments disclosed, and that various changes, modifications, additions and substitutions can be made by those skilled in the art without departing from the spirit and scope of the invention.
Claims (2)
1. A short-term load prediction method for household electricity, characterized in that a residual mechanism is introduced into an LSTM network to construct a residual LSTM module, and a Scaled Dot-Product Attention mechanism is introduced into the decoding process to construct an Encoder-Decoder model, the method comprising the following steps:
step 1: acquiring historical load data;
step 2: data preprocessing: extracting similar day data through an FCM fuzzy clustering algorithm, and carrying out normalization processing on the data; the specific implementation steps are as follows:
(1) Similar-day data extraction: given a data set X = {x_1, x_2, x_3, …, x_n} containing n elements that is to be decomposed into c fuzzy clusters, the objective function to be minimized is:

J_m = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{m}\,\lVert x_i - c_j\rVert^{2}

wherein m is any real number greater than 1; u_{ij} is the membership of x_i to the j-th cluster; x_i is the i-th element of the set X, with dimension d; c_j is the center of the j-th cluster; and ‖·‖ denotes the distance of a data point from a cluster center;

FCM optimizes the membership u_{ij} and the cluster center c_j by iterating on the objective function; the iterative expressions are:

u_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\dfrac{\lVert x_i - c_j\rVert}{\lVert x_i - c_k\rVert}\right)^{\frac{2}{m-1}}},\qquad c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m}\,x_i}{\sum_{i=1}^{n} u_{ij}^{m}}

let ε be the threshold of the iterative process; when \max_{ij}\lvert u_{ij}^{(k+1)} - u_{ij}^{(k)}\rvert < \varepsilon is satisfied, the iteration ends and the process is considered to have converged to a local minimum of J;
(2) Input data normalization
Normalization is applied to the data, mapping it to the [-1, 1] interval by the formula:

x^{*} = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1

wherein x_max and x_min are the maximum and minimum values of the variable, respectively;
step 3: input to the residual LSTM module: the input data X and the LSTM output data F(X) are added together to obtain F(X)+X, which is then activated through a Dropout function;
step 4: Attention-layer dot-product operation: data are input into an Encoder-Decoder model combined with a Scaled Dot-Product Attention mechanism, so that the elements of the intermediate code on which the Decoder relies at each moment receive different weights; the Attention is added to the intermediate result of the Decoder; in the Encoder-Decoder model of step 4, there are two LSTM layers in the Decoder, the Attention is added between the two LSTM hidden layers, and the Attention is fed to the second LSTM layer together with the output of the first LSTM layer;
the equation of the Attention mechanism is as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V

Q denotes the query term, K denotes the term on which attention is calculated, V denotes the value of that term, and \sqrt{d_k} serves as a scaling (normalization) factor, where d_k is the dimension of K; the similarity between the query Q and K determines the weight distribution over the values V, and dividing by the scaling factor reduces the impact of the large dot-product magnitudes that arise when d_k is large;

the load sequence data have dimension d, Q has size n×d, K and V have size n×d, and for the i-th load sequence v_i the Attention of this sequence is:

\mathrm{Attention}(v_i, K, V) = \mathrm{softmax}\!\left(\frac{v_i K^{T}}{\sqrt{d}}\right)V

computing the Attention of all sequences simultaneously gives:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V
step 5: FFNN+Softmax optimization: after the Attention layer, an FFNN+Softmax layer is introduced for optimization; the result calculated by the Attention layer is passed to the FFNN+Softmax layer, and the FFNN is trained with the Levenberg-Marquardt algorithm.
2. The household-electricity-oriented short-term load prediction method according to claim 1, wherein after the FFNN network of step 5, a residual connection and a layer normalization layer are added to fix the mean and variance of the inputs to the neurons of a layer, reducing the influence of changes in that layer's output on the input of the next layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111045733.3A CN113743668B (en) | 2021-09-07 | 2021-09-07 | Household electricity-oriented short-term load prediction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111045733.3A CN113743668B (en) | 2021-09-07 | 2021-09-07 | Household electricity-oriented short-term load prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113743668A CN113743668A (en) | 2021-12-03 |
CN113743668B true CN113743668B (en) | 2024-04-05 |
Family
ID=78736806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111045733.3A Active CN113743668B (en) | 2021-09-07 | 2021-09-07 | Household electricity-oriented short-term load prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113743668B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931989A (en) * | 2020-07-10 | 2020-11-13 | 国网浙江省电力有限公司绍兴供电公司 | Power system short-term load prediction method based on deep learning neural network |
AU2020104000A4 (en) * | 2020-12-10 | 2021-02-18 | Guangxi University | Short-term Load Forecasting Method Based on TCN and IPSO-LSSVM Combined Model |
CN113052469A (en) * | 2021-03-30 | 2021-06-29 | 贵州电网有限责任公司 | Method for calculating wind-solar-water-load complementary characteristic of small hydropower area lacking measurement runoff |
US11070056B1 (en) * | 2020-03-13 | 2021-07-20 | Dalian University Of Technology | Short-term interval prediction method for photovoltaic power output |
-
2021
- 2021-09-07 CN CN202111045733.3A patent/CN113743668B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN113743668A (en) | 2021-12-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||