CN111222689A - LSTM load prediction method, medium, and electronic device based on multi-scale temporal features - Google Patents
Info
- Publication number
- CN111222689A CN111222689A CN201911244260.2A CN201911244260A CN111222689A CN 111222689 A CN111222689 A CN 111222689A CN 201911244260 A CN201911244260 A CN 201911244260A CN 111222689 A CN111222689 A CN 111222689A
- Authority
- CN
- China
- Prior art keywords
- lstm
- load
- load prediction
- data
- prediction method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a Long Short-Term Memory (LSTM) load prediction method based on multi-scale temporal features (denoted W_LSTM), together with a corresponding medium and electronic device. Comparison experiments against a standard LSTM model, a Self-Organizing Map (SOM) network, and Gaussian Process Regression (GPR) show that the proposed method achieves higher prediction accuracy. Experiments on noisy load data further demonstrate that the method also has a degree of noise robustness.
Description
Technical Field
The invention relates to computer algorithms, and in particular to an LSTM load prediction method, medium, and electronic device based on multi-scale temporal features.
Background
Power load data are the basis for the reliable, economical operation and intelligent management of the power grid. Under the pressure of energy conservation and emission reduction and the push of open power markets, accurate load prediction is important for unit optimization, equipment maintenance, economic dispatch, and the electricity market. However, large-scale intermittent generation, electric-vehicle charging access, and demand-response mechanisms make power loads highly random and dynamic, which challenges load prediction. At the same time, the ubiquitous power Internet of Things and massive multi-source historical data provide a data foundation for new load prediction methods; load prediction therefore faces new opportunities and challenges in the context of smart grids and artificial intelligence.
Because traditional machine-learning methods cannot guarantee dynamic load-prediction accuracy under high randomness and big data, many researchers have turned to deep-learning-based methods. Deep learning, as a branch of machine learning, uses deeper architectures (three or more network layers) and achieves higher prediction accuracy. For example, deep belief networks and convolutional neural networks have been used for load prediction and perform well when training samples are large and load-influencing factors are complex, but they have drawbacks: they are prone to overfitting and do not consider the temporal correlation of the power load. Recurrent Neural Network (RNN) models based on time-series characteristics have therefore been applied to load prediction, but an RNN can only memorize short time series: as the data volume and time span grow, it loses important information from earlier inputs, the gradient vanishes, and the prediction model fails. The LSTM was proposed to address this problem of the RNN and achieves higher accuracy than hyperparameter-optimized machine-learning models. The Gated Recurrent Unit (GRU) model has also been adapted for load prediction; it has fewer parameters than the LSTM but somewhat worse performance.
Compared with single-factor load prediction, more and more researchers consider load-prediction methods with composite influencing factors. Besides the historical load, other relevant factors such as calendar, meteorological, and economic factors are introduced into the input data to achieve more reliable and accurate prediction. This can improve accuracy to some extent, but collecting such data requires extra effort, and the additional variables may harm predictive performance: when these variables are highly correlated, the learning algorithm may fail to identify the true relationships across time steps. Moreover, introducing too many input variables greatly increases model complexity, which generally raises the risk of overfitting, increases the computational burden, and slows convergence. Wavelet decomposition can separate load data on different time scales, so decomposing the historical load expresses the load-sequence components driven by different factors. The LSTM has a longer memory than the RNN, but can still only retain information over roughly 100 steps. The literature has used wavelet transforms to decompose photovoltaic sequences into different components, predicting each component and then reconstructing them; reconstructing each sequence after prediction accumulates error and is time-consuming.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an LSTM load prediction method, medium, and electronic device based on multi-scale temporal features, so as to overcome the error accumulation of existing approaches that predict and then reconstruct each sequence separately, as well as the low accuracy and poor robustness of existing short-term load prediction models.
The invention is realized by the following technical scheme:
the LSTM load prediction method based on the multi-scale time characteristics comprises the following steps:
step 1: normalizing the original load sequence;
step 2: carrying out wavelet decomposition and reconstruction on the normalized original load sequence to obtain load sequences with different time characteristics;
and step 3: recombining the load sequences with different time characteristics to obtain a recombined sequence;
and 4, step 4: extracting data with greater relevance and diversity from the recombined sequences by a determinant point process;
and 5: and inputting the extracted data with larger correlation and diversity as input data into a trained LSTM load prediction model for load prediction.
Further, the wavelet is dbN wavelet.
Further, the vanishing moment N of the dbN wavelet is 15.
Further, the step 2 specifically includes performing 4-level wavelet decomposition and reconstruction on the normalized original load sequence to obtain 5 load sequences with different time characteristics.
Further, the LSTM load prediction model is trained through an adaptive moment estimation optimization algorithm.
A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the LSTM load prediction method described above.
An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the LSTM load prediction method described above when executing the computer program.
Compared with the prior art, the LSTM load prediction method, medium, and electronic device based on multi-scale temporal features decompose the original load sequence by wavelet decomposition into load sequences with different time characteristics, enabling deeper analysis of the signal's variation patterns in subsequent steps. Data with high relevance and diversity are then extracted through a determinantal point process, reducing the data dimensionality and improving training speed. Finally, the data are input into an LSTM load prediction model to realize load prediction. Compared with a standard LSTM model, the proposed method has higher prediction accuracy. Experiments on noisy load data further show that it also has a degree of noise robustness.
Drawings
FIG. 1 is a schematic diagram of the structure of an LSTM cell;
FIG. 2 is a schematic diagram of a network structure of a layer 2 LSTM employed in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the LSTM load prediction model structure and the working principle according to the embodiment of the present invention;
FIG. 4 is a schematic general flowchart of an LSTM load prediction method based on multi-scale temporal features according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a load sequence with different time characteristics according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a wavelet decomposition and sequence recombination implementation process;
FIG. 7 is a schematic diagram of data extraction by the determinantal point process according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a training loss reduction curve comparison between the LSTM load prediction method based on multi-scale time characteristics and a standard LSTM;
FIG. 9a shows the Elia data set prediction comparison results (one-hour-ahead load prediction) of the LSTM load prediction method based on multi-scale temporal features against other methods, according to an embodiment of the present invention;
FIG. 9b shows the Elia data set prediction comparison results (day-ahead load prediction) of the LSTM load prediction method based on multi-scale temporal features against other methods, according to an embodiment of the present invention;
fig. 10 is a schematic diagram of evaluation indexes of the LSTM load prediction method based on multi-scale temporal features according to different proportions of noise according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the following embodiments and the accompanying drawings.
The LSTM was proposed to solve the vanishing-gradient problem of the RNN and is applied in many time-series data scenarios; the structure of an LSTM unit is shown in FIG. 1. Each LSTM unit contains a forget gate, an input gate, and an output gate, and takes three inputs: the output $h_{t-1}$ of the LSTM unit at the previous time step, the cell state $s_{t-1}$ at the previous time step, and the current input $x_t$. The forget gate computes which information is discarded, the input gate computes which information is stored in the cell state, and the output gate computes, through a sigmoid function, the information to be output, which is multiplied by the tanh of the current cell state to produce the output. The LSTM unit is described by equations (1)–(6):
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \quad (1)$$
$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \quad (2)$$
$$g_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \quad (3)$$
$$s_t = f_t * s_{t-1} + i_t * g_t \quad (4)$$
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \quad (5)$$
$$h_t = o_t * \tanh(s_t) \quad (6)$$
In equations (1)–(6): $f_t$ is the state of the forget gate, $i_t$ and $g_t$ are the states of the input gate, $o_t$ is the state of the output gate, and $s_t$ and $h_t$ are the state and output of the current LSTM unit. $W_{fx}, W_{fh}, W_{ix}, W_{ih}, W_{cx}, W_{ch}, W_{ox}, W_{oh}$ are the weight matrices applied by the corresponding gates to the input $x_t$ and to the previous-step output $h_{t-1}$; $\sigma$ denotes the sigmoid activation function, and $*$ denotes element-wise multiplication of vectors.
The input to the output layer of the LSTM network is the output $h_t$ of the last LSTM layer (7); the network output is obtained through a fully connected layer, $y_t = W_{yh}\, h_t + b_y$ (8).
the present invention employs a 2-layer LSTM network, the structure of which is shown in fig. 2.
The structure of the LSTM load prediction model based on the multi-scale time characteristics is shown in FIG. 3. The model mainly comprises two parts: data feature engineering and LSTM networks. The data characteristic engineering realizes the acquisition of load sequence components on different time scales; the LSTM network implements load prediction.
Based on the above basis, the embodiment of the invention provides an LSTM load prediction method based on multi-scale time characteristics. As shown in fig. 4, the LSTM load prediction method generally includes the steps of:
step S1: normalizing the original load sequence;
step S2: carrying out wavelet decomposition and reconstruction on the normalized original load sequence to obtain load sequences with different time characteristics;
step S3: recombining the load sequences with different time characteristics to obtain a recombined sequence;
step S4: extracting data with high relevance and diversity from the recombined sequence through a determinantal point process;
step S5: inputting the extracted data as input data into the trained LSTM load prediction model for load prediction.
In step S1, in order to improve the convergence speed and accuracy of the model, the values of the load data cannot differ too much, so that the original load sequence needs to be normalized, and the normalization formula of the original load sequence in the embodiment of the present invention is:
in the formula (9), x is a normalized value,is the average of the load data and,the number of data bits-1.
In step S2, components with distinct characteristics in the signal can be effectively separated by wavelet decomposition, yielding a stationary sequence and non-stationary sequences on different scales. Commonly used wavelets include the Haar, Daubechies (dbN), Mexican Hat, and Meyer wavelets. The embodiment of the invention selects the dbN wavelet, with the vanishing moment N chosen as 15 through experiments, and performs 4-level wavelet decomposition and reconstruction on the normalized original load sequence. That is, in this embodiment, step S2 performs 4-level wavelet decomposition and reconstruction on the normalized original load sequence, obtaining the 5 load sequences with different time characteristics shown in FIG. 5. As FIG. 5 shows, among the 5 components, the level-3 and level-4 detail components are relatively stable. The level-4 approximation component reflects the load trend, such as seasonal change and the load growth driven by the region's economic and industrial development, and captures the long-term variation of the load; the level-4 detail component represents the fixed, invariant portion of the load; the fluctuation of the level-3 detail component reflects the intensity of the trend load's growth; and the level-1 and level-2 detail components fluctuate strongly, representing to some extent the magnitude and duration of the peaks and valleys of the load curve, and thus reflect the daily load characteristics.
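The decomposition step can be illustrated with a self-contained multi-level wavelet transform. The sketch below uses the orthonormal Haar wavelet instead of db15 purely to stay dependency-free (in practice a library such as PyWavelets would supply the db15 filters); the structure — 4 decomposition levels yielding 4 detail bands plus 1 approximation band, each reconstructed back to full length — matches the method described above:

```python
import numpy as np

def haar_dwt(x):
    # one analysis level: orthonormal Haar approximation and detail coefficients
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    # one synthesis level: exactly inverts haar_dwt
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def decompose(x, levels=4):
    # returns [d1, d2, d3, d4, a4]: 4 detail bands plus the final approximation
    details, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    return details + [a]

def component(coeffs, index, levels=4):
    # reconstruct one band to full length with every other band zeroed out
    bands = [c if i == index else np.zeros_like(c) for i, c in enumerate(coeffs)]
    a = bands[levels]
    for lvl in range(levels - 1, -1, -1):
        a = haar_idwt(a, bands[lvl])
    return a
```

Because the transform is linear and invertible, the five reconstructed components sum back to the original sequence, which is what allows them to be recombined directly rather than reconstructed again after prediction.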
The wavelet decomposition and sequence recombination of steps S2 and S3 are shown in FIG. 6, where M is the number of load-sequence data points. A 4-level wavelet decomposition yields 5 data sequences with different characteristics, denoted $H_1=\{h_{11},h_{12},\ldots,h_{1M}\}$, $H_2=\{h_{21},h_{22},\ldots,h_{2M}\}$, $H_3=\{h_{31},h_{32},\ldots,h_{3M}\}$, $H_{4.d}=\{h_{41.d},h_{42.d},\ldots,h_{4M.d}\}$, and $H_{4.a}=\{h_{41.a},h_{42.a},\ldots,h_{4M.a}\}$. Since the LSTM requires sequence data as input, the 5 sequences are recombined in temporal order into $H_r=\{h_{11},h_{21},h_{31},h_{41.d},h_{41.a};\; h_{12},h_{22},h_{32},h_{42.d},h_{42.a};\;\ldots;\; h_{1M},h_{2M},h_{3M},h_{4M.d},h_{4M.a}\}$. Increasing the number of LSTM layers can improve the predictive power of the model, but too many layers cause overfitting; the number of layers and the number of neurons per layer are chosen experimentally according to the amount of training data.
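Given the five full-length component sequences, the recombination into $H_r$ simply interleaves them per time step; a sketch (the function name is illustrative, the component order follows the notation above):

```python
import numpy as np

def recombine(h1, h2, h3, h4d, h4a):
    # stack the components column-wise: row m is (h1m, h2m, h3m, h4m.d, h4m.a),
    # so the recombined sequence feeds the LSTM one 5-dimensional vector per step
    return np.column_stack([h1, h2, h3, h4d, h4a])
```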
In step S4, the determinant point process defines a discrete set D ═ D1,d2,d3,…,dnProbability distribution of each subset Y of }: given the probability of occurrence of the null set, there is a real symmetric semi-positive definite matrix L ∈ R consisting of the elements of the set Dn×nFor each subset Y of the set D, the probability P (Y) that the subset Y appears, oc det (L)Y)。
L in the formula (10)YA submatrix representing a matrix L composed of rows and columns having subscripts belonging to Y, and det () representing a value of a determinant. I is as large as Rn×nIs an identity matrix. Since L is a semi-positive definite matrix, there is a matrix V such that L ═ VVT,V∈Rn×r. Likewise, LYCan be decomposed into
In practical application, each row vector $V_i$ of the matrix $V$ is decomposed as
$$V_i = r_i f_i \quad (11)$$
so that
$$L_{ij} = \langle V_i, V_j\rangle = \langle r_i f_i, r_j f_j\rangle = r_i r_j s_{ij} \quad (12)$$
In equations (11) and (12), $r_i \ge 0$ measures the relevance of data point $d_i$ to the data set $D$; $s_{ij}=\langle f_i, f_j\rangle$, with $\|f_i\|_2=1$, is the similarity between $d_i$ and $d_j$, and is larger the more similar the two data points are; $\langle\cdot,\cdot\rangle$ denotes the inner product of vectors.
As equation (12) shows, the greater the relevance of the data to the data set, and the richer the diversity of the data, the larger the determinant of the matrix $L$. Therefore, the optimal dimensionality-reduced data subset $Y_m$ is obtained by finding the $L_Y$ with the largest determinant value:
$$Y_m = \arg\max_{Y}\det(L_Y) \quad (13)$$
The data extracted by the determinantal point process algorithm are shown in FIG. 7, where '+' marks the original data and solid dots mark the extracted data with high relevance and diversity. The extracted data are used as input to the trained LSTM load prediction model to realize load prediction, and the load prediction result is output.
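Since maximizing $\det(L_Y)$ exactly over all subsets is intractable, the subset $Y_m$ of eq. (13) is typically found approximately. A common greedy sketch (the kernel construction in the usage below and the subset size k are illustrative assumptions):

```python
import numpy as np

def greedy_dpp_select(L, k):
    """Greedily grow Y, at each step adding the item that maximizes det(L_Y)."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best_i, best_det = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])  # det of the submatrix L_Y
            if det > best_det:
                best_det, best_i = det, i
        selected.append(best_i)
    return selected
```

Items with high relevance $r_i$ and low mutual similarity $s_{ij}$ make $\det(L_Y)$ large, so the greedy pass favors data that are both relevant and diverse.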
For model training, the embodiment of the present invention adopts the Adam (Adaptive Moment Estimation) optimization algorithm to train the LSTM load prediction model based on multi-scale temporal features. Adam combines the Momentum and RMSProp algorithms and features fast convergence with small fluctuation. The loss function is minimized by iteratively updating the weights and biases of the network nodes, and uses the Mean Square Error (MSE):
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 \quad (14)$$
In equation (14), $n$ is the number of data points, $y_i$ is the actual load value, and $\hat{y}_i$ is the corresponding predicted value.
The LSTM network has 2 layers, with 50 neurons in the first layer and 100 in the second, and the number of iterations is set to 200. To avoid overfitting, the dropout value is set to 0.2. FIG. 8 compares the training-loss curves of the LSTM load prediction method based on multi-scale temporal features and the standard LSTM: when training reaches about 150 iterations, the loss is essentially stable and no longer decreases, so the number of training iterations is taken as 150. As FIG. 8 shows, the training loss of the LSTM load prediction model based on multi-scale temporal features is smaller than that of the conventional standard LSTM.
The technical effect experiment analysis of the invention is as follows:
experimental hardware environment and experimental data
The hardware is a computer with an Intel(R) Core(TM) i3 processor, a 3.30 GHz clock, and 12 GB of memory. Programming is done in Python under the TensorFlow framework. The experiments use the load data published by the Belgian grid operator Elia (the Elia data set). The Elia load data were collected every 15 minutes from 2004 to 2014; the present invention uniformly samples 24 points per day. Training samples are one to two years of load data, with the first 95% used as sample data and the last 5% as test data. Scheduling lead times differ by country — for example, one hour ahead in the UK and one day ahead in China — so the experiments cover both one-hour-ahead and day-ahead load prediction. In the one-hour-ahead experiment, each training sample takes the 24 load values of the preceding 24 hours as input and the 25th value as output; in the day-ahead experiment, each sample takes the 168 load values of the preceding 7 days as input and the 24 values of the 8th day as output.
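The sample construction described above is a standard sliding window; a sketch parameterized by input length and forecast horizon (24 → 1 for the hour-ahead experiment, 168 → 24 for the day-ahead one):

```python
import numpy as np

def make_samples(series, n_in, n_out=1):
    # each sample: n_in consecutive load values as input, the next n_out as target
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)
```

For the day-ahead setting the same call becomes `make_samples(series, 168, 24)`.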
To demonstrate the effectiveness of the technical scheme of the invention, the prediction results of the SOM, LSTM, and GPR models are compared with the method of the invention. The experiments are divided into two groups: the first is a comparison of one-hour-ahead and day-ahead load prediction on the Elia data set; in the second, 1%–3% noise is added to the Elia data set for an anti-noise performance test.
Evaluation index of experiment
To evaluate the prediction performance of the technical scheme, three indexes are selected: the mean absolute percentage error $E_{MAPE}$, the root mean square error $E_{RMSE}$, and the coefficient of determination (R-squared, RS):
$$E_{MAPE}=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \qquad E_{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \qquad RS=\frac{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
The coefficient of determination characterizes the goodness of fit through the variation of the data; it ranges from 0 to 1, and the closer it is to 1, the better the model fits the data. It is the ratio of the regression sum of squares (SSR) to the total sum of squares (SST). In these formulas, $y_i$ is the actual load value, $\hat{y}_i$ is the predicted load value, $\bar{y}$ is the mean of the actual load, and $n$ is the number of data points.
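The three evaluation indexes can be sketched directly from their definitions (the RS here uses the 1 − SSE/SST form, which coincides with SSR/SST for least-squares fits):

```python
import numpy as np

def e_mape(y, y_hat):
    # mean absolute percentage error, in percent
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def e_rmse(y, y_hat):
    # root mean square error
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r_squared(y, y_hat):
    # coefficient of determination: 1 - SSE/SST
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst
```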
Analysis of Experimental results
Experiment on Elia data set:
FIG. 9 shows the predicted and actual loads for three days in December 2013; FIG. 9a shows the result of the one-hour-ahead experiment and FIG. 9b that of the day-ahead experiment. Taken together, FIGS. 9a and 9b show that the W_LSTM of the present invention outperforms the other algorithms for both one-hour-ahead and day-ahead load prediction. As the prediction horizon grows, the temporal correlation of the data weakens and prediction accuracy falls; however, because the LSTM load prediction method based on multi-scale temporal features exploits the different time-scale characteristics of the wavelets, its accuracy degrades less than that of the other methods.
The following table gives the prediction-accuracy evaluation indexes on the Elia data set. The method of the present invention performs best on all three indexes. Compared with the Self-Organizing Map (SOM), Gaussian Process Regression (GPR), and LSTM, in the one-hour-ahead experiment $E_{MAPE}$ improves by 2.308, 1.474, and 1.425 respectively; $E_{RMSE}$ is 106.085, lower by 265.597, 169.165, and 155.4 respectively; RS is 0.980, higher by 0.227, 0.115, and 0.102 respectively. In the day-ahead experiment, $E_{MAPE}$ is 3.741, an improvement of 3.550, 2.877, and 4.699 over the other three methods; $E_{RMSE}$ is lower by 332.322, 307.831, and 450.424 respectively; RS is higher by 0.525, 0.389, and 0.69 respectively. The prediction accuracy and model performance of the method are therefore greatly improved. Comparing the one-hour-ahead and day-ahead results, the accuracy of the proposed method decreases as the prediction horizon grows, but it decreases less than that of the other three methods, showing that the method better exploits the temporal characteristics of the load sequence.
And (3) interference resistance analysis:
In practice, noise or data loss is inevitable during data acquisition. To test the noise resistance of the method, the robustness of the model is examined through a perturbation-analysis experiment: 1%–3% random noise is added to the training data, and the evaluation indexes of the LSTM load prediction method under different noise proportions are expressed by the mean absolute percentage error and the prediction accuracy. The results are shown in FIG. 10, where the horizontal axis is the noise percentage and the vertical axis is the change in mean absolute percentage error and prediction accuracy. When the noise is within 2%, the metrics change little, indicating that the W_LSTM model has a degree of robustness and can handle noise within a certain range; beyond 2% noise, the fitting ability of the model drops sharply.
Based on the above LSTM load prediction method using multi-scale temporal features, an embodiment of the present invention further provides a computer storage medium. The computer storage medium stores a computer program which, when executed by a processor, implements the LSTM load prediction method described above.
Based on the above LSTM load prediction method using multi-scale temporal features, an embodiment of the present invention further provides an electronic device. The electronic device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the LSTM load prediction method described above is implemented.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (7)
1. An LSTM load prediction method based on multi-scale temporal features, characterized by comprising the following steps:
Step 1: normalizing the original load sequence;
Step 2: performing wavelet decomposition and reconstruction on the normalized original load sequence to obtain load sequences with different temporal characteristics;
Step 3: recombining the load sequences with different temporal characteristics to obtain a recombined sequence;
Step 4: extracting data with greater relevance and diversity from the recombined sequence by means of a determinantal point process;
Step 5: feeding the extracted data with greater relevance and diversity into a trained LSTM load prediction model for load prediction.
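For illustration only (not part of the claims), the determinantal point process selection of Step 4 can be sketched with greedy MAP inference over an L-ensemble kernel. The RBF kernel choice and the greedy inference strategy are assumptions; the patent does not specify either.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """L-ensemble similarity kernel over data windows (assumed RBF form)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def greedy_dpp(L, k):
    """Greedy MAP inference for a DPP: repeatedly add the item that
    yields the largest log-determinant of the selected submatrix,
    which favors both high-quality and mutually diverse items."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected

# two near-duplicate points and one distant point: a DPP prefers diversity
X = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0]])
chosen = greedy_dpp(rbf_kernel(X), k=2)
```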
2. The LSTM load prediction method based on multi-scale temporal features of claim 1, wherein the wavelet is dbN wavelet.
3. The LSTM load prediction method based on multi-scale temporal features of claim 2 wherein the vanishing moment N of the dbN wavelet is 15.
4. The LSTM load prediction method based on multi-scale temporal features as claimed in claim 3, wherein said step 2 specifically comprises performing 4-level wavelet decomposition and reconstruction on the normalized original load sequence to obtain 5 load sequences with different temporal characteristics.
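For illustration, the decomposition structure of claim 4 can be sketched with a Haar wavelet as a simplified stand-in for db15 (an actual db15 filter bank would require a wavelet library such as PyWavelets). The structure is the same: a 4-level decomposition yields 5 coefficient bands, and reconstructing each band separately gives 5 component sequences that sum back to the original signal.

```python
import numpy as np

def haar_dwt(x):
    """One analysis level: approximation and detail coefficients."""
    s = np.sqrt(2.0)
    return (x[0::2] + x[1::2]) / s, (x[0::2] - x[1::2]) / s

def haar_idwt(cA, cD):
    """One synthesis level: exact inverse of haar_dwt."""
    s = np.sqrt(2.0)
    x = np.empty(2 * len(cA))
    x[0::2] = (cA + cD) / s
    x[1::2] = (cA - cD) / s
    return x

def decompose(x, levels=4):
    """4-level decomposition -> 5 bands [cA4, cD4, cD3, cD2, cD1]."""
    details = []
    cA = np.asarray(x, float)
    for _ in range(levels):
        cA, cD = haar_dwt(cA)
        details.append(cD)
    return [cA] + details[::-1]

def reconstruct_components(bands):
    """Reconstruct each band with the others zeroed; by linearity the
    5 component sequences sum back to the original load sequence."""
    comps = []
    for k in range(len(bands)):
        sel = [b if i == k else np.zeros_like(b) for i, b in enumerate(bands)]
        cA = sel[0]
        for cD in sel[1:]:
            cA = haar_idwt(cA, cD)
        comps.append(cA)
    return comps

x = np.sin(np.arange(32) * 0.3) + 0.1 * np.arange(32)  # toy load sequence
bands = decompose(x)
comps = reconstruct_components(bands)
```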
5. The LSTM load prediction method based on multi-scale temporal features of claim 1, wherein the LSTM load prediction model is trained by an adaptive moment estimation optimization algorithm.
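The adaptive moment estimation (Adam) update of claim 5 can be sketched in isolation; the toy quadratic below stands in for the LSTM loss surface, which is an assumption for illustration.

```python
import numpy as np

def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Adaptive moment estimation: bias-corrected first- and second-moment
    estimates of the gradient adapt each parameter's step size."""
    theta = np.asarray(theta0, float).copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g            # first-moment EMA
        v = beta2 * v + (1 - beta2) * g * g        # second-moment EMA
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# gradient of the toy loss ||w - (3, -1)||^2, in place of the LSTM loss
theta = adam_minimize(lambda w: 2.0 * (w - np.array([3.0, -1.0])), [0.0, 0.0])
```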
6. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the LSTM load prediction method of any of claims 1 to 5.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable in the processor, wherein the processor implements the LSTM load prediction method of any of claims 1 to 5 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911244260.2A CN111222689A (en) | 2019-12-06 | 2019-12-06 | LSTM load prediction method, medium, and electronic device based on multi-scale temporal features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111222689A true CN111222689A (en) | 2020-06-02 |
Family
ID=70829880
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111222689A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112330078A (en) * | 2021-01-04 | 2021-02-05 | 南方电网数字电网研究院有限公司 | Power consumption prediction method and device, computer equipment and storage medium |
CN113068211A (en) * | 2021-04-21 | 2021-07-02 | 东南大学 | Wireless access point throughput prediction method based on deep learning and Gaussian process regression |
CN115760850A (en) * | 2023-01-05 | 2023-03-07 | 长江勘测规划设计研究有限责任公司 | Method for identifying water level without scale by using machine vision |
CN117081082A (en) * | 2023-10-17 | 2023-11-17 | 国网上海市电力公司 | Active power distribution network operation situation sensing method and system based on Gaussian process regression |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107071423A (en) * | 2017-04-24 | 2017-08-18 | 天津大学 | Application process of the vision multi-channel model in stereoscopic video quality objective evaluation |
CN108272463A (en) * | 2017-08-07 | 2018-07-13 | 上海交通大学 | A kind of flyer's fatigue detection method based on EEG signals |
US20190004490A1 (en) * | 2017-06-28 | 2019-01-03 | Siemens Aktiengesellschaft | Method for recognizing contingencies in a power supply network |
CN109472404A (en) * | 2018-10-31 | 2019-03-15 | 山东大学 | A kind of Short-Term Load Forecasting of Electric Power System, model, apparatus and system |
US20190228481A1 (en) * | 2018-01-22 | 2019-07-25 | Pason Power Inc. | Intelligent energy management system for distributed energy resources and energy storage systems using machine learning |
CN110070229A (en) * | 2019-04-26 | 2019-07-30 | 中国计量大学 | The short term prediction method of home electrical load |
CN110263866A (en) * | 2019-06-24 | 2019-09-20 | 苏州智睿新能信息科技有限公司 | A kind of power consumer load setting prediction technique based on deep learning |
US20190340392A1 (en) * | 2018-05-04 | 2019-11-07 | New York University | Anomaly detection in real-time multi-threaded processes on embedded systems and devices using hardware performance counters and/or stack traces |
Non-Patent Citations (2)
Title |
---|
ZIHAN CHANG ET AL: "Electricity price prediction based on hybrid model of adam optimized LSTM neural network and wavelet transform", 《ENERGY》 * |
YANG Mei et al.: "Short-term load forecasting based on multi-scale temporal feature LSTM", Control Engineering (《控制工程》) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492822B (en) | Air pollutant concentration time-space domain correlation prediction method | |
Ma et al. | A hybrid attention-based deep learning approach for wind power prediction | |
CN110909926A (en) | TCN-LSTM-based solar photovoltaic power generation prediction method | |
CN111222689A (en) | LSTM load prediction method, medium, and electronic device based on multi-scale temporal features | |
CN113297801A (en) | Marine environment element prediction method based on STEOF-LSTM | |
CN110309603B (en) | Short-term wind speed prediction method and system based on wind speed characteristics | |
CN111860982A (en) | Wind power plant short-term wind power prediction method based on VMD-FCM-GRU | |
CN112766078B (en) | GRU-NN power load level prediction method based on EMD-SVR-MLR and attention mechanism | |
CN112733444A (en) | Multistep long time sequence prediction method based on CycleGAN neural network | |
CN110751318A (en) | IPSO-LSTM-based ultra-short-term power load prediction method | |
CN110987436B (en) | Bearing fault diagnosis method based on excitation mechanism | |
CN110766060B (en) | Time series similarity calculation method, system and medium based on deep learning | |
Gu et al. | Bayesian Takagi–Sugeno–Kang fuzzy model and its joint learning of structure identification and parameter estimation | |
CN116562908A (en) | Electric price prediction method based on double-layer VMD decomposition and SSA-LSTM | |
CN113095598A (en) | Multi-energy load prediction method, system, device and medium | |
CN114022311A (en) | Comprehensive energy system data compensation method for generating countermeasure network based on time sequence condition | |
CN111191823B (en) | Deep learning-based production logistics prediction method | |
CN116169670A (en) | Short-term non-resident load prediction method and system based on improved neural network | |
CN115659254A (en) | Power quality disturbance analysis method for power distribution network with bimodal feature fusion | |
CN116665483A (en) | Novel method for predicting residual parking space | |
CN114596726B (en) | Parking berth prediction method based on interpretable space-time attention mechanism | |
CN116632834A (en) | Short-term power load prediction method based on SSA-BiGRU-Attention | |
CN116720057A (en) | River water quality prediction method and system based on feature screening and weight distribution | |
CN116542701A (en) | Carbon price prediction method and system based on CNN-LSTM combination model | |
CN111090679A (en) | Time sequence data representation learning method based on time sequence influence and graph embedding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2020-06-02