CN115238952A - Bi-LSTM-Attention short-term power load prediction method - Google Patents
- Publication number
- CN115238952A (application CN202210675542.3A)
- Authority
- CN
- China
- Prior art keywords
- lstm
- model
- attention
- data
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a short-term power load prediction method based on a Bi-LSTM-Attention model. Time-series historical load data and weather information data are taken as input, and a Bi-LSTM neural network model performs bidirectional recurrent training to learn the forward and reverse patterns of the load data. An Attention mechanism is introduced on top of this model, assigning weights to the input features to highlight how important each feature is to the prediction model. Meanwhile, for the Bi-LSTM-Attention model, the model hyper-parameters are selected through an improved whale optimization algorithm, further improving the performance of the prediction model, and an adaptive weight method improves the local optimization capability of the algorithm. Compared with other models, the method achieves higher prediction accuracy.
Description
Technical Field
The invention belongs to the technical field of energy conservation, relates to application of a computer in an energy conservation technology, and particularly relates to a short-term power load prediction method based on a Bi-LSTM-Attention model.
Background
With the development of various energy-saving technologies, accurate load prediction plays an increasingly important role in energy conservation management, and load prediction techniques have attracted growing attention in recent years. Load forecasts are typically divided into long-term load forecasts (LTLF) for horizons over one year, medium-term load forecasts (MTLF) for several weeks to one year, short-term load forecasts (STLF) for one day to one week, and very short-term load forecasts (VSTLF) for minutes to hours [1]. LTLF and MTLF can estimate the trend of the load and are suited to long-term system planning in the design stage. STLF and VSTLF can generate load requirements accurate enough for control and scheduling and are better suited for short-term control of existing systems.
Load prediction is a time-series prediction problem studied early in both statistics and computer science; the methods have evolved from traditional statistical approaches to today's artificial-intelligence-based models and hybrid models.
The most common models in time-series prediction are autoregressive models, moving-average models, autoregressive integrated moving-average (ARIMA) models, and seasonal ARIMA models. These models focus on univariate data with linear relationships and time dependencies, which makes them less effective for time series with nonlinear characteristics. The load is a time series with nonlinear characteristics, and load prediction is influenced by various random factors, including weather conditions, time information and resident behavior [2-4].
In recent years, with the rapid development of deep learning, prediction models based mainly on the Recurrent Neural Network (RNN) have attracted much attention for processing time-series data. Notably, the long short-term memory (LSTM) network [5] was proposed to advance the development of RNNs; by adding gating units it effectively relieves the gradient explosion and gradient vanishing problems of RNNs [6]. LSTM can identify structure and patterns of data in time-series prediction, such as nonlinearity and complexity, and can therefore predict complex, strongly nonlinear time series. Reference [7] uses LSTM for energy consumption prediction and achieves higher prediction accuracy than a BP neural network. Marino et al. [8] applied the LSTM method to the same load prediction problem and showed results similar to those of [9]. While LSTM has many advantages in processing complex nonlinear data, it also has limitations: LSTM is more complex and harder to train, and in some cases does not perform as well as the simple ARIMA model [10]. To improve performance, more and more researchers combine LSTM with traditional methods or other machine learning methods. For example, Cai et al. [11] used two deep learning models (RNN and CNN) together with the ARIMA method for multi-step load prediction and compared them; the results show that the deep-learning-based models improve prediction accuracy by 22.6% over the ARIMA model.
Different types of sequence data tend to have different characteristics, which strongly affect the choice of prediction model, the settings of model parameters, and the accuracy of the results. In conventional studies, the data for load prediction are generally weather information, time information and historical loads [12-15]. The bidirectional LSTM (Bi-LSTM), established in recent years, combines a forward LSTM and a backward LSTM and can fit data from both directions of the sequence to achieve higher prediction accuracy [16]. The attention mechanism retains important information among different input features during model training through weight distribution, improves the feature extraction capability, and can effectively improve the accuracy of daily power load prediction [17].
The following are the relevant references retrieved by the applicant and used in the present invention.
[1] Singh P, Dwivedi P. Integration of new evolutionary approach with artificial neural network for solving short term load forecast problem[J]. Applied Energy, 2018, 217: 537-549.
[2] Khatoon S, Singh A K. Effects of various factors on electric load forecasting: An overview[C]//Proc of the 6th IEEE Power India International Conference (PIICON). Piscataway, NJ: IEEE Press, 2014: 1-5.
[3] Walter T, Price P N, Sohn M D. Uncertainty estimation improves energy measurement and verification procedures[J]. Applied Energy, 2014, 130: 230-236.
[4] Yan D, O'Brien W, Hong T, et al. Occupant behavior modeling for building performance simulation: Current state and future challenges[J]. Energy and Buildings, 2015, 107: 264-278.
[5] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[6] Vermaak J, Botha E C. Recurrent neural networks for short-term load forecasting[J]. IEEE Trans on Power Systems, 1998, 13(1): 126-132.
[7] Zhang Tingfei, Luo Heng, Liu Hang. Building energy consumption prediction method based on LSTM network[J]. Journal of Suzhou University of Science and Technology (Natural Science Edition), 2020, 37(04): 78-84.
[8] Marino D L, Amarasinghe K, Manic M. Building energy load forecasting using deep neural networks[C]//Proc of the 42nd Annual Conference of the IEEE Industrial Electronics Society. Piscataway, NJ: IEEE Press, 2016: 7046-7051.
[9] Mocanu E, Nguyen P H, Gibescu M, et al. Deep learning for estimating building energy consumption[J]. Sustainable Energy, Grids and Networks, 2016, 6: 91-99.
[10] Makridakis S, Spiliotis E, Assimakopoulos V. Statistical and machine learning forecasting methods: Concerns and ways forward[J]. PLoS ONE, 2018, 13(3): e0194889.
[11] Cai M, Pipattanasomporn M, Rahman S. Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques[J]. Applied Energy, 2019, 236: 1078-1088.
[12] Zhang J, Wei Y M, Li D, et al. Short term electricity load forecasting using a hybrid model[J]. Energy, 2018, 158: 774-781.
[13] Jain R K, Smith K M, Culligan P J, et al. Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy[J]. Applied Energy, 2014, 123: 168-178.
[14] Amber K P, Aslam M W, Hussain S K. Electricity consumption forecasting models for administration buildings of the UK higher education sector[J]. Energy and Buildings, 2015, 90: 127-136.
[15] Grolinger K, L'Heureux A, Capretz M A M, et al. Energy forecasting for event venues: Big data and prediction accuracy[J]. Energy and Buildings, 2016, 112: 222-233.
[16] Wu K, Wu J, Feng L, et al. An attention-based CNN-LSTM-BiLSTM model for short-term electric load forecasting in integrated energy system[J]. International Transactions on Electrical Energy Systems, 2021, 31(1): e12637.
[17] Zhao Bing, Wang Zengping, Ji Weijia, et al. A CNN-GRU short-term power load prediction method based on the attention mechanism[J]. Power System Technology, 2019, 43(12): 4370-4376.
[18] Graves A, Jaitly N, Mohamed A. Hybrid speech recognition with deep bidirectional LSTM[C]//Proc of IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway, NJ: IEEE Press, 2013: 273-278.
[19] Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5-6): 602-610.
[20] Wang Y, Huang M, Zhu X, et al. Attention-based LSTM for aspect-level sentiment classification[C]//Proc of EMNLP. Stroudsburg: ACL Press, 2016: 606-615.
[21] Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.
[22] Schuster M, Paliwal K K. Bidirectional recurrent neural networks[J]. IEEE Trans on Signal Processing, 1997, 45(11): 2673-2681.
Disclosure of Invention
The invention aims to provide a short-term power load prediction method based on a Bi-LSTM-Attention model, addressing the high volatility and uncertainty of power loads and the limitations of traditional load prediction methods when processing nonlinear time-series data.
In order to realize the task, the invention adopts the following technical solution:
a short-term power load prediction method based on a Bi-LSTM-Attention model is characterized in that time sequence historical load data and weather information data are used as input, bidirectional circulation training is carried out by using the Bi-LSTM neural network model, the positive and reverse laws of the load data are learned, an Attention mechanism is introduced on the basis of the model, and importance degrees of different features to the prediction model are highlighted by distributing weights for the features; meanwhile, aiming at the Bi-LSTM-Attention model, optimized selection of model hyper-parameters is achieved through an improved whale optimization algorithm, the performance of the prediction model is further improved, and in addition, the local optimization capability of the algorithm is improved through a self-adaptive weight method.
According to the invention, the Bi-LSTM neural network model comprises an input layer, an embedding layer, a forward LSTM hidden layer, a reverse LSTM hidden layer, an attention mechanism layer, a fully connected layer and an output layer. After the model receives the input information, the time-series data are passed into the forward and reverse LSTM hidden layers, which are combined to output the processed vectors. The attention mechanism layer takes the data processed by the bidirectional LSTM as input, calculates its attention weights, normalizes them, and finally combines the weight vector with the corresponding feature at the current time to obtain the feature-attention output.
Compared with other models, the short-term power load prediction method based on the Bi-LSTM-Attention model has higher prediction precision, and brings technical innovation that:
1) Before verifying the effect of the model, periodicity analysis and bidirectional-information-flow verification are carried out on the load data, leading to the conclusions that using LSTM is reasonable and that the data at the current time are influenced by both past and future data. The data are then standardized and evaluation indices for the models are established.
2) After constructing the Bi-LSTM model and introducing Attention, the experimental results verify that the bidirectional network and the Attention mechanism both have a positive influence on the accuracy of power load prediction.
3) In the WOAWC-Bi-LSTM-Attention model, to address the difficulty of selecting network hyper-parameters, the improved whale optimization algorithm finds a set of hyper-parameters that minimizes the mean square error of the Bi-LSTM-Attention model. The experimental results show that every evaluation index of the optimized WOAWC-Bi-LSTM-Attention model is reduced compared with the previous models, and its coefficient of determination is closest to 1.
Drawings
FIG. 1 is a diagram of an LSTM network architecture;
FIG. 2 is a diagram of a Bi-LSTM neural network architecture;
FIG. 3 is a schematic view of the attention mechanism;
FIG. 4 is a diagram of the structure of the Bi-LSTM-Attention model;
FIG. 5 is a flow chart of the WOAWC-optimized Bi-LSTM-Attention;
FIG. 6 is a graph of the training process for each model, wherein (a) is the LSTM loss function; (b) is the Bi-LSTM loss function; (c) is the Bi-LSTM-AT loss function; (d) is the WOAWC-Bi-LSTM-AT loss function;
FIG. 7 is a one week load trend graph;
FIG. 8 is the autocorrelation coefficients of the forward and reverse sequences;
FIG. 9 is a fitness graph;
FIG. 10 is a different hyper-parameter optimization process;
fig. 11 is a comparison of prediction results.
FIG. 12 is a comparison of the optimization results of the original WOA algorithm and the improved WOAWC.
The present invention will be described in further detail with reference to the following drawings and examples.
Detailed Description
The embodiment provides a short-term power load prediction method based on a Bi-LSTM-Attention model, which takes historical load data as input and considers the influence of outdoor temperature, relative humidity and time information. The Bi-LSTM neural network learns the variation pattern of the time-series data, and the attention mechanism highlights the influence of key features by assigning attention weights, mining the load data in depth. Meanwhile, the improved whale optimization algorithm optimizes the hyper-parameters of the Bi-LSTM-Attention model, further improving prediction performance. The experimental results show that, compared with the LSTM, Bi-LSTM and Bi-LSTM-Attention models, this model has higher prediction accuracy, and the error indices MAPE, RMSE and MAE are all significantly reduced.
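As a concrete illustration of the error indices named above, the following sketch (the standard definitions of MAPE, RMSE, MAE and the determination coefficient R², not code from the patent) computes them with NumPy:

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """Error indices used to compare load-prediction models:
    MAPE (%), RMSE, MAE and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot                    # closest to 1 is best
    return {"MAPE": mape, "RMSE": rmse, "MAE": mae, "R2": r2}
```

Lower MAPE/RMSE/MAE and an R² closer to 1 correspond to the "reduced evaluation indexes" criterion used when comparing models in the experiments.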
The specific implementation is as follows.
1. Bi-LSTM-Attention prediction model
1.1 LSTM neural network
LSTM is a highly efficient RNN structure proposed by Hochreiter and Schmidhuber in 1997 [5]. As in FIG. 1, the top row of lines is the cell state, which serves as an internal memory. The lines across the bottom are the hidden-layer states, and the gating units f, i, o and g are designed to solve the gradient vanishing problem. During network training, each gate learns its weights and biases separately. The forget gate helps the LSTM decide which information to discard from the cell state, an amount that can be adjusted through the previous hidden-layer state. The input gate determines how much new information to store in the cell state, and the output gate adjusts the amount of hidden-layer state passed to the next step of the sequence. The corresponding LSTM network parameters are calculated as follows:
f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)    (1)
i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)    (2)
g_t = \sigma(W_{gx} x_t + W_{gh} h_{t-1} + b_g)    (3)
o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)    (4)
c_t = g_t \odot i_t + c_{t-1} \odot f_t    (5)
h_t = \phi(c_t) \odot o_t    (6)
where f_t, i_t, o_t and c_t are the states of the forget gate, the input gate, the output gate and the cell state at the current time t; x_t is the input at time t; h_{t-1} is the hidden-layer state at the previous time; g_t is the internal hidden-layer state, computed from x_t and h_{t-1}; W_{fx}, W_{fh}, W_{ix}, W_{ih}, W_{gx}, W_{gh}, W_{ox}, W_{oh} and b_f, b_i, b_g, b_o are the corresponding weight matrices and bias terms; \sigma(\cdot) and \phi(\cdot) denote the Sigmoid and tanh activation functions, respectively; \odot denotes the Hadamard product.
1.2 Bi-LSTM neural network
An LSTM propagates in only one direction, so when processing data it can fit time-dependent data from only one direction. Graves [19] proposed a bidirectional LSTM on this basis. Unlike the unidirectional LSTM, the Bi-LSTM neural network adds a layer of reverse LSTM, which processes the time-series data in reverse order; the hidden layer fuses the forward and reverse information so that the network can effectively learn more time-series information. The Bi-LSTM neural network structure is shown in FIG. 2.
The backward LSTM is computed in the same manner as the forward LSTM, except that the information of the subsequent time steps is obtained in the reverse direction. The Bi-LSTM network is computed as follows:
h_f = f(W_{f1} x_t + W_{f2} h_{t-1})    (7)
h_b = f(W_{b1} x_t + W_{b2} h_{t+1})    (8)
where h_f is the output of the forward LSTM network and h_b is the output of the reverse LSTM network. The final output of the hidden layer is:
y_t = g(W_{o1} \odot h_f + W_{o2} \odot h_b)    (9)
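Equations (7)-(9) can be sketched as a minimal forward/backward pass over a sequence, here with f and g both taken to be tanh (an assumption; the text leaves the activations generic) and the merge weights W_{o1}, W_{o2} as elementwise vectors:

```python
import numpy as np

def bilstm_layer(xs, Wf1, Wf2, Wb1, Wb2, Wo1, Wo2):
    """Sketch of a Bi-LSTM-style bidirectional layer: a forward pass (7),
    a pass over the reversed sequence (8), and an elementwise merge (9)."""
    T = len(xs)
    h = np.zeros(Wf2.shape[0])
    hf = []
    for t in range(T):                      # forward direction, eq (7)
        h = np.tanh(Wf1 @ xs[t] + Wf2 @ h)
        hf.append(h)
    h = np.zeros(Wb2.shape[0])
    hb = [None] * T
    for t in reversed(range(T)):            # reverse direction, eq (8)
        h = np.tanh(Wb1 @ xs[t] + Wb2 @ h)
        hb[t] = h
    # eq (9): merge both directions per time step; * is the Hadamard product
    return [np.tanh(Wo1 * f + Wo2 * b) for f, b in zip(hf, hb)]
```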
1.3 Attention mechanism
The Attention mechanism is a probabilistic weighting mechanism that mimics the attention of the human brain [20]. When the human brain observes something, it focuses on a particular region and ignores the rest; the attention mechanism likewise highlights the more important features by assigning different probability weights to the inputs, thereby improving the accuracy of the model. A Bi-LSTM neural network combined with the Attention mechanism can therefore predict the load while avoiding the interference of complex features in the data; its structure is shown in FIG. 3.
In the figure, the values of the input sequence are x_1 to x_n, the hidden-layer states are h_1 to h_n, and \alpha denotes the attention weight of the hidden layer for the current input, calculated as follows:
e_t = u \tanh(w h_t + b)    (11)
where e_t is the attention probability distribution determined by the LSTM-layer output h_t at time t; u and w are weight coefficients; b is a bias term; c_t is the output of the Attention layer at time t.
1.4 Bi-LSTM-Attention model
The Bi-LSTM-Attention model comprises an input layer, an embedding layer, a forward LSTM hidden layer, a reverse LSTM hidden layer, an Attention mechanism layer, a full connection layer and an output layer, and the structure of the Bi-LSTM-Attention model is shown in FIG. 4.
After the Bi-LSTM-Attention model receives input information, time sequence data are transmitted into hidden layers of forward LSTM and backward LSTM, and the hidden layers are combined to output processed vectors. The attention mechanism layer takes the bi-directional LSTM processed data as input, calculates its attention weight, and then uses a normalization process. And finally, combining the weight vector with the corresponding feature at the current moment to obtain the output of feature attention.
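The layer stack described above (input, forward/reverse LSTM hidden layers, attention mechanism layer, fully connected layer, output) can be sketched with the Keras functional API. Layer sizes here are illustrative placeholders, not the tuned hyper-parameters found later by the optimization algorithm:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_bilstm_attention(timesteps, n_features, units=8):
    """Illustrative Bi-LSTM + attention + dense stack for one-step output."""
    inp = layers.Input(shape=(timesteps, n_features))
    # forward and reverse LSTM hidden layers, merged per time step
    seq = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(inp)
    # attention: one score per time step, softmax-normalized over time
    scores = layers.Dense(1, activation="tanh")(seq)
    weights = layers.Softmax(axis=1)(scores)
    # weighted combination of the hidden states (batch dot over the time axis)
    context = layers.Dot(axes=1)([weights, seq])
    context = layers.Flatten()(context)
    # fully connected layer, then the single load-value output
    out = layers.Dense(1)(layers.Dense(16, activation="relu")(context))
    return Model(inp, out)
```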
1.5 WOAWC optimized Bi-LSTM-Attention model
The Whale Optimization Algorithm (WOA) is a novel swarm intelligence optimization algorithm proposed by the Australian scholars Mirjalili et al. in 2016 [21]; it is a meta-heuristic that simulates the predation behavior of whale populations in nature. It has a simple principle, few parameters to set and strong global search capability, and it has been shown to outperform the PSO algorithm in solution precision and convergence speed when optimizing continuous functions; however, WOA still falls easily into local optima and has low convergence precision. In this embodiment, the whale positions are mutated through an improved Whale Optimization Algorithm (WOAWC) to improve the global search capability, and an adaptive weight method improves the local optimization capability of the algorithm.
Further, the WOAWC principle is as follows:
(1) When the whale optimization algorithm performs a global search of the population, one whale must be randomly selected as a reference so that the other whales move toward it. In the original WOA algorithm this reference whale is chosen at random, which hampers the algorithm's search for the global optimum. In this embodiment the whales are mutated with the inverse cumulative distribution function of the Cauchy distribution, so that individual whales mutate over a wider range.
The inverse cumulative distribution function of the Cauchy distribution is as follows:
When a whale undergoes Cauchy inverse-cumulative-distribution mutation, local optimization can proceed in a spiral wandering mode, which avoids blind mutation of the whale; the formula is as follows:
In these formulas, A and C are coefficient vectors, X(t) is the position vector of the current whale, and X_rand is the position vector of a whale individual randomly selected from the current population.
Using the Cauchy mutation, formulas (14) and (15) are rewritten by perturbing the positions before the update; one common form of the mutation is

x′_ij = x_ij + x_ij · F⁻¹(r)

where F⁻¹ is the inverse cumulative distribution function of the Cauchy distribution, x_ij is the j-th position component of the i-th whale before mutation, and r ∈ [0, 1] is uniformly distributed.
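The inverse CDF and the resulting heavy-tailed mutation can be sketched as follows; the perturbation form `x + x·F⁻¹(r)` is one common variant and an assumption, not necessarily the patent's exact update:

```python
import numpy as np

def cauchy_inverse_cdf(r):
    """Inverse CDF of the standard Cauchy distribution: F^{-1}(r) = tan(pi*(r - 1/2))."""
    return np.tan(np.pi * (r - 0.5))

def cauchy_mutate(x, rng):
    """One common form of Cauchy mutation (an assumed variant): perturb each
    coordinate by a heavy-tailed step proportional to its current value."""
    r = rng.uniform(0.0, 1.0, size=np.shape(x))  # r ~ U[0, 1] per coordinate
    return x + x * cauchy_inverse_cdf(r)

rng = np.random.default_rng(42)
x = np.array([1.0, -2.0, 0.5])
print(cauchy_mutate(x, rng))
```

The heavy tails of the Cauchy distribution occasionally produce very large steps, which is exactly what lets individual whales escape to "a wider range" as the text describes.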
(2) During local optimization, the individual in WOA's encircling stage that is closest to the prey is treated as the current local optimal solution, and the remaining individuals approach it according to (reconstructed in the standard WOA form):

D = |C · X*(t) − X(t)|  (16)

X(t + 1) = X*(t) − A · D  (17)

where X*(t) is the position vector of the current best whale.
the invention provides a method for changing the position of an optimal whale individual at the moment and improving the local optimizing capability of the whale by adopting a self-adaptive weight method, wherein the self-adaptive weight formula is as follows:
where t is the current iteration number and t_max is the maximum number of iterations. Introducing the adaptive weight into equation (17) gives:

X(t + 1) = w(t) · X*(t) − A · D
To improve model performance, the network structure and optimization parameters must be tuned so that the model converges quickly to the global minimum. In this embodiment, six parameters of the Bi-LSTM-Attention network model are optimized by the WOAWC: the learning rate (L), number of training epochs (N), batch size (B), number of first hidden layer nodes (H1), number of second hidden layer nodes (H2), and number of fully connected layer nodes (F). The search ranges are restricted to L ∈ [0.001, 0.01], N ∈ [10, 100], B ∈ [16, 128], H1 ∈ [1, 128], H2 ∈ [1, 128], and F ∈ [1, 100] so that an overly large search space does not degrade optimization efficiency. The optimized WOAWC-Bi-LSTM-Attention model is then verified; its flow chart is shown in Fig. 5 and comprises the following steps:
First, the data are acquired and preprocessed, and the data set is divided into a training set and a test set; the training set enters the WOAWC-Bi-LSTM-Attention model for training, and the test set enters it for testing.
The WOAWC encodes an initial value, computes fitness values and initializes the population, then performs the WOAWC population update and updates the global optimal solution after recomputing the fitness; if the stopping condition is met, the optimal network parameters are output, otherwise the algorithm returns to the population update step.
After the fitness calculation and population initialization step, the input parameters enter the WOAWC-Bi-LSTM-Attention model: the WOAWC decodes them into the six corresponding hyper-parameter values, and the model returns a fitness value.
After the fitness calculation and global optimal solution update step, the input parameters likewise enter the Bi-LSTM-Attention model, and the WOAWC-Bi-LSTM-Attention model returns the fitness value.
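The optimization flow above can be sketched as a WOA-style loop over the six bounded hyper-parameters. The fitness function below is a toy stand-in for "validation MSE of the trained Bi-LSTM-Attention model", and the simplified update rules (adaptive weight plus Cauchy-mutated reference whale) are an assumption, not the patent's exact WOAWC:

```python
import numpy as np

# (low, high) per parameter, following the search ranges in the text:
# learning rate L, epochs N, batch size B, hidden nodes H1, H2, dense nodes F.
BOUNDS = np.array([
    [0.001, 0.01], [10, 100], [16, 128], [1, 128], [1, 128], [1, 100]])

def fitness(x):
    # Toy objective standing in for validation MSE; the "good" setting
    # is arbitrary. Real use would train the network with parameters x
    # (integer parameters rounded first) and return its validation error.
    target = np.array([0.005, 50, 64, 64, 64, 50])
    return np.sum(((x - target) / (BOUNDS[:, 1] - BOUNDS[:, 0])) ** 2)

def clip(x):
    return np.clip(x, BOUNDS[:, 0], BOUNDS[:, 1])

def woawc(pop_size=20, t_max=50, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(pop_size, 6))
    best = min(pop, key=fitness).copy()
    for t in range(t_max):
        w = 1.0 - t / t_max            # adaptive weight, shrinking each iteration
        a = 2.0 * (1 - t / t_max)      # WOA's linearly decreasing coefficient
        for i in range(pop_size):
            A = a * (2 * rng.random(6) - 1)
            C = 2 * rng.random(6)
            if np.linalg.norm(A) < 1:  # exploit: approach the weighted best whale
                D = np.abs(C * best - pop[i])
                pop[i] = clip(w * best - A * D)
            else:                      # explore: Cauchy-mutated random reference
                ref = pop[rng.integers(pop_size)]
                ref = ref + ref * np.tan(np.pi * (rng.random(6) - 0.5))
                D = np.abs(C * ref - pop[i])
                pop[i] = clip(ref - A * D)
            if fitness(pop[i]) < fitness(best):
                best = pop[i].copy()
    return best

best = woawc()
print(best, fitness(best))
```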
1.6 loss function
In this embodiment, the training process of the prediction model is optimized with the Adam algorithm, and the loss function is the mean square error:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

where n is the number of samples, and yᵢ and ŷᵢ are the true and predicted values of the i-th sample point. Consistent with this study, load prediction is performed at 96 moments per day, so n = 96. The loss function curve of each model's training process is shown in Fig. 6.
2. Example analysis
2.1 data Source and Pre-processing
In modern power systems, meteorological factors have an increasingly significant influence on the power system load. Accounting for meteorological factors has therefore become one of the main means by which dispatch centers further improve load prediction accuracy.
In this embodiment, the proposed prediction method is verified on a public data set of short-term load values for a certain region over the whole of 2014, comprising time information, weather information, and load values. Each day of the data is divided into 96 time points (one sample every 15 min), and modeling uses a rolling sequence: all values from day 1 to day n are input and the 96 load values of day n+1 are output; then all values from day 2 to day n+1 are input and the 96 load values of day n+2 are output; and so on, constructing a multi-input multi-output load prediction.
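The rolling-sequence construction described above can be sketched as follows; the flat 15-min layout of `load` is an assumption about the data file:

```python
import numpy as np

def make_rolling_windows(load, n_days, points_per_day=96):
    """Build the multi-input multi-output rolling dataset described above:
    X[k] = all values of days k+1 .. k+n, y[k] = the 96 loads of day k+n+1.
    `load` is a flat array of 15-min load values (an assumed layout)."""
    daily = load.reshape(-1, points_per_day)   # (num_days, 96)
    X, y = [], []
    for k in range(len(daily) - n_days):
        X.append(daily[k:k + n_days].ravel())  # n days of history as one input
        y.append(daily[k + n_days])            # next day's 96 points as output
    return np.array(X), np.array(y)

load = np.arange(10 * 96, dtype=float)         # 10 synthetic days of data
X, y = make_rolling_windows(load, n_days=3)
print(X.shape, y.shape)  # (7, 288) (7, 96)
```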
Different evaluation indexes generally have different dimensions and units, which can distort data analysis results. To eliminate this difference and bring the indexes onto the same scale, the data are normalized with the Min-Max method before training and validation.
The mapping to [−1, 1] is:

x* = 2 (x − x_min) / (x_max − x_min) − 1

where x is the raw data, x* is the normalized data, and x_min and x_max are the minimum and maximum of the data; the normalized data are mapped to the interval [−1, 1].
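A minimal sketch of this Min-Max mapping, together with its inverse for converting predictions back to real load units:

```python
import numpy as np

def minmax_scale(x):
    """Min-Max scaling to [-1, 1], matching the mapping described above."""
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def minmax_inverse(x_scaled, x_min, x_max):
    """Undo the scaling so model predictions return to real load units."""
    return (x_scaled + 1.0) / 2.0 * (x_max - x_min) + x_min

x = np.array([10.0, 20.0, 30.0])
print(minmax_scale(x))  # [-1.  0.  1.]
```

In practice x_min and x_max are taken from the training set only and reused for the test set, so that no test information leaks into preprocessing.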
2.2 data validation
For experimental rigor, the prediction method is verified against real power load data. Fig. 7 shows the load trend during one week, with each day divided into 96 time points. The load data fluctuate at a certain frequency and the whole series is periodic, so choosing the LSTM method is reasonable.
Compared with the traditional LSTM neural network, the Bi-LSTM neural network considers the internal regularities of forward and backward data simultaneously, developing predictions from both the historical and future directions [22]. Load prediction therefore accounts for the influence of both historical and future loads on prediction accuracy.
To verify that the load data carry a bidirectional information flow, one month of load data is selected from the data set and split at its center into a forward and a reverse load sequence, and the autocorrelation coefficients of the two sequences are computed separately. As Fig. 8 shows, the load time series exhibits clear forward and reverse regularities.
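This forward/reverse autocorrelation check can be sketched with synthetic daily-periodic data standing in for the real month of load values:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation coefficient of a series at a given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).sum() / (x * x).sum()

# A synthetic daily-periodic "load" (period = 96 points) replaces real data.
t = np.arange(30 * 96)
load = 100 + 10 * np.sin(2 * np.pi * t / 96)

forward = load
backward = load[::-1]
print(autocorr(forward, 96), autocorr(backward, 96))  # both close to 1
```

The one-day lag (96 points) shows strong autocorrelation in both directions, which is the "obvious forward and reverse laws" the text reports for the real series.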
2.3 evaluation index
To evaluate the performance of the prediction model, the error indicators used in this embodiment are the mean absolute percentage error (MAPE), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination R². Their expressions are:

MAPE = (100%/n) Σᵢ |(yᵢ − ŷᵢ) / yᵢ|

MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|

RMSE = √[(1/n) Σᵢ (yᵢ − ŷᵢ)²]

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

where n is the total number of samples, and yᵢ and ŷᵢ are the true and predicted values of the i-th sample point.
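The four indicators can be implemented directly from their formulas:

```python
import numpy as np

def mape(y, y_hat):
    # Mean absolute percentage error, in percent.
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def mae(y, y_hat):
    # Mean absolute error.
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    # Root mean square error.
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    # Coefficient of determination: 1 minus residual over total sum of squares.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([100.0, 200.0, 300.0])
y_hat = np.array([110.0, 190.0, 305.0])
print(mape(y, y_hat), mae(y, y_hat), rmse(y, y_hat), r2(y, y_hat))
```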
2.4 prediction results and comparative analysis
The experiments were run on a Windows 10 operating system with an NVIDIA GeForce RTX 2070S 8 GB GPU. The programming language was Python 3.7, with TensorFlow 2.6.0 and Keras 2.2.4 as the compilation environments.
The training set comprises the input data of the first 364 days of 2014, and the test set the load data of the last two days. As shown in Fig. 9, the fitness curve of the WOAWC-Bi-LSTM-Attention model, driven by the iterative optimization of the improved Whale Optimization Algorithm (WOAWC), finally stabilizes at 0.0027. The iteration of the six hyper-parameters is shown in Fig. 10, and their final stable values in Table 1. In this embodiment's comparative experiment, the BPNN, LSTM, BiLSTM, and BiLSTM-Attention models were selected for load prediction; the prediction results are shown in Fig. 11. Fig. 12 compares the results of the network model optimized by the original WOA with those of the model optimized by the WOAWC.
Fig. 11 shows that the load values predicted after the WOAWC optimizes the hyper-parameters of the BiLSTM-Attention network fit best and are closest to the true values. The prediction-performance evaluation indexes of each model are listed in Table 2.
Table 1: result of parameter optimization
Parameter(s) | Best results |
Learning rate | 0.00552 |
Number of training sessions | 98 |
Batch size | 40 |
Number of nodes of first |
100 |
Number of nodes of second hidden layer | 74 |
Number of full connection layers | 61 |
Table 2: comparison of prediction errors of different models
As Table 2 shows, the LSTM-based prediction models outperform the BP model on time-series prediction. The bidirectional LSTM model outperforms the unidirectional LSTM, indicating that BiLSTM better captures the features in the sequence. With Attention added, the MAPE, RMSE, and MAE of the Bi-LSTM model decrease by 2.73%, 5.7%, and 12.42% respectively, showing that Attention's mining of the different features' contributions improves prediction. After the six hyper-parameters of the network model are optimized by the improved whale optimization algorithm, prediction performance improves further, with R² reaching above 0.99.
3. Conclusion
To meet the growing demand for short-term power load prediction accuracy, this embodiment proposes a short-term power load prediction model based on WOAWC-optimized Bi-LSTM-Attention. Experimental verification leads to the following conclusions:
1) Before verifying the model, periodicity analysis and bidirectional information flow verification were carried out on the load data, concluding that the use of LSTM is reasonable and that data at the current moment are influenced by both past and future data. The data were then normalized and evaluation indexes for assessing the models were established.
2) After the Bi-LSTM model is constructed and the Attention is introduced, the experimental result verifies that the bidirectional network and the Attention mechanism have positive influence on the accuracy of power load prediction.
3) In the WOAWC-Bi-LSTM-Attention model, aiming at the problem of difficulty in selecting the super-parameters of the network, a group of super-parameters is found by using an improved whale optimization algorithm, so that the mean square error of the Bi-LSTM-Attention model is minimum. The experimental result shows that the evaluation indexes of the optimized model are reduced compared with those of the prior model, and the decision coefficient is closest to 1.
In future research, the influence of more complex input features, such as date type and load characteristics, on the power load can be considered, and different intelligent algorithms, together with improvements to them, can be studied and compared on model performance, further improving the accuracy and generality of short-term power load prediction.
Claims (2)
1. A short-term power load prediction method based on a Bi-LSTM-Attention model, characterized in that time-series historical load data and weather information data are used as input; bidirectional cyclic training is carried out with the Bi-LSTM neural network model to learn the forward and reverse laws of the load data; an Attention mechanism is introduced on the basis of the model, highlighting the importance of different features to the prediction model by assigning weights to the features; meanwhile, for the Bi-LSTM-Attention model, optimized selection of the model hyper-parameters is realized through the improved whale optimization algorithm, further improving the performance of the prediction model, and in addition the local optimization capability of the algorithm is improved through the adaptive weight method.
2. The method of claim 1, in which the Bi-LSTM neural network model comprises an input layer, an embedding layer, a forward LSTM hidden layer, a backward LSTM hidden layer, an attention mechanism layer, a fully-connected layer, and an output layer; after the Bi-LSTM neural network model receives input information, time sequence data are transmitted into hidden layers of forward LSTM and reverse LSTM, the processed vectors are output by combining the hidden layers, an attention mechanism layer takes the data processed by the bidirectional LSTM as input, attention weight of the data is calculated, then normalization processing is used, and finally the weight vectors and corresponding features at the current moment are combined to obtain output of feature attention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210675542.3A CN115238952A (en) | 2022-06-15 | 2022-06-15 | Bi-LSTM-Attention short-term power load prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115238952A true CN115238952A (en) | 2022-10-25 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117350158A (en) * | 2023-10-13 | 2024-01-05 | 湖北华中电力科技开发有限责任公司 | Electric power short-term load prediction method by mixing RetNet and AM-BiLSTM algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||