WO2022241932A1 - Prediction method based on non-intrusive attention preprocessing process and BiLSTM model - Google Patents
Prediction method based on non-intrusive attention preprocessing process and BiLSTM model
- Publication number
- WO2022241932A1 (PCT/CN2021/105889)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- bilstm
- window
- model
- input
- Prior art date
- 2021-05-21
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
A prediction method based on a non-intrusive attention preprocessing process and a BiLSTM model. A deep learning model enhanced by a non-intrusive attention mechanism is used for long-term energy consumption prediction; it consists of an attention-based preprocessing model and a general-purpose BiLSTM network, and is called AP-BiLSTM. The attention-based preprocessing model is realized as the dot product of a convolutional layer and a fully connected layer. These two layers perform feature mapping of the original input data, which is critical to improving the performance of the AP-BiLSTM method. In this way, both local and global associations in the long-term dependencies of the input data are enhanced. The method comprises the following steps: S1: performing a non-intrusive data preprocessing process; and S2: inputting the result of S1 into the BiLSTM network model to obtain the final prediction result.
Description
The invention relates to the technical field of power load forecasting, and specifically to predicting the power consumption of a future period from the power consumption data of an existing period using a prediction method that combines non-intrusive attention preprocessing with a BiLSTM model.
Power load forecasting predicts the future value of the power load from its past and present values. Forecasting the power load makes it possible to infer the development trend and likely state of the load and to improve economic and social benefits.
Current power load forecasting methods fall broadly into traditional methods and modern methods. Modern methods mainly include the following: forecasting methods based on convolutional neural network models, methods that combine an LSTM (long short-term memory) network model with time series to predict power system load, and neural network methods that convolve directly over multi-dimensional data such as power consumption, temperature, and time. All of these have achieved good results.
Among them, BiLSTM (bidirectional long short-term memory network model) is a recurrent neural network architecture that is well suited to prediction on time series data. The BiLSTM model addresses the vanishing gradient problem and captures long-term correlations. However, although BiLSTM performs well in time series forecasting tasks, the fundamental constraints of sequential computation remain. The attention mechanism can solve this problem: it enables the model to achieve better results on the input or output sequence regardless of the length of the input data. However, current models that combine attention with recurrent networks such as BiLSTM usually require modifying the internal structure of the BiLSTM, which increases the difficulty of model design. On this basis, the present invention uses the attention mechanism as a preprocessing process combined with the BiLSTM model, which both enhances the long-term memory ability of the model and avoids internal modification of the BiLSTM model.
Summary of the Invention
The invention discloses a prediction method based on a non-intrusive attention preprocessing process and a BiLSTM model. A deep learning model enhanced by a non-intrusive attention mechanism is used for long-term energy consumption prediction; it consists of a preprocessing model based on an attention mechanism and a general BiLSTM network, and is called AP-BiLSTM. The attention-based preprocessing model is realized as the dot product of a convolutional layer and a fully connected layer. These two layers perform feature mapping of the original input data, which is the key to improving the performance of the AP-BiLSTM method. In this way, both local and global associations in the long-term dependencies of the input data are enhanced.
Technical Solution
A new forecasting method for power load forecasting, based on a non-intrusive attention preprocessing process and a BiLSTM model, comprising the following steps:
S1: a non-intrusive data preprocessing process;
S2: input the result of S1 into the BiLSTM network model to obtain the final prediction result.
In S1: data preprocessing based on the non-intrusive attention mechanism. The output produced from the training data by this processing module has already learned the relationships between earlier and later data in the time series, yet it can still serve as a new input to be fed into the BiLSTM network. The process comprises the following steps (a code sketch of steps S1.1-S1.6 is given below, after S1.6):
S1.1: Express the original input data as x_1, x_2, …, x_m, where 1, 2, …, m index the time steps of the input series, so the total length is m and the data can be represented by a matrix of shape (m, 1). Preprocess the data by sliding-window sampling along the time series: with the window length denoted window_size, successively extract window data of shape (window_size, 1) as samples to construct a sample data set. Each preprocessed sample can then be expressed as x_1, x_2, …, x_window_size, and the total sample data can be represented by a matrix of shape (m, window_size, 1);
S1.2: Divide the data into a test set, a verification set, and a training set in the proportions 20%, 10%, and 70%;
S1.3: Apply a one-dimensional convolution with kernel size k to each input training sample of S1.2. The convolution can be expressed as cx_t = Σ_{n<k} w_c·x_{t+n} + b_c, where cx_t is the convolution result, x_t is the input data at time t, x_{t+n} is the n-th order neighbor of x_t with n < k (i.e., neighbors up to order k), and w_c and b_c are the parameters to be learned;
S1.4: Apply a fully connected computation to the input training data of S1.2, expressed as dx_t = w_d·x_t + b_d, where w_d and b_d are the parameters to be learned and dx_t is the output of this step;
S1.5: Form the weighted sum of the results of S1.3 and S1.4 as the dot product ax_t = cx_t [·] dx_t, where cx_t is the convolution result from S1.3, dx_t is the fully connected result from S1.4, and [·] denotes the dot product (ax_t is the notation adopted here for this attention output);
S1.6: Concatenate the output ax_t of S1.5 with the original input sequence x_1, x_2, …, x_window_size at each time step t. After this connection operation the data can be represented by a matrix of shape (window_size, 2), where window_size is the sliding-window size of the input data in S1.1;
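For illustration only, the following is a minimal Python sketch of the S1.1-S1.6 preprocessing, assuming NumPy and Keras (TensorFlow). The names make_windows and attention_preprocess, the "same" convolution padding, and the use of element-wise multiplication for the per-step dot product are choices made here for the sketch, not details fixed by the patent.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def make_windows(series, window_size=50):
    """S1.1: slide a window of length window_size along a 1-D series and
    stack the windows into a sample tensor of shape (num, window_size, 1)."""
    windows = np.stack([series[i:i + window_size]
                        for i in range(len(series) - window_size)])
    return windows[..., np.newaxis]

def attention_preprocess(window_size=50, kernel_size=3):
    """S1.3-S1.6 as one small Keras model: a convolution branch (local
    context) combined per step with a dense branch (global mapping),
    then concatenated with the raw input."""
    x = layers.Input(shape=(window_size, 1))
    cx = layers.Conv1D(1, kernel_size, padding="same")(x)  # S1.3: cx_t ('same' padding assumed)
    dx = layers.Dense(1)(x)                                # S1.4: dx_t = w_d*x_t + b_d
    ax = layers.Multiply()([cx, dx])                       # S1.5: ax_t = cx_t [.] dx_t
    out = layers.Concatenate(axis=-1)([ax, x])             # S1.6: shape (window_size, 2)
    return tf.keras.Model(x, out, name="attention_preprocess")
```

Under these assumptions, attention_preprocess() maps a batch of shape (num, window_size, 1) to (num, window_size, 2), matching the (window_size, 2) shape stated in S1.6.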
In S2: the BiLSTM network model has an Encoder-Decoder architecture; the Encoder consists of a BiLSTM network layer, and the Decoder consists of an LSTM network layer and a Dense network layer. The process comprises the following steps (a code sketch of this model is given below, after S2.5):
S2.1: Input the result of S1.6 into the BiLSTM network layer. The forward propagation layer of the BiLSTM has 15 neurons and the backward propagation layer has 15 neurons; with the number of neurons denoted units, the output data can be represented by a matrix of shape (m, window_size, 30), where m is the number of samples;
S2.2: Express the output of S2.1 as (y′_t1, y′_t2, …, y′_units) and input it into the LSTM network layer;
S2.3: Express the output of S2.2 as (y″_t1, y″_t2, …, y″_window_size-1) and input it into the Dense network layer, which outputs the final prediction result output, a matrix of shape (m, 1);
S2.4: Through the network model described in S1 and S2.1-S2.3, the output prediction sequence output can be expanded as (y_1, y_2, …, y_m), where y_i (i = 1…m) is the prediction for the i-th sample. Compare each y_i with the true value and update the network parameters by backpropagation, using the mean squared error (MSE) loss function with the learning rate set to 0.01. Use the model's performance on the verification set for early stopping, repeat the above steps, and keep adjusting the model parameters to obtain good accuracy.
S2.5: Use steps S1 and S2.1-S2.4 to obtain the optimized final network model, test its performance on the test set, and finally apply it in actual prediction work.
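A minimal sketch of the S2 Encoder-Decoder under the stated sizes, continuing the Keras sketch above. The 10-unit decoder LSTM is taken from step S8 of the embodiment below, and the Adam optimizer is an assumption; the patent fixes only the MSE loss and the 0.01 learning rate.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_ap_bilstm(window_size=50):
    """S2: Encoder-Decoder head on the preprocessed (window_size, 2) input.
    Encoder: BiLSTM with 15 units per direction (feature size 30, S2.1).
    Decoder: LSTM then Dense, producing one value per sample (S2.2-S2.3)."""
    inp = layers.Input(shape=(window_size, 2))
    enc = layers.Bidirectional(layers.LSTM(15, return_sequences=True))(inp)  # (m, window_size, 30)
    dec = layers.LSTM(10)(enc)   # last state only: (m, 10); 10 units per S8 below
    out = layers.Dense(1)(dec)   # final prediction: (m, 1)
    model = tf.keras.Model(inp, out, name="ap_bilstm")
    # S2.4: MSE loss, learning rate 0.01 (optimizer choice assumed here)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="mse")
    return model
```

Early stopping on the verification set (S2.4) can then be expressed with tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10), matching the 10-round criterion of step S10 below.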
Beneficial effects of the invention: the method adopts a new non-intrusive attention preprocessing process. By extracting the attention operation from the model's interior into the preprocessing stage, both the local and global associations in the long-term dependencies of the input data are enhanced. At the same time, this preprocessing process can be applied to most deep learning networks and avoids modifying their network structures. The method of the present invention therefore still performs well in long-term forecasting.
Fig. 1 is a flow chart of the prediction method of the present invention.
Fig. 2 is the overall model architecture diagram of the present invention.
Fig. 3 is a structural diagram of the non-intrusive attention preprocessing module of the present invention (the innovative part).
Fig. 4 is a visualization of the data used in the example of the present invention.
Fig. 5 shows the prediction-result metrics of the method of the present invention and four comparison methods.
To make the purpose and effect of the present invention clearer, the integrated model of the present invention is described in detail below, taking the prediction method based on the non-intrusive attention preprocessing process and the BiLSTM model as an example and using 1667 daily total electricity load records for the United States from July 1, 2015 to January 23, 2020 (an end-to-end code sketch assembling these steps is given after S11):
S1: Express the original input data as x_1, x_2, …, x_m, where 1, 2, …, m index the time steps of the input series; the total length is m and the data shape is (m, 1). Preprocess the data by sliding-window sampling along the time series with window length window_size, successively extracting window data of shape (window_size, 1) as samples to construct a sample data set of shape (m, window_size, 1). In the present invention the input time series length is 1667 and window_size is set to 50, so the preprocessed sample data shape is (1667, 50, 1);
S2: Divide the data into a test set, a verification set, and a training set in the proportions 20%, 10%, and 70%. The test, verification, and training sets then contain 333, 167, and 1167 samples respectively, with shapes (333, 50, 1), (167, 50, 1), and (1167, 50, 1);
S3: Apply a one-dimensional convolution with kernel size k to the input training data of S2, expressed as cx_t = Σ_{n<k} w_c·x_{t+n} + b_c, where x_t is the input data at time t, x_{t+n} is the n-th order neighbor of x_t with n < k, and w_c and b_c are the parameters to be learned. In the present invention the convolution kernel size is set to 3;
S4: Apply a fully connected computation to the input training data of S2, expressed as dx_t = w_d·x_t + b_d, where w_d and b_d are the parameters to be learned;
S5: Form the weighted sum of the results of S3 and S4 as the dot product ax_t = cx_t [·] dx_t, where cx_t is the convolution result from S3 and dx_t is the fully connected result from S4;
S6: Concatenate the output of S5 with the original input data. Each data sample can be expressed as x_1, x_2, …, x_50, and the operation concatenates ax_t (the result computed in S5) with the original input x_t at each time step; the data shape after this connection operation is (50, 2);
S7: Input the result of S6 into the BiLSTM network layer. The forward propagation layer of the BiLSTM has 15 neurons and the backward propagation layer has 15 neurons, so the output data shape is (1167, 50, 30);
S8: Input the output of S7 into the LSTM network layer, which is set to 10 neurons; the output shape is (1167, 10);
S9: Input the output of S8 into the Dense network layer, which outputs the final prediction result with shape (1167, 1);
S10: Through the network model described in S1-S9, output the prediction sequence (y_1, y_2, …, y_1167), where y_i is the prediction for the i-th training sample. Compare each y_i with the true value and update the network parameters by backpropagation. Meanwhile, validate on the verification set: when the accuracy on the verification set has not improved for 10 consecutive rounds, stop training. Repeat the above steps and keep adjusting the model parameters to obtain good accuracy.
S11: Use the final network model obtained in steps S1-S10, test its final performance on the test data set, and apply it in actual prediction work.
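Assembling the embodiment end to end, a hedged sketch with the concrete numbers above (series length 1667, window 50, 70/10/20 split, kernel size 3) might look as follows. It reuses make_windows, attention_preprocess, and build_ap_bilstm from the earlier sketches; load_series is a hypothetical data loader the patent does not specify, and the target construction, split order, and epoch count are assumptions made here.

```python
import numpy as np
import tensorflow as tf

series = np.asarray(load_series(), dtype="float32")  # hypothetical: 1667 daily load values
X = make_windows(series, window_size=50)   # note: 1617 complete windows result, although
                                           # S1 above writes the tensor as (1667, 50, 1)
y = series[50:]                            # next-day target per window (assumed)

n = len(X)                                 # chronological 70/10/20 split (order assumed)
X_tr, y_tr = X[:int(0.7 * n)], y[:int(0.7 * n)]
X_va, y_va = X[int(0.7 * n):int(0.8 * n)], y[int(0.7 * n):int(0.8 * n)]
X_te, y_te = X[int(0.8 * n):], y[int(0.8 * n):]

pre = attention_preprocess(window_size=50, kernel_size=3)   # S3-S6
head = build_ap_bilstm(window_size=50)                      # S7-S9
full = tf.keras.Model(pre.input, head(pre.output))          # chained pipeline
full.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")

early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                         restore_best_weights=True)  # S10
full.fit(X_tr, y_tr, validation_data=(X_va, y_va),
         epochs=200, callbacks=[early])                     # epoch count assumed
print("test MSE:", full.evaluate(X_te, y_te))               # S11
```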
Claims (3)
- A new forecasting method for power load forecasting based on a non-intrusive attention preprocessing process and a BiLSTM model, characterized in that it comprises the following steps: S1: a non-intrusive data preprocessing process; S2: input the result of S1 into the BiLSTM network model to obtain the final prediction result.
- The new forecasting method for power load forecasting based on a non-intrusive attention preprocessing process and a BiLSTM model according to claim 1, characterized in that, in S1, the data preprocessing process based on the non-intrusive attention mechanism comprises the following steps: S1.1: express the original input data as x_1, x_2, …, x_m, where 1, 2, …, m index the time steps of the input series, so the total length is m and the data can be represented by a matrix of shape (m, 1); preprocess the data by sliding-window sampling along the time series with window length window_size, successively extracting window data of shape (window_size, 1) as samples to construct a sample data set, so that each preprocessed sample can be expressed as x_1, x_2, …, x_window_size and the total sample data can be represented by a matrix of shape (m, window_size, 1); S1.2: divide the data into a test set, a verification set, and a training set; S1.3: apply a one-dimensional convolution with kernel size k to each input training sample of S1.2, expressed as cx_t = Σ_{n<k} w_c·x_{t+n} + b_c, where cx_t is the convolution result, x_t is the input data at time t, x_{t+n} is the n-th order neighbor of x_t with n < k, and w_c and b_c are the parameters to be learned; S1.4: apply a fully connected computation to the input training data of S1.2, expressed as dx_t = w_d·x_t + b_d, where w_d and b_d are the parameters to be learned and dx_t is the output of this step; S1.5: form the weighted sum of the results of S1.3 and S1.4 as the dot product ax_t = cx_t [·] dx_t, where cx_t is the convolution result from S1.3, dx_t is the fully connected result from S1.4, and [·] denotes the dot product; S1.6: concatenate the output ax_t of S1.5 with the original input sequence x_1, x_2, …, x_window_size at each time step; the connected data can be represented by a matrix of shape (window_size, 2), where window_size is the sliding-window size of the input data in S1.1.
- The new forecasting method for power load forecasting based on a non-intrusive attention preprocessing process and a BiLSTM model according to claim 1, characterized in that, in S2, the BiLSTM network model has an Encoder-Decoder architecture, the Encoder consisting of a BiLSTM network layer and the Decoder consisting of an LSTM network layer and a Dense network layer, and that S2 comprises the following steps: S2.1: input the result of S1.6 into the BiLSTM network layer, whose forward propagation layer has 15 neurons and whose backward propagation layer has 15 neurons; with the number of neurons denoted units, the output data can be represented by a matrix of shape (m, window_size, 30), where m is the number of samples; S2.2: express the output of S2.1 as (y′_t1, y′_t2, …, y′_units) and input it into the LSTM network layer; S2.3: express the output of S2.2 as (y″_t1, y″_t2, …, y″_window_size-1) and input it into the Dense network layer, which outputs the final prediction result output, a matrix of shape (m, 1); S2.4: through the network model of S1 and S2.1-S2.3, the output prediction sequence output can be expanded as (y_1, y_2, …, y_m), where y_i (i = 1…m) is the prediction for the i-th sample; compare each y_i with the true value and update the network parameters by backpropagation, using the mean squared error (MSE) loss function with the learning rate set to 0.01; use the model's performance on the verification set for early stopping, repeat the above steps, and keep adjusting the model parameters to obtain good accuracy; S2.5: use steps S1 and S2.1-S2.4 to obtain the optimized final network model, test its performance on the test set, and finally apply it in actual prediction work.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110557297.1 | 2021-05-21 | ||
CN202110557297.1A CN113177666A (en) | 2021-05-21 | 2021-05-21 | Prediction method based on non-invasive attention preprocessing process and BiLSTM model
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022241932A1 (en) |
Family
ID=76929640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/105889 WO2022241932A1 (en) | 2021-05-21 | 2021-07-13 | Prediction method based on non-intrusive attention preprocessing process and bilstm model |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113177666A (en) |
WO (1) | WO2022241932A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113553988A (en) * | 2021-08-03 | 2021-10-26 | 同济大学 | Analog signal identification method based on complex neural network and attention mechanism |
CN113890024B (en) * | 2021-09-30 | 2024-08-13 | 清华大学 | Non-invasive intelligent load decomposition and optimization control method |
CN114676787A (en) * | 2022-04-08 | 2022-06-28 | 浙江大学 | Non-invasive load identification method based on BilSTM-CRF algorithm |
CN115100466A (en) * | 2022-06-22 | 2022-09-23 | 国网江苏省电力有限公司信息通信分公司 | Non-invasive load monitoring method, device and medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11586880B2 (en) * | 2018-08-28 | 2023-02-21 | Beijing Jingdong Shangke Information Technology Co., Ltd. | System and method for multi-horizon time series forecasting with dynamic temporal context learning |
CN110889545A (en) * | 2019-11-20 | 2020-03-17 | 国网重庆市电力公司电力科学研究院 | Power load prediction method and device and readable storage medium |
CN111191841B (en) * | 2019-12-30 | 2020-08-25 | 润联软件系统(深圳)有限公司 | Power load prediction method and device, computer equipment and storage medium |
CN111652225B (en) * | 2020-04-29 | 2024-02-27 | 杭州未名信科科技有限公司 | Non-invasive camera shooting and reading method and system based on deep learning |
CN112819256A (en) * | 2021-03-08 | 2021-05-18 | 重庆邮电大学 | Convolution time sequence room price prediction method based on attention mechanism |
2021
- 2021-05-21 CN CN202110557297.1A patent/CN113177666A/en active Pending
- 2021-07-13 WO PCT/CN2021/105889 patent/WO2022241932A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200073937A1 (en) * | 2018-08-30 | 2020-03-05 | International Business Machines Corporation | Multi-aspect sentiment analysis by collaborative attention allocation |
CN109685314A (en) * | 2018-11-20 | 2019-04-26 | 中国电力科学研究院有限公司 | A kind of non-intruding load decomposition method and system based on shot and long term memory network |
CN112529283A (en) * | 2020-12-04 | 2021-03-19 | 天津天大求实电力新技术股份有限公司 | Comprehensive energy system short-term load prediction method based on attention mechanism |
Non-Patent Citations (1)
Title |
---|
SHUNMIAO ZHANG, CHEN MINGLONG: "Non-intrusive load decomposition based on attention mechanism and ConvBiLSTM", JOURNAL OF FUJIAN UNIVERSITY OF TECHNOLOGY, vol. 18, no. 4, 25 August 2020 (2020-08-25), pages 336 - 342, XP093005794 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116108350A (en) * | 2023-01-06 | 2023-05-12 | 中南大学 | Non-invasive electrical appliance identification method and system based on multitasking learning |
CN116108350B (en) * | 2023-01-06 | 2023-10-20 | 中南大学 | Non-invasive electrical appliance identification method and system based on multitasking learning |
CN117350158A (en) * | 2023-10-13 | 2024-01-05 | 湖北华中电力科技开发有限责任公司 | Electric power short-term load prediction method by mixing RetNet and AM-BiLSTM algorithm |
CN117674098A (en) * | 2023-11-29 | 2024-03-08 | 国网浙江省电力有限公司丽水供电公司 | Multi-element load space-time probability distribution prediction method and system for different permeability |
CN117674098B (en) * | 2023-11-29 | 2024-06-07 | 国网浙江省电力有限公司丽水供电公司 | Multi-element load space-time probability distribution prediction method and system for different permeability |
CN118568681A (en) * | 2024-07-22 | 2024-08-30 | 齐鲁工业大学(山东省科学院) | Deep learning-based refrigeration system energy consumption prediction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN113177666A (en) | 2021-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022241932A1 (en) | Prediction method based on non-intrusive attention preprocessing process and BiLSTM model | |
WO2020024319A1 (en) | Convolutional neural network based multi-point regression forecasting model for traffic flow forecasting | |
CN109035779B (en) | DenseNet-based expressway traffic flow prediction method | |
CN110909926A (en) | TCN-LSTM-based solar photovoltaic power generation prediction method | |
CN111860982A (en) | Wind power plant short-term wind power prediction method based on VMD-FCM-GRU | |
Wang et al. | OGRU: An optimized gated recurrent unit neural network | |
CN110428082B (en) | Water quality prediction method based on attention neural network | |
CN107562784A (en) | Short text classification method based on ResLCNN models | |
CN110751318A (en) | IPSO-LSTM-based ultra-short-term power load prediction method | |
CN113673242A (en) | Text classification method based on K-neighborhood node algorithm and comparative learning | |
CN117094451B (en) | Power consumption prediction method, device and terminal | |
CN115051929B (en) | Network fault prediction method and device based on self-supervision target perception neural network | |
CN113112791A (en) | Traffic flow prediction method based on sliding window long-and-short term memory network | |
CN109598002A (en) | Neural machine translation method and system based on bidirectional circulating neural network | |
CN113935489A (en) | Variational quantum model TFQ-VQA based on quantum neural network and two-stage optimization method thereof | |
CN113836783A (en) | Digital regression model modeling method for main beam temperature-induced deflection monitoring reference value of cable-stayed bridge | |
CN113157919A (en) | Sentence text aspect level emotion classification method and system | |
Ying et al. | Processor free time forecasting based on convolutional neural network | |
Liu et al. | Prediction of Temperature Time Series Based on Wavelet Transform and Support Vector Machine. | |
CN118134284A (en) | Deep learning wind power prediction method based on multi-stage attention mechanism | |
CN116543289B (en) | Image description method based on encoder-decoder and Bi-LSTM attention model | |
CN105787265A (en) | Atomic spinning top random error modeling method based on comprehensive integration weighting method | |
CN116993185A (en) | Time sequence prediction method, device, equipment and storage medium | |
CN116843012A (en) | Time sequence prediction method integrating personalized context and time domain dynamic characteristics | |
CN112232570A (en) | Forward active total electric quantity prediction method and device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21940381; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21940381; Country of ref document: EP; Kind code of ref document: A1 |