CN112529283A - Comprehensive energy system short-term load prediction method based on attention mechanism - Google Patents

Comprehensive energy system short-term load prediction method based on attention mechanism

Info

Publication number
CN112529283A
CN112529283A
Authority
CN
China
Prior art keywords
short
model
lstm
load
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011410725.XA
Other languages
Chinese (zh)
Inventor
严俊
申刚
张艳来
陈洪柱
李玉松
李文龙
陶永晋
刘丛浩
朱紫伟
丁铭真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tiandaqiushi Electric Power High Technology Co ltd
Original Assignee
Tiandaqiushi Electric Power High Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tiandaqiushi Electric Power High Technology Co ltd filed Critical Tiandaqiushi Electric Power High Technology Co ltd
Priority to CN202011410725.XA priority Critical patent/CN112529283A/en
Publication of CN112529283A publication Critical patent/CN112529283A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply


Abstract

The invention discloses a comprehensive energy system short-term load forecasting method based on an attention mechanism, which relates to the field of energy load forecasting and comprises the following steps. Step 1: preprocess the input data. Step 2: establish a CNN-LSTM-BiLSTM model based on the attention mechanism. Step 3: train the model and predict the short-term load of the comprehensive energy system with the trained model. The technical scheme considers the influence of temperature, cooling load, electric load data, and natural gas load on the electric load of the comprehensive energy system, can fully mine the useful information in historical data, provides technical support for the operation optimization of comprehensive energy, and has great practical value.

Description

Comprehensive energy system short-term load prediction method based on attention mechanism
Technical Field
The invention relates to the field of energy load prediction, in particular to a comprehensive energy system short-term load prediction method based on an attention mechanism.
Background
Energy is a necessity for human life and development. With the development of industry, how to cope with the shortage of non-renewable energy sources and with environmental pollution has become a challenge for power systems. An Integrated Energy System (IES) couples multiple energy systems (e.g., cooling/heating/electricity/gas systems), enabling the full consumption of renewable energy and the cascaded use of non-renewable energy. IES is therefore a trend in the future development of power systems. However, the complementarity of multiple energy sources also makes power load prediction much harder. Conventional power system load forecasting typically accounts for weather, economics, electricity prices, and holidays. For an IES, the influence of cooling (heating) sources, natural gas, solar energy, and so on must also be considered. Furthermore, because the load is time-sequential and non-linear, and the power system is strongly affected by the demand side, long-term load prediction for an IES is difficult. Short-term load forecasting predicts the hourly or daily load for the coming period from the historical data of the preceding days, and provides a theoretical basis for the IES to make a power generation plan. The concept of the IES has only recently emerged, and how to accurately predict the power load of an IES is a problem that remains to be solved.
The main prediction methods for conventional power system load prediction include the autoregressive integrated moving average (ARIMA) and regression analysis. With the rise of power system big data and artificial intelligence technology, machine learning and deep learning, such as support vector regression (SVR) and artificial neural networks (ANN), have been widely used for short-term load prediction of power systems, and artificial intelligence can markedly improve prediction accuracy. SVR can be treated as a convex optimization and always finds a global optimal solution; it performs well on smaller data sets but easily overfits as the data set grows. In addition, the slack variables and kernel parameters of SVR must be set manually. An ANN has strong data-fitting ability, but it struggles to find a global optimal solution: in most cases it falls into a local optimum, and conventional artificial neural networks do not handle time-dependent data well. A recurrent neural network (RNN) is an improved ANN that passes information along the time dimension of the network, which addresses the problem conventional ANNs have with time-related data. However, gradient explosion and gradient vanishing are likely to occur during RNN training. To address this, researchers proposed the long short-term memory (LSTM) network, an improvement of the RNN. LSTM effectively avoids these RNN training problems, performs better on longer time series, and has already been applied to load prediction for power systems. Bidirectional LSTM (BiLSTM) combines a forward and a backward LSTM and can fit the data from both directions of the sequence to achieve higher prediction accuracy. A convolutional neural network (CNN) can extract and create features through its convolutional and pooling layers, and can therefore be used in short-term load prediction to extract or create load features. To better handle the complex influencing factors of an IES and the time-sequential, non-linear characteristics of its load, we propose a CNN-LSTM-BiLSTM load prediction model based on the attention mechanism.
Disclosure of Invention
In order to overcome the problems in the related art, the disclosed embodiments of the present invention provide a method for predicting the short-term load of an integrated energy system based on the attention mechanism. The technical scheme is as follows:
according to a first aspect of the disclosed embodiments of the present invention, there is provided an attention-based method for predicting the short-term load of an integrated energy system, comprising:
step 1: preprocessing the input data;
step 2: establishing a CNN-LSTM-BiLSTM model based on the attention mechanism;
step 3: training the model and predicting the short-term load of the comprehensive energy system with the trained model.
In one embodiment, the input data preprocessing step specifically comprises:
step 1.1: performing feature normalization on the input data;
step 1.2: serializing the data obtained in step 1.1;
step 1.3: taking 80% of the data set obtained in step 1.2 as the training set and 20% as the test set. The training set is used to train the model; the test set is used to evaluate the trained model, and the error on the test set serves as the final model's generalization error in the corresponding real scenario.
In one embodiment, the feature normalization of the input data in step 1.1 uses the formula:

a_ni = (a_i − a_imin) / (a_imax − a_imin)

where a_i is the i-th data point to be normalized, a_imax and a_imin are the maximum and minimum values before normalization, and a_ni is the normalized data.
In one embodiment, step 1.2, serializing the data obtained in step 1.1, specifically comprises:
generating time-series features from the input non-time-series data with a sliding window method, representing the load and influencing-factor values at a given moment by the related time-series features, and representing the historical load at each moment jointly by its related features. The sliding window method is specifically as follows:
take the hourly electric load, temperature, cooling load, and gas consumption of the preceding T days as the input features of the model; set the sliding-window width to the hourly records of T days and set the step size; then slide the window step by step, so that the historical load at each moment is represented jointly by its T-day related features.
In one embodiment, step 2, establishing the attention-based CNN-LSTM-BiLSTM model, specifically comprises:
step 2.1: a prediction-accuracy improvement step: a convolutional neural network model is used to extract features from the input data, and a batch normalization module is arranged between the convolutional layers so that the intermediate values output by each layer of the network are more stable, the deep neural network converges more easily, the risk of model overfitting is reduced, and the training efficiency of the network is improved;
step 2.2: a training-attention enhancement step: an attention mechanism module is added between the convolutional layers and the long short-term memory module, and high-value features are further extracted through the weight assignment performed by the attention mechanism, making the LSTM-BiLSTM prediction more accurate over longer time series; the extracted features are then predicted through two long short-term memory layers and one bidirectional long short-term memory layer.
In one embodiment, the batch normalization formula of the batch normalization module in step 2.1 is:

u_B = (1/m) Σ_{i=1}^{m} x_i
σ_B² = (1/m) Σ_{i=1}^{m} (x_i − u_B)²
x̂_i = (x_i − u_B) / √(σ_B² + ε)
y_i = γ·x̂_i + β

where x_i is the input value of the current layer; y_i is the output after batch normalization; m is the number of samples in a batch; u_B is the mean of the input values; σ_B² is the variance of the input values; ε is a smoothing term used to avoid division by zero when the variance is very small; and x̂_i is the value of x_i after normalization. The parameters γ and β are obtained by back-propagation training.
In one embodiment, the output formula of the attention mechanism module in the training-attention enhancement step of step 2.2 is:

c_i = Σ_j α_ij · h_j

where α_ij, the weight the attention mechanism assigns to each feature, is obtained through network training; c_i is the output of the attention mechanism; and h_j represents a global feature.
In one embodiment, the extracted features are predicted through two long short-term memory layers and one bidirectional long short-term memory layer. Specifically:
establishing the LSTM model:
the equations of the different units in the LSTM are as follows:

i_t = σ(W_xi·x_t + W_hi·h_{t−1} + b_i)
f_t = σ(W_xf·x_t + W_hf·h_{t−1} + b_f)
o_t = σ(W_xo·x_t + W_ho·h_{t−1} + b_o)
c̃_t = tanh(W_xc·x_t + W_hc·h_{t−1} + b_c)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

In the above formulas, x_t is the t-th time-series value input to the LSTM; c_t is the current cell state of the LSTM neuron; c̃_t is the new candidate state generated by tanh; h_t is the hidden state of the LSTM neuron; i_t, f_t, o_t are the t-th input gate, forget gate, and output gate; W_xi, W_xf, W_xo, W_xc are the input weights of the input gate, forget gate, output gate, and memory cell; W_hi, W_hf, W_ho, W_hc are the weights from the hidden layer to the input gate, forget gate, output gate, and memory cell; b_i, b_f, b_o, b_c are the biases of the input gate, forget gate, output gate, and memory cell; σ is the sigmoid activation function, tanh is the hyperbolic tangent activation function, and ⊙ denotes point-wise multiplication. To obtain the optimal parameters, both the CNN and the LSTM can adjust the model parameters by back propagation during training;
establishing the BiLSTM model:
bidirectional long short-term memory (BiLSTM) combines a forward LSTM and a backward LSTM; it can model the data from both the forward and the backward direction of time, after which the two outputs are concatenated. An LSTM can only model and predict data in the forward direction of time, which ignores information carried in the reverse direction. BiLSTM adds a reversed LSTM, so it can capture patterns that an LSTM may miss.
In one embodiment, training the model and predicting the short-term load of the integrated energy system with the trained model specifically comprises:
step 3.1: setting the training loss function to the root mean square error (RMSE), setting the number of iterations, and selecting the Adam optimizer to optimize the parameters in the network;
step 3.2: inputting the preprocessed data into the model, obtaining the final model after training is finished, and predicting the short-term load of the comprehensive energy system with this model.
In one embodiment, the root mean square error (RMSE) formula is:

RMSE = √( (1/M) Σ_{i=1}^{M} (y_i − ŷ_i)² )

where y_i is the value predicted by the network, ŷ_i is the true value, and M is the number of predicted results.
The technical scheme provided by the embodiments of the invention has the following beneficial effects: for an integrated energy supply area with multiple loads, the influences of temperature, cooling load, historical electric load data, and natural gas load are fully considered, and a CNN-LSTM-BiLSTM load prediction model based on the attention mechanism is obtained by combining the CNN, the attention mechanism, the LSTM, and the BiLSTM models;
historical data such as temperature, natural gas load, cooling load, and electric load are preprocessed and input into the model for prediction, yielding a comprehensive energy system short-term load prediction method based on the attention mechanism;
the method is suitable for short-term load prediction of a comprehensive energy system, can fully mine the useful information in historical data, achieves better prediction accuracy than traditional methods, provides technical support for the operation optimization of comprehensive energy, and has great practical value.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a diagram of the CNN-LSTM-BiLSTM model architecture based on the attention mechanism according to the present invention;
FIG. 2 is a diagram of the structure of the BiLSTM according to the present invention;
FIG. 3 is a schematic representation of the serialization of features according to the present invention;
FIG. 4 is a schematic diagram of the training process of the attention-based short-term load forecasting method for the integrated energy system;
FIG. 5 is a schematic diagram of the training results of the attention-based short-term load forecasting method for the integrated energy system according to the present invention;
FIG. 6 is a comparison of various models' daily load predictions on the test set;
FIG. 7 is a flowchart illustrating the steps of the attention-based short-term load forecasting method for the integrated energy system according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The technical scheme provided by the embodiments of the invention relates to a comprehensive energy system short-term load prediction method based on an attention mechanism, and in particular to the field of energy load prediction. In the related art, conventional power system load prediction usually considers weather, economic, electricity price, and holiday factors. For an IES, the influence of cooling (heating) sources, natural gas, solar energy, and so on must also be considered. Furthermore, because the load is time-sequential and non-linear, and the power system is strongly affected by the demand side, long-term load prediction for an IES is difficult. Short-term load forecasting predicts the hourly or daily load for the coming period from the historical data of the preceding days, and provides a theoretical basis for the IES to make a power generation plan. On this basis, the attention-based comprehensive energy system short-term load prediction method of the disclosed technical scheme considers the influence of temperature, cooling load, electric load data, and natural gas load on the electric load of the comprehensive energy system, can fully mine the useful information in historical data, provides technical support for the operation optimization of comprehensive energy, and has great practical value.
Fig. 7 is a flowchart illustrating steps of a method for predicting short-term load of an integrated energy system based on attention mechanism according to the disclosed technical solution.
Step S01: input data preprocessing.
The specific process is as follows: historical data such as the temperature, cooling load, gas consumption, and electric load of a certain park from June to September are used as the input data, and feature normalization is performed on them to prevent errors caused by excessively large differences between the data. The data are then serialized: a sliding window method turns the non-time-series input data into time-series features, the load and influencing-factor values at a given moment are represented by the related time-series features, and the historical load at each moment is represented jointly by its related features. Of the data set obtained after serialization, 80% is used as the training set and 20% as the test set. The training set is used to train the model; the test set is used to evaluate the trained model, and the error on the test set serves as the final model's generalization error in the corresponding real scenario.
Feature normalization formula:

a_ni = (a_i − a_imin) / (a_imax − a_imin)

where a_i is the i-th data point to be normalized, a_imax and a_imin are the maximum and minimum values before normalization, and a_ni is the normalized data.
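The normalization above can be sketched in a few lines; the helper name and the sample load values below are illustrative, not from the patent.

```python
import numpy as np

def min_max_normalize(a):
    # Min-max feature normalization per the formula above:
    # a_ni = (a_i - a_imin) / (a_imax - a_imin), mapping each value into [0, 1].
    a = np.asarray(a, dtype=float)
    a_min, a_max = a.min(), a.max()
    return (a - a_min) / (a_max - a_min)

# Hypothetical hourly electric loads (kW)
loads = [120.0, 180.0, 150.0, 240.0]
print(min_max_normalize(loads))  # 120 -> 0.0, 180 -> 0.5, 150 -> 0.25, 240 -> 1.0
```

In practice the a_imax and a_imin computed on the training set would normally be reused for the test set, so that both sets share one scale.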
The sliding window method is as follows:
multi-parameter tuning experiments showed that the trained model reaches its highest prediction accuracy when roughly the last 5 days of historical data are used as input, so the inputs are the hourly electric load, temperature, cooling load, and gas consumption of the last 5 days. The sliding-window width is set to the hourly records of 5 days and the step size is set to 1. The window slides step by step, and the historical load at each moment is represented jointly by its 5-day related features, as shown in FIG. 3.
step S02: a CNN-LSTM-BILSTM model based on the attention mechanism is established, as shown in FIG. 1.
The specific process is as follows: the CNN is designed into three one-dimensional convolution layers, and the number of the convolution kernels is set to be 16, 32 and 64 respectively. The co nvolutional kernel size is set to 2 and the tapering step is set to 1. Crossing the convolution kernel over the features can extract valid features. After the convolutional layer, a Max Pooling layer was added, the firing window size was set to 2, and the striping step was set to 1. The MaxPholing layer can reduce the complexity of the features and avoid model overfitting. And a batch normalization module is added between each convolution layer of the convolutional neural network, so that the output of each convolution layer is limited in a proper range, and the network training efficiency is improved. And adding an attention mechanism module before the features are input into the LSTM-BilSTM, and further extracting high-value features by the characteristic that the attention mechanism module distributes weights, so that the LSTM-BilSTM can perform better in a longer time sequence. And predicting the extracted characteristics through two layers of long-time and short-time memories and one layer of bidirectional long-time and short-time memories to obtain the power load of the next hour.
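A minimal PyTorch sketch of the stack just described (three Conv1D layers with 16/32/64 kernels of size 2 and stride 1, batch normalization, max pooling with window 2 and stride 1, a soft-attention re-weighting, two LSTM layers, one BiLSTM layer). The hidden sizes, the linear attention scorer, and the output head are assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class AttnCNNLSTMBiLSTM(nn.Module):
    """Sketch of the described architecture; layer counts/kernels from the text,
    hidden sizes and the attention scorer are illustrative assumptions."""
    def __init__(self, n_features=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 16, kernel_size=2, stride=1), nn.BatchNorm1d(16), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=2, stride=1), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=2, stride=1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=1),
        )
        self.attn_score = nn.Linear(64, 1)                 # produces alpha over time steps
        self.lstm = nn.LSTM(64, 64, num_layers=2, batch_first=True)
        self.bilstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, 1)                      # next-hour electric load

    def forward(self, x):                                  # x: [batch, time, features]
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)    # [batch, t', 64]
        alpha = torch.softmax(self.attn_score(h), dim=1)   # soft-attention weights
        h = alpha * h                                      # re-weight the features
        h, _ = self.lstm(h)
        h, _ = self.bilstm(h)
        return self.head(h[:, -1])                         # last step -> prediction

x = torch.randn(8, 120, 4)  # batch of 8 windows: 5 days x 24 h, 4 features
print(AttnCNNLSTMBiLSTM()(x).shape)  # torch.Size([8, 1])
```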
(a) Batch normalization formula:

u_B = (1/m) Σ_{i=1}^{m} x_i
σ_B² = (1/m) Σ_{i=1}^{m} (x_i − u_B)²
x̂_i = (x_i − u_B) / √(σ_B² + ε)
y_i = γ·x̂_i + β

where x_i is the input value of the current layer and y_i is the output after batch normalization; m is the number of samples in the mini-batch; u_B is the mean of the input values; σ_B² is the variance of the input values; ε is a smoothing term that avoids division by zero when the variance is very small; and x̂_i is the value of x_i after normalization. The parameters γ and β are obtained by back-propagation training. After batch normalization, the output is kept within a proper range, which greatly improves the training efficiency of the network.
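The batch normalization formulas above can be checked numerically; this NumPy sketch uses fixed γ = 1 and β = 0 rather than trained values.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Batch normalization over a mini-batch, following the formulas above:
    # u_B = mean(x); sigma2_B = var(x); x_hat = (x - u_B) / sqrt(sigma2_B + eps);
    # y = gamma * x_hat + beta (gamma, beta would be learned by back propagation).
    u_B = x.mean(axis=0)
    sigma2_B = x.var(axis=0)
    x_hat = (x - u_B) / np.sqrt(sigma2_B + eps)
    return gamma * x_hat + beta

batch = np.array([[1.0], [2.0], [3.0], [4.0]])
y = batch_norm(batch)
print(y.mean(), y.std())  # ~0 and ~1: outputs stabilized to zero mean, unit variance
```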
(b) Attention output formula:

c_i = Σ_j α_ij · h_j

where α_ij, the weight the attention mechanism assigns to each feature, is obtained through network training; c_i denotes the output of the attention mechanism; and h_j represents a global feature. Attention mechanisms divide into soft attention and hard attention; the present invention adopts soft attention.
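A minimal NumPy illustration of the soft-attention weighted sum c = Σ_j α_j·h_j; in the actual model the scores come from a trained scoring network, so the hand-picked scores here are purely illustrative.

```python
import numpy as np

def soft_attention(h, scores):
    # Soft attention: alpha = softmax(scores); output c = sum_j alpha_j * h_j.
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h  # weighted sum of the global features h_j

h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three feature vectors h_j
scores = np.array([2.0, 0.0, 0.0])                  # first feature scores highest
c = soft_attention(h, scores)
print(c)  # dominated by h_0, since alpha_0 is the largest weight
```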
(c) Long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) models:
LSTM model:
the LSTM introduces forget gate, input gate, and output gate functions to effectively avoid the gradient explosion and gradient vanishing problems of RNN training; these three gate functions effectively solve the RNN training problem. The memory cell controls the transfer of information across time steps. The input gate determines how much of the current information is passed on. The forget gate determines how much of the information delivered by the previous neuron the current neuron retains. The output gate determines the output of the current state to the next state. The equations of the different units in the LSTM are as follows:

i_t = σ(W_xi·x_t + W_hi·h_{t−1} + b_i)
f_t = σ(W_xf·x_t + W_hf·h_{t−1} + b_f)
o_t = σ(W_xo·x_t + W_ho·h_{t−1} + b_o)
c̃_t = tanh(W_xc·x_t + W_hc·h_{t−1} + b_c)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

In the above formulas, x_t is the t-th time-series value input to the LSTM; c_t is the current cell state of the LSTM neuron; c̃_t is the new candidate state generated by tanh; h_t is the hidden state of the LSTM neuron; i_t, f_t, o_t are the t-th input gate, forget gate, and output gate; W_xi, W_xf, W_xo, W_xc are the input weights of the input gate, forget gate, output gate, and memory cell; W_hi, W_hf, W_ho, W_hc are the weights from the hidden layer to the input gate, forget gate, output gate, and memory cell; b_i, b_f, b_o, b_c are the biases of the input gate, forget gate, output gate, and memory cell; σ is the sigmoid activation function, tanh is the hyperbolic tangent activation function, and ⊙ denotes point-wise multiplication. To obtain the optimal parameters, both the CNN and the LSTM can adjust the model parameters by back propagation during training.
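The gate equations above can be executed directly; this NumPy sketch runs one LSTM step with random weights (the grouping of the parameters into W, U, b lists is an implementation choice, not from the patent).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM step implementing the gate equations above. W, U, b each hold the
    # (input, forget, output, candidate) parameters, e.g. W = (W_xi, W_xf, W_xo, W_xc).
    i_t = sigmoid(W[0] @ x_t + U[0] @ h_prev + b[0])      # input gate
    f_t = sigmoid(W[1] @ x_t + U[1] @ h_prev + b[1])      # forget gate
    o_t = sigmoid(W[2] @ x_t + U[2] @ h_prev + b[2])      # output gate
    c_tilde = np.tanh(W[3] @ x_t + U[3] @ h_prev + b[3])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                    # new cell state
    h_t = o_t * np.tanh(c_t)                              # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = [rng.normal(size=(n_hid, n_in)) for _ in range(4)]
U = [rng.normal(size=(n_hid, n_hid)) for _ in range(4)]
b = [np.zeros(n_hid) for _ in range(4)]
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Because h_t = o_t ⊙ tanh(c_t) with o_t in (0, 1), every component of the hidden state stays strictly inside (−1, 1).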
BiLSTM model:
bidirectional long short-term memory (BiLSTM) combines a forward LSTM and a backward LSTM; it can model the data from both the forward and the backward direction of time, after which the two outputs are concatenated. An LSTM can only model and predict data in the forward direction of time, which ignores information carried in the reverse direction. BiLSTM adds a reversed LSTM, so it can capture patterns that an LSTM may miss. The structure of the BiLSTM is shown in FIG. 2.
Step S03: model training and prediction.
The specific process is as follows: the training loss function is set to the root mean square error (RMSE), the number of iterations is set to 200, the batch size is set to 128, and the Adam optimizer is selected to optimize the parameters in the network. The training process is shown in FIG. 4. The preprocessed data are input into the model, the final model is obtained after training is finished, and the model is used for short-term load prediction of the comprehensive energy system; the result is shown in FIG. 5.
Root mean square error (RMSE) formula:

RMSE = √( (1/M) Σ_{i=1}^{M} (y_i − ŷ_i)² )

where y_i is the value predicted by the network, ŷ_i is the true value, and M is the number of predicted results.
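The RMSE loss can be sketched as:

```python
import numpy as np

def rmse(y_pred, y_true):
    # Root mean square error: sqrt((1/M) * sum((y_i - y_hat_i)^2)).
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

print(rmse([3.0, 5.0], [1.0, 5.0]))  # sqrt((4 + 0) / 2) = sqrt(2) ≈ 1.4142
```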
Taken together, the advantages and positive effects of the invention are that the model outperforms other models in prediction accuracy: after 10 repeated runs, the prediction results were compared with CNN-BiLSTM, CNN-LSTM, LSTM, a back propagation neural network (BPNN), random forest regression (RFR), and support vector regression (SVR) on metrics including the mean square error (MSE), the mean absolute error (MAE), and the R2 score, with the results shown below.
TABLE 1 Comparison of the results of the different models
[Table 1 is reproduced as an image in the original publication; its numerical values are not recoverable from this text.]
As shown in FIG. 6, which compares the various models' load predictions on one day of the test set, the 10 runs show that LSTM-BiLSTM outperforms LSTM, indicating that LSTM-BiLSTM is better suited to load prediction in an IES. The models using a CNN outperform those that do not, indicating that the CNN can effectively extract and create features to improve the performance of the prediction model. Furthermore, the attention-based CNN-LSTM-BiLSTM, CNN-BiLSTM, and LSTM outperform RFR and SVR, indicating that deep learning structures are superior to the traditional machine learning models here. The attention-based CNN-LSTM-BiLSTM model performs best on the MAE, RMSE, and R2 scores, demonstrating the superiority of the model.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, it may be implemented wholly or partially as a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., solid state disk (SSD)), or the like.
The above description is intended only to illustrate the present invention and not to limit its scope; all modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention are intended to fall within the scope of the appended claims.

Claims (10)

1. An attention-mechanism-based short-term load prediction method for an integrated energy system, characterized by comprising the following steps:
step 1: preprocessing the input data;
step 2: establishing an attention-based CNN-LSTM-BiLSTM model;
step 3: training the model and predicting the short-term load of the integrated energy system with the trained model.
2. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein the input data preprocessing step specifically comprises:
step 1.1: performing feature normalization on the input data;
step 1.2: serializing the data obtained in step 1.1;
step 1.3: splitting the data into a training set and a test set in a set proportion.
3. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein step 1.1 normalizes the input data using the feature normalization formula:

$$a_{ni} = \frac{a_i - a_{i\min}}{a_{i\max} - a_{i\min}}$$

where $a_i$ is the $i$-th data point to be normalized, $a_{i\max}$ and $a_{i\min}$ are the maximum and minimum values before normalization, and $a_{ni}$ is the normalized data.
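As an illustration (not part of the claims), the feature normalization of claim 3 can be sketched in NumPy; the function name `min_max_normalize` is chosen here for clarity and is not from the patent:

```python
import numpy as np

def min_max_normalize(a):
    """Scale each feature column of `a` into [0, 1] using the
    min-max formula of claim 3: a_n = (a - a_min) / (a_max - a_min)."""
    a = np.asarray(a, dtype=float)
    a_min = a.min(axis=0)
    a_max = a.max(axis=0)
    return (a - a_min) / (a_max - a_min)
```

Each column is scaled independently, so loads, temperatures, and gas consumption end up on a common [0, 1] scale before sequencing.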
4. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein step 1.2, serializing the data obtained in step 1.1, specifically comprises:
generating time-series features from the non-time-series input data by a sliding-window method, so that the load and the influencing-factor values at each moment are represented by the related time-series features and the historical load at each moment is jointly represented by those features;
the sliding-window method specifically comprises: taking the hourly electric load, temperature, cooling load, and gas consumption over the preceding T days as the input features of the model; setting the width of the sliding window to the hourly records within T days and setting the step size; and sliding the window step by step, so that the historical load at each moment is jointly represented by the features of the T days.
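The sliding-window serialization of claim 4 can be sketched as follows; the function name and the assumption that features arrive as one hourly row per record are illustrative, not taken from the patent:

```python
import numpy as np

def make_sliding_windows(features, targets, window):
    """Turn a flat hourly feature matrix into supervised sequences:
    each sample holds `window` consecutive hourly records, and its
    label is the load at the hour immediately after the window."""
    X, y = [], []
    for t in range(window, len(features)):
        X.append(features[t - window:t])  # the preceding `window` hours
        y.append(targets[t])              # load at the next hour
    return np.array(X), np.array(y)
```

With `window = 24 * T`, each sample carries the hourly records of the preceding T days, matching the claim's window width.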
5. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein step 2, establishing the attention-based CNN-LSTM-BiLSTM model, comprises:
step 2.1: a prediction accuracy improving step: a convolutional neural network model extracts the features of the input data, and a batch normalization module is arranged between the convolutional layers of the model, so that the intermediate outputs of each layer are more stable, the deep neural network converges more easily, the risk of overfitting is reduced, and training efficiency is improved;
step 2.2: a training attention improving step: an attention mechanism module is added between the convolutional layers and the long short-term memory module; by assigning weights, the attention mechanism further extracts high-value features, so that the LSTM-BiLSTM predicts longer time series more accurately; the extracted features are then predicted through two layers of long short-term memory and one layer of bidirectional long short-term memory.
6. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein the batch normalization formula of the batch normalization module in the prediction accuracy improving step 2.1 is:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$

where $x_i$ is the input value of the current layer; $y_i$ is the output after batch normalization; $m$ is the number of samples in a batch; $\mu_B$ is the mean of the input values; $\sigma_B^2$ is the variance of the input values; $\epsilon$ is a smoothing term that avoids division by zero when the variance is very small; $\hat{x}_i$ is the value of $x_i$ after batch normalization; and the parameters $\gamma$ and $\beta$ are obtained by back-propagation training.
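A minimal NumPy sketch of the training-time forward pass of the batch normalization in claim 6; in a real network `gamma` and `beta` would be learned parameters rather than fixed inputs:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (m, d): subtract the
    batch mean, divide by the batch standard deviation (with eps for
    numerical safety), then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)            # mu_B
    var = x.var(axis=0)            # sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

With `gamma = 1` and `beta = 0`, each output column has approximately zero mean and unit variance, which is what stabilizes the intermediate layer outputs.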
7. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 6, wherein the output formula of the attention mechanism module in the training attention improving step 2.2 is:

$$c_i = \sum_{j} \alpha_{ij} h_j$$

where $\alpha_{ij}$ is the weight the attention mechanism assigns to each feature, obtained through network training; $c_i$ is the output of the attention mechanism; and $h_j$ is a global feature.
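The weighted sum in claim 7 can be illustrated as follows; the use of a softmax over raw alignment scores to produce the weights $\alpha$ is a common implementation choice and an assumption here, since the patent only states that the weights are trained:

```python
import numpy as np

def attention_pool(h, scores):
    """Given feature vectors h of shape (n, d) and raw alignment
    scores of shape (n,), softmax the scores into weights alpha and
    return the attention output c = sum_j alpha_j * h_j."""
    e = np.exp(scores - scores.max())  # shift for numerical stability
    alpha = e / e.sum()
    return alpha @ h, alpha
```

Equal scores reduce the output to a plain average of the features; a dominant score makes the output track that single high-value feature, which is how attention emphasizes informative inputs.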
8. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 6, wherein predicting the extracted features through two layers of long short-term memory and one layer of bidirectional long short-term memory specifically comprises:
establishing an LSTM model:
the equations for the different pixels in the LSTM are as follows:
Figure RE-FDA0002942576820000031
in the above formula, xtRepresenting the t-th time series value input in the LSTM; c. CtRepresenting the current cellular state of the LSTM neurons;
Figure RE-FDA0002942576820000032
a new candidate for the input forgetting gate generated for tanh; h istRepresenting the hidden state of the LSTM neuron; i.e. it,ft,otRepresenting the t-th input gate, the forgetting gate and the output gate. Wxi,Wxf,Wxo,WxcRespectively representing the weights of the input gate, the forgetting gate, the output gate and the memory unit. Whi,Whf,Who,WhcRepresenting weights from the hidden layer to the input gate, the forgetting gate, the output gate, and the memory cell; bi,bf,bo,bcRespectively representing the offset of the input gate, the forgetting gate, the output gate and the memory cell. Tan h represents a hyperbolic tangent activation function, which indicates a point-by-point multiplication;
establishing a BiLSTM model:
bidirectional long short-term memory (BiLSTM) combines a forward LSTM with a backward LSTM, modeling the data from both the forward and reverse directions of time and then concatenating the two outputs. A plain LSTM models and predicts the data only in the forward direction of time, ignoring information carried in the reverse direction; by adding a reversed LSTM, BiLSTM can capture patterns that a one-directional LSTM may miss.
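A single LSTM time step implementing the gate equations of claim 8 can be sketched in NumPy; stacking the parameters of the four gates (input, forget, output, candidate) into one matrix is an implementation assumption, not part of the claim:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4H, input_dim), U (4H, H), and b (4H,)
    hold the stacked parameters for the input gate i, forget gate f,
    output gate o, and candidate c~, each of hidden size H."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    c_tilde = np.tanh(z[3 * H:4 * H])
    c_t = f * c_prev + i * c_tilde   # cell state update
    h_t = o * np.tanh(c_t)           # hidden state
    return h_t, c_t
```

A BiLSTM would run this step forward over the sequence with one parameter set and backward with another, then concatenate the two hidden-state sequences.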
9. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein training the model and predicting the short-term load of the integrated energy system with the trained model specifically comprises:
step 3.1: setting the training loss function to the root mean square error (RMSE), setting the number of iterations, and selecting the Adam optimizer to optimize the parameters in the network;
step 3.2: inputting the preprocessed data into the model, obtaining the final model after training is finished, and predicting the short-term load of the integrated energy system with that model.
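For illustration, one update of the Adam optimizer selected in step 3.1 can be sketched as follows; the hyperparameter defaults are the commonly used ones and are not specified by the patent:

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step for parameter vector theta at iteration t (>= 1):
    update the biased first and second moment estimates m and v,
    bias-correct them, then take a scaled gradient step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)      # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In practice a deep-learning framework performs this per mini-batch for every weight in the CNN-LSTM-BiLSTM network; the sketch only makes the update rule concrete.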
10. The attention-mechanism-based short-term load prediction method for an integrated energy system according to claim 1, wherein the root mean square error (RMSE) formula is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the value predicted by the prediction network, $\hat{y}_i$ is the true value, and $M$ is the number of prediction results.
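A direct NumPy implementation of the RMSE formula in claim 10, for illustration only:

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean square error over M predictions:
    sqrt of the mean squared difference between predictions and truth."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

Used both as the training loss in step 3.1 and as an evaluation metric alongside MAE and R².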