CN114529051A - Long-term power load prediction method based on hierarchical residual self-attention neural network - Google Patents
- Publication number
- CN114529051A CN114529051A CN202210048738.XA CN202210048738A CN114529051A CN 114529051 A CN114529051 A CN 114529051A CN 202210048738 A CN202210048738 A CN 202210048738A CN 114529051 A CN114529051 A CN 114529051A
- Authority
- CN
- China
- Prior art keywords
- sequence
- data
- neural network
- load
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a long-term power load prediction method based on a hierarchical residual self-attention neural network. The method comprises three parts. First, mixed feature data — the trend term, period term, holiday term and weather term in the historical load data — are adaptively extracted and fused with the historical load sequence data. Second, the fused sequence data are recursively decomposed into time components, and the time components are encoded by hierarchical residual self-attention network blocks. Third, the time components are reconstructed and decoded generatively to predict the power load fluctuation over a future period. By hierarchically decomposing, reconstructing and predicting the load sequence, the method effectively captures both the long-term and short-term characteristics of the sequence and improves the prediction accuracy of the model in long-sequence load prediction scenarios.
Description
Technical Field
The invention relates to the technical field of load prediction for power energy systems, and in particular to a long-term power load prediction method based on a hierarchical residual self-attention neural network.
Background
Power load prediction is an indispensable service in the composition of smart grid systems and is actively applied in many scenarios; how to effectively control the power load to balance supply and demand has become an important research direction in the operation and management of modern power systems. The core problem of load prediction is obtaining the historical change rule of the prediction target and its relation to various influencing factors — the prediction model is, in effect, a mathematical function expressing that change rule. The challenge of load prediction is that the load is influenced by many external factors, including power trading markets, national policy, weather, and residential electricity-consumption habits, all of which remain problems to be solved.
Models for load prediction are essentially mathematical models for time-series prediction, and common methods fall into four groups: traditional statistical methods, machine-learning methods, deep-learning methods, and third-party tool prediction methods. (1) Traditional statistical methods: the common time-series models, including the autoregressive model (AR) and the autoregressive moving-average model (ARMA), have simple principles and suit the analysis of stationary series and simple non-stationary series at small orders, but are not suited to nonlinear prediction scenarios. (2) Machine-learning methods: machine learning is a very broad field containing many models suited to nonlinear prediction, such as support vector machines (SVM), decision tree models, k-nearest-neighbor models, and even ensemble models with stronger predictive ability (XGBoost, LightGBM). Machine-learning models handle the nonlinear problem well, but their feature-mining ability is limited in large-scale, high-dimensional prediction scenarios, so data features often must be processed manually before building a machine-learning prediction model.
(3) Deep-learning methods: owing to strong fitting ability, deep-learning models can adaptively mine and learn data features and are well suited to nonlinear prediction. Common methods include convolutional neural networks (CNN), long short-term memory networks (LSTM), and gated recurrent units (GRU). The recurrent neural networks represented by LSTM and GRU are widely used in sequence modeling and have good sequence-modeling ability; however, because of their serial learning, they gradually lose the ability to learn long-range historical features during training and suffer error accumulation, so they are often combined with other deep-learning models. (4) Third-party tool prediction methods: in recent years, some large domestic and foreign companies have also open-sourced their own time-series prediction methods. Facebook released the Prophet model in 2017, which jointly considers the trend, period and holiday terms of a time series, is simple to use, and predicts stably; Amazon released the DeepAR model in 2018, which uses a probability-based autoregressive inference mode to reduce uncertainty in the prediction process. The prediction accuracy of these tools is remarkable, but they achieve only short-term prediction and are unsuitable for energy-load scenarios demanding high real-time performance and strong stability.
Disclosure of Invention
The invention aims to combine and improve the prior art to optimize the modeling effect of a load prediction model in power load prediction scenarios. Specifically, the invention models with a neural network and provides a network structure based on a hierarchical residual self-attention mechanism for long-sequence prediction of stable, highly periodic power load data.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a long-term power load prediction method based on a hierarchical residual self-attention neural network comprises the following steps:
Step 2. Cleaning the source data, extracting features from the cleaned historical load data and weather data — namely the trend-term, period-term, holiday-term and weather-term features of load fluctuation — and fusing the historical load sequence data with the feature data to obtain a fusion vector, which serves as the input for the subsequent neural network modeling.
Step 4. Generatively encoding the features extracted from the source historical load data to be predicted, and predicting the load sequence over the next time-step range.
The invention has the following beneficial effects: the proposed model is based on the Transformer neural network and uses a self-attention mechanism in its structure, giving it the ability — unlike traditional recurrent neural networks — to capture global features. Compared with traditional methods, it is more capable and flexible in feature mining and model generalization. Load prediction realized by this method can markedly reduce the error of medium- and long-term load prediction, provide feedback guidance for the operation and dispatch of power units, and ensure the stable operation of the power system.
Drawings
Fig. 1 is a schematic flowchart of a long-term power load prediction method based on a hierarchical residual error self-attention neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall framework of a hierarchical residual error-based self-attention neural network prediction model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the framework of the Transformer neural network model;
FIG. 4 is a block diagram of a residual neural network model;
FIG. 5 is a framework diagram of each layer of the modified residual self-attention block according to an embodiment of the present invention;
FIG. 6 is a block diagram of prediction using generative decoding according to an embodiment of the present invention;
Detailed Description
The invention is further explained with reference to the attached drawings; the flow chart of the implementation of the invention is shown in FIG. 1.
Step 2-1. Extracting weather data features.
The weather data collected by sensors — comprising at least temperature data, weather-state data and timestamp data — are encoded and analyzed, and abnormal data with excessive deviation are eliminated. The temperature data X_weather(T) undergo max-min normalization, where the normalization function is expressed as:

X'_weather(T) = (X_weather(T) − min(X_weather(T))) / (max(X_weather(T)) − min(X_weather(T)))
Similarly, other numerical weather-related data can be feature-normalized in this way, which effectively helps the subsequent feature fusion. Weather-state data X_weather(S) are usually category labels of the form [sunny, cloudy, light rain, heavy rain, light snow, …]. For such data, a one-hot encoding method is adopted to convert them into numerical data; specifically, each label is encoded into a unique numerical value, expressed as follows:
State | Sunny | Cloudy | Light rain | Heavy rain | Light snow | ……
---|---|---|---|---|---|---
Encoded value | 0 | 1 | 2 | 3 | 4 | ……
The characteristic processing of the weather data can be realized through the method.
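As an illustrative sketch only (the function names and the category table below are assumptions for illustration, not the patented implementation), the weather-feature processing of step 2-1 can be expressed in NumPy as:

```python
import numpy as np

# Illustrative category table mirroring the encoding table above (assumed labels).
WEATHER_CODES = {"sunny": 0, "cloudy": 1, "light rain": 2,
                 "heavy rain": 3, "light snow": 4}

def min_max_normalize(x):
    """Max-min normalization of a numeric series to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def encode_weather_state(states):
    """Map categorical weather-state labels to unique numeric codes."""
    return np.array([WEATHER_CODES[s] for s in states])

scaled = min_max_normalize([12.0, 18.0, 24.0])         # -> [0.0, 0.5, 1.0]
codes = encode_weather_state(["sunny", "heavy rain"])  # -> [0, 3]
```

In practice the code table would be built from the observed categories rather than fixed in advance.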
Step 2-2. Extracting the trend-term and period-term features of the historical load sequence.
The historical load sequence features are the main factor influencing the trend of the future sequence. Aiming at the nonlinear, time-varying characteristics of the time series, the series undergoes feature decomposition through a shallow neural network. Data cleaning is required before decomposition: the data are analyzed and values with excessive offset are eliminated. Specifically, a mixed sequence-decomposition-layer neural network is defined; with original input X_input, the trend-term and period-term features are generated according to the following process:
X_trend = MovingAvg(X_input)

X_period = X_input − X_trend
where MovingAvg is a moving-average function, realized by an average-pooling operation of a one-dimensional convolution. It yields the trend term of the whole sequence fluctuation, and the period term is then obtained by subtracting the trend term from the original sequence.
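A minimal NumPy sketch of this moving-average decomposition (the kernel size and edge-padding choice are assumptions; the patent realizes MovingAvg as 1-D average pooling):

```python
import numpy as np

def moving_avg(x, kernel_size=3):
    """MovingAvg sketch: 1-D average pooling with edge padding, stride 1."""
    pad = kernel_size // 2
    xp = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(xp, kernel, mode="valid")

def decompose(x, kernel_size=3):
    """X_trend = MovingAvg(X_input); X_period = X_input - X_trend."""
    x = np.asarray(x, dtype=float)
    trend = moving_avg(x, kernel_size)
    period = x - trend
    return trend, period

trend, period = decompose([1.0, 2.0, 3.0, 4.0])
```

By construction the two components sum back to the original sequence.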
Step 2-3. Extracting the holiday-term features of the historical load sequence.
In load prediction, the presence of important holidays also affects the load trend to some extent. Specifically, the timestamps X_timestamp of the original load data extracted in step 1 are analyzed with the pandas and numpy libraries of the Python language, and the expanded features of the date of each timestamp are computed, including the month X_month, day X_day, hour X_hour, minute X_minute, day of week X_weekday, whether it is a workday X_iswork, whether it is a holiday X_isholiday, and whether it is a weekend X_isweekend. The time is analyzed with the pandas DataFrame, and the finer-grained features are expressed as follows:
X_month, X_day, X_hour, X_minute, ... = Extend(X_timestamp)

X_timestamp = Linear(Extend(X_timestamp))
where Extend is the feature-extension function; the extended multidimensional features are converted through a nonlinear conversion layer into a data form with the same dimension as the source sequence for subsequent feature fusion.
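For illustration, the timestamp-expansion part of Extend might look as follows in pandas; the column names are the feature symbols above, and the workday/weekend flags here ignore real holiday calendars (an assumption of this sketch — the patent also derives a holiday flag X_isholiday):

```python
import pandas as pd

def extend_timestamp(ts):
    """Sketch of Extend: expand timestamps into calendar features."""
    idx = pd.DatetimeIndex(ts)
    return pd.DataFrame({
        "month": idx.month,
        "day": idx.day,
        "hour": idx.hour,
        "minute": idx.minute,
        "weekday": idx.weekday,                        # Monday = 0
        "is_weekend": (idx.weekday >= 5).astype(int),
        "is_workday": (idx.weekday < 5).astype(int),   # ignores holidays
    })

feats = extend_timestamp(["2022-01-17 08:30", "2022-01-22 12:00"])
```

The nonlinear conversion layer (Linear) would then project these columns to the source-sequence dimension.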
Step 2-4. Feature embedding and fusion.
Through the first three steps, 2-1, 2-2 and 2-3, the feature data set X_weather, X_trend, X_period, X_timestamp is obtained. These features are then fused — here using an additive model — for the subsequent hierarchical residual neural network input, expressed as:
where Dropout is a common neuron-deactivation function in neural network modeling, used to prevent overfitting, and ReLU is a common activation function; the fused features are finally obtained through the additive model.
Step 3-1. Decomposing the sequence features.
A major innovation of the invention is replacing the traditional linear modeling process with a hierarchically decomposed sequence modeling process: the feature sequence is recursively decomposed according to the number of layers, a residual self-attention network then models the decomposed features of each layer, and a better feature expression is ultimately trained at deeper layers. Specifically, the decomposition algorithms provided by the invention include parity (odd-even) decomposition and dichotomy (binary) decomposition, where the pseudocode of the algorithm is as follows:
where the input is the mixed feature sequence of the source, Level is the preset number of layers, and SplitSeries is the sequence-decomposition function. The default algorithm provided by the invention adopts dichotomy decomposition; the two decomposed feature components X_left and X_right are respectively input into the residual block to be updated, then Algorithm 1 continues the recursive decomposition until the layer-number limit is reached, and finally the Merge function returns the combined sequence.
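The recursive dichotomy decomposition described above can be sketched as follows; the residual self-attention update is replaced by an identity stand-in here, so this shows only the split/update/merge control flow, not the patented network:

```python
import numpy as np

def residual_block(x):
    """Stand-in for the residual self-attention update of one component;
    the real model applies ResidualAttentionBlock here (identity in this sketch)."""
    return x

def split_series(x):
    """Dichotomy (binary) split of a feature sequence into two halves."""
    mid = len(x) // 2
    return x[:mid], x[mid:]

def hierarchical_decompose(x, level):
    """Recursively split, update each component, and Merge back in order."""
    if level == 0 or len(x) < 2:
        return residual_block(x)
    x_left, x_right = split_series(x)
    x_left = hierarchical_decompose(residual_block(x_left), level - 1)
    x_right = hierarchical_decompose(residual_block(x_right), level - 1)
    return np.concatenate([x_left, x_right])   # Merge preserves relative order

out = hierarchical_decompose(np.arange(8.0), level=2)
```

With the identity stand-in, Merge reconstructs the sequence in its original relative order, which is the invariant the reconstruction step (3-2-6) relies on.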
Step 3-2. Extracting information from the feature components using the hierarchical residual self-attention neural network.
The prototype of the hierarchical residual self-attention neural network provided by the invention is the Transformer network, whose architecture is shown in FIG. 3. Specifically, the invention uses a self-attention mechanism; compared with LSTM and GRU, the network has greater potential for mining dependencies between time steps, and the self-attention mechanism emphasizes the global state and better prevents information loss. Considering training time and prediction accuracy, the original Transformer is modified: the feedforward neural network in the original Transformer encoder is replaced with a convolutional network with fewer parameters; aiming at the hierarchical structure proposed in this design, more cross-layer residual connections are added to stabilize gradient change during model training; and the Transformer decoder layer is simplified, its basic structure replaced by a combination of a fully connected layer and a Gaussian error function. The overall modified framework is shown in FIG. 5.
At each layer, the feature component X_input is input into the model to obtain time-sequence feature information X_dep with temporal dependency, expressed as:
X_dep = ResidualAttentionBlock(X_input)
the step 3-2 specifically comprises the following steps:
Step 3-2-1. At each layer, the single time feature component X_input is input into a multi-head residual self-attention block to obtain the encoded feature X_embed. The multi-head residual self-attention mechanism is expressed as:
ResidualMultiHead(H) = Concat(head_1, head_2, ..., head_n) W_o
where ResidualMultiHead denotes the multi-head residual self-attention layer, H denotes the number of attention heads, and W_o denotes the weight vector, i.e., the fused feature vectors of the multiple heads are nonlinearly transformed and mapped to a specified length; head_1, head_2, ..., head_n denote the output of each head's self-attention layer, and Concat is the tensor-splicing function. The computation of each head is expressed as follows:
where Q_i, K_i, V_i are obtained by nonlinear conversion after encoding the input data in each head, and Prev_i is the probability matrix calculated by the previous layer's multi-head self-attention layer, passed on to the next layer so that stable and excellent performance is still obtained under a deep network structure. The final fusion feature X_attn is obtained using multiple heads; these variables are represented as follows:
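Under the assumption that Prev_i is added to the raw attention scores before the softmax (the surrounding text says the previous layer's probability matrix is carried forward; the exact injection point is this sketch's assumption), a minimal NumPy version of the multi-head residual self-attention could read:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def residual_multi_head(x, Wq, Wk, Wv, Wo, prev_scores=None):
    """Multi-head self-attention where each head adds the previous layer's
    raw score matrix (Prev_i) before the softmax."""
    heads, scores_out = [], []
    for i in range(len(Wq)):
        Q, K, V = x @ Wq[i], x @ Wk[i], x @ Wv[i]
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        if prev_scores is not None:
            scores = scores + prev_scores[i]   # residual on attention scores
        heads.append(softmax(scores) @ V)
        scores_out.append(scores)              # passed on as Prev_i
    return np.concatenate(heads, axis=-1) @ Wo, scores_out

rng = np.random.default_rng(0)
L, d_model, n_heads, d_k = 4, 6, 2, 3
x = rng.normal(size=(L, d_model))
Wq = [rng.normal(size=(d_model, d_k)) for _ in range(n_heads)]
Wk = [rng.normal(size=(d_model, d_k)) for _ in range(n_heads)]
Wv = [rng.normal(size=(d_model, d_k)) for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_k, d_model))
x_attn, prev = residual_multi_head(x, Wq, Wk, Wv, Wo)                   # layer 1
x_attn2, _ = residual_multi_head(x, Wq, Wk, Wv, Wo, prev_scores=prev)   # layer 2
```

The second call shows how a deeper layer reuses the first layer's scores.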
Step 3-2-2. The output features of the multi-head self-attention layer are input into a first regularization layer, Layer1, generating a feature vector X_norm1 and a copy of it, X_norm2. X_norm1 is input into a second-layer one-dimensional convolutional network to obtain the coded vector X_conv; X_conv and X_norm2 are connected and passed through a second regularization layer, Layer2, to generate the encoded time feature component Z, which is transmitted to the next self-attention layer. Meanwhile, the probability matrix Prev_i calculated in step 3-2-1 is also passed to the next layer. The relevant expressions are as follows:
X_norm1 = NormalizationLayer1(X_attn)

X_norm2 = X_norm1

X_conv = Dropout(ReLU(Conv1d(X_norm1)))

Z = NormalizationLayer2(X_conv + X_norm2)
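The four expressions above can be sketched as a single encoder sub-layer; the convolution kernel is an illustrative choice and dropout is omitted (inference-time behavior), so this is a shape-faithful sketch rather than the trained layer:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Per-position layer normalization over the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def conv1d_same(x, kernel):
    """Per-channel 1-D convolution over time with edge padding (same length)."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.convolve(xp[:, j], kernel, mode="valid")
                     for j in range(x.shape[1])], axis=1)

def encoder_sublayer(x_attn, kernel=(0.25, 0.5, 0.25)):
    """Norm1 -> Conv1d -> ReLU -> residual add -> Norm2 (dropout omitted)."""
    x_norm1 = layer_norm(x_attn)                     # NormalizationLayer1
    x_norm2 = x_norm1                                # the retained copy
    x_conv = np.maximum(conv1d_same(x_norm1, np.array(kernel)), 0.0)
    return layer_norm(x_conv + x_norm2)              # NormalizationLayer2

z = encoder_sublayer(np.random.default_rng(1).normal(size=(5, 4)))
```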
Step 3-2-3. Steps 3-2-1 and 3-2-2 are repeated, using the same operation in the residual attention unit of each layer stacked in the encoder section of the hierarchical residual block.
Step 3-2-4. The vector Z finally encoded by the encoder is input into a decoder for decoding. The decoder is improved from the traditional Transformer structure and suitably simplified, expressed as:
Z=Gelu(Linear(Dropout(Z)))
where Dropout is governed by a hyperparameter representing the neuron-deactivation rate in the neural network and serves to prevent overfitting; Linear is a fully connected linear transformation; and GELU is the Gaussian error linear unit, which performs well in sequence modeling and has the best comprehensive performance in many scenarios. It is expressed as:

GELU(x) = x · Φ(x) ≈ 0.5x(1 + tanh(√(2/π)(x + 0.044715x³)))

where Φ is the cumulative distribution function of the standard normal distribution.
The time component Z decoded by the decoder is a time-component feature with very good context expression ability and is transmitted to the next-layer residual self-attention block.
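A compact sketch of this simplified decoder, Z = GELU(Linear(Dropout(Z))), using the tanh approximation of GELU given above (the weight shapes are illustrative):

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def simplified_decoder(z, W, b, drop_rate=0.1, training=False, rng=None):
    """Dropout -> Linear -> GELU, as in step 3-2-4."""
    if training:                                   # inverted dropout at train time
        rng = rng or np.random.default_rng()
        mask = (rng.random(z.shape) >= drop_rate).astype(float)
        z = z * mask / (1.0 - drop_rate)
    return gelu(z @ W + b)

out = simplified_decoder(np.ones((2, 3)), np.eye(3), np.zeros(3))
```

At inference time (training=False) dropout is a no-op, matching standard practice.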
Step 3-2-5. Steps 3-2-1 through 3-2-4 are cycled until the sequence can no longer be divided (the required number of layers is reached).
Step 3-2-6. Time-series reconstruction. Through steps 3-2-1 to 3-2-5, the original time-component features have been segmented into several time components of the same length; they are restored according to the relative position order of the original features. The following are, respectively, the segmentation-and-reconstruction algorithm flows adopting the odd-even segmentation strategy and the binary segmentation strategy:
The reconstructed sequence is compressed in the above way, and the mean square error between the compressed sequence values and the real sequence values serves as the loss function used to update the parameters of the neural network, thereby training the network. The compression length is set to embed_len, and the compressed vector is denoted X_embed:
X_embed = Embed(X_T, embed_len)
Finally, the model parameters are updated with the mean square error (MSE) as the loss function:

MSE = (1/n) Σ_{t=1}^{n} (Ŷ_t − Y_t)²

where the predicted value Ŷ_T of the training phase is represented by X_embed, and the true value Y_T of the training phase is represented by X_true.
After obtaining the compressed sequence, the invention performs long-sequence prediction through the proposed generative decoding. The prediction length predict_len is set, and an all-zero tensor X_zero with the same dimension as the prediction length is initialized; X_embed and X_zero are horizontally spliced and compressed again, with compression length predict_len, generating the load prediction X_pred of the historical sequence:
X_pred = Embed(Concat(X_embed, X_zero), predict_len)
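The compression and generative-decoding steps above can be sketched as follows; here Embed is a fixed random linear map standing in for the learned compression layer (an assumption of the sketch — in the patent this layer is trained):

```python
import numpy as np

def embed(x, out_len):
    """Stand-in for the learned Embed compression layer: a fixed random
    linear map from len(x) to out_len."""
    rng = np.random.default_rng(0)
    W = rng.normal(size=(len(x), out_len)) / np.sqrt(len(x))
    return x @ W

def generative_decode(history, embed_len, predict_len):
    """Compress the encoded history, splice with zeros of the prediction
    length, and compress again to length predict_len."""
    x_embed = embed(np.asarray(history, dtype=float), embed_len)  # Embed(X_T, embed_len)
    x_zero = np.zeros(predict_len)                                # X_zero
    concat = np.concatenate([x_embed, x_zero])                    # Concat(X_embed, X_zero)
    return embed(concat, predict_len)                             # X_pred

x_pred = generative_decode(np.arange(16.0), embed_len=8, predict_len=4)
```

The zero padding reserves the slots for the future values, which the second compression fills in one shot rather than step by step.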
The above is the preferred implementation of the present invention; all changes made according to the technique of the present invention whose functional effects do not exceed the scope of the technical solution of the present invention fall within the protection scope of the present invention.
Claims (5)
1. A long-term power load prediction method based on a hierarchical residual self-attention neural network, characterized by comprising the following steps:
step 1, acquiring source data of a unit load sequence and weather data monitored by a sensor from a time sequence database;
step 2, performing data cleaning on the source data, performing feature extraction from the cleaned historical load data and weather data, and respectively extracting four major features of a trend item, a period item, a holiday item and a weather item of load fluctuation;
performing data fusion on the historical load sequence data and the weather characteristic data to obtain a fusion vector for the input of the next neural network modeling;
step 3, encoding the input sequence by using a hierarchical residual self-attention neural network, extracting and mining important features in the input sequence, and performing model training;
step 4, carrying out generative coding on the characteristics extracted from the source historical load data to be predicted, and predicting a load sequence in the next time step range;
extracting the integral trend-term and period-term features from the original load sequence by using a convolutional neural network; performing feature extraction on the holiday term and the weather term by using a one-hot encoding mode; horizontally splicing the source load sequence with all extracted feature data using the additive idea; and converting through a fully connected layer to obtain the fused time-sequence feature vector;
in step 3, a recursive idea is adopted: the time-sequence feature vector is hierarchically down-sampled and decomposed; a residual self-attention network performs feature mining on the decomposed time-sequence component of each layer; upon reaching the decomposition depth, the mined features are recombined according to their original relative positions and converted into a prediction result through a one-dimensional convolution layer; iteration continues in this manner, using the Adam algorithm as the optimization algorithm and the mean square error between the predicted value and the true value as the loss function for model training;
in step 4, specifically, the source load data to be predicted undergo feature conversion through steps 2 and 3; the converted features are spliced with an all-zero vector initialized to the prediction length; the spliced vector is generatively encoded through the model trained in step 3; and the load sequence fluctuation of the whole future period is predicted.
2. The long-term power load prediction method based on hierarchical residual self-attention neural network according to claim 1, characterized in that: the period term is obtained by subtracting the trend term from the original load sequence.
3. The long-term power load prediction method based on the hierarchical residual self-attention neural network of claim 1, characterized in that: the hierarchical residual self-attention neural network takes the Transformer network as its prototype; the feed-forward network of the encoder in the original Transformer is replaced with a convolutional network; additional cross-layer residual connections are added to stabilize gradient changes during model training; and the decoder layer of the Transformer is simplified and replaced with a combination of a fully connected layer and a Gaussian error function.
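A single residual self-attention block of the kind claimed can be sketched in NumPy; this is a single-head version without layer normalization, and the dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def residual_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a cross-layer
    residual connection added to stabilize gradient flow."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return x + attn @ v          # residual connection

rng = np.random.default_rng(0)
T, d = 8, 4
x = rng.standard_normal((T, d))
Wq, Wk, Wv = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
y = residual_self_attention(x, Wq, Wk, Wv)
print(y.shape)   # (8, 4)
```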
4. The long-term power load prediction method based on the hierarchical residual self-attention neural network according to claim 1, characterized in that: the splicing in step 4 adopts horizontal splicing followed by compression.
5. The long-term power load prediction method based on the hierarchical residual self-attention neural network of claim 4, wherein: the compressed length equals the prediction length.
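Claims 4 and 5 together say the spliced vector is compressed so that its length equals the prediction length; a hedged sketch using a fully connected compression layer (the layer itself is an assumption; the claims do not name the compression mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)
T_hist, T_pred = 96, 24

hist = rng.random(T_hist)
spliced = np.concatenate([hist, np.zeros(T_pred)])   # horizontal splicing

# Compression through a fully connected layer whose output length is the
# prediction length, per claims 4 and 5.
W = rng.standard_normal((spliced.size, T_pred)) * 0.01
compressed = spliced @ W
print(compressed.shape)   # (24,)
```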
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210048738.XA CN114529051A (en) | 2022-01-17 | 2022-01-17 | Long-term power load prediction method based on hierarchical residual self-attention neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114529051A true CN114529051A (en) | 2022-05-24 |
Family
ID=81620165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210048738.XA Pending CN114529051A (en) | 2022-01-17 | 2022-01-17 | Long-term power load prediction method based on hierarchical residual self-attention neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114529051A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114707772A (en) * | 2022-06-06 | 2022-07-05 | 山东大学 | Power load prediction method and system based on multi-feature decomposition and fusion |
CN114707772B (en) * | 2022-06-06 | 2022-08-23 | 山东大学 | Power load prediction method and system based on multi-feature decomposition and fusion |
CN115204529A (en) * | 2022-09-15 | 2022-10-18 | 之江实验室 | Non-invasive load monitoring method and device based on time attention mechanism |
CN115204529B (en) * | 2022-09-15 | 2022-12-20 | 之江实验室 | Non-invasive load monitoring method and device based on time attention mechanism |
CN115440390A (en) * | 2022-11-09 | 2022-12-06 | 山东大学 | Method, system, equipment and storage medium for predicting number of cases of infectious diseases |
CN115440390B (en) * | 2022-11-09 | 2023-03-24 | 山东大学 | Infectious disease case quantity prediction method, system, equipment and storage medium |
CN116029201A (en) * | 2022-12-23 | 2023-04-28 | 浙江苍南仪表集团股份有限公司 | Gas flow prediction method and system based on clustering and cyclic neural network |
CN116029201B (en) * | 2022-12-23 | 2023-10-27 | 浙江苍南仪表集团股份有限公司 | Gas flow prediction method and system based on clustering and cyclic neural network |
CN116776228A (en) * | 2023-08-17 | 2023-09-19 | 合肥工业大学 | Power grid time sequence data decoupling self-supervision pre-training method and system |
CN116776228B (en) * | 2023-08-17 | 2023-10-20 | 合肥工业大学 | Power grid time sequence data decoupling self-supervision pre-training method and system |
CN117114056A (en) * | 2023-10-25 | 2023-11-24 | 城云科技(中国)有限公司 | Power load prediction model, construction method and device thereof and application |
CN117114056B (en) * | 2023-10-25 | 2024-01-09 | 城云科技(中国)有限公司 | Power load prediction model, construction method and device thereof and application |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114529051A (en) | Long-term power load prediction method based on hierarchical residual self-attention neural network | |
CN113592185B (en) | Power load prediction method based on Transformer | |
CN112364975B (en) | Terminal running state prediction method and system based on graph neural network | |
CN111079989B (en) | DWT-PCA-LSTM-based water supply amount prediction device for water supply company | |
CN115169703A (en) | Short-term power load prediction method based on long-term and short-term memory network combination | |
CN114493014A (en) | Multivariate time series prediction method, multivariate time series prediction system, computer product and storage medium | |
CN115587454A (en) | Traffic flow long-term prediction method and system based on improved Transformer model | |
CN113128113A (en) | Poor information building load prediction method based on deep learning and transfer learning | |
CN114817773A (en) | Time sequence prediction system and method based on multi-stage decomposition and fusion | |
CN114519471A (en) | Electric load prediction method based on time sequence data periodicity | |
CN116702831A (en) | Hybrid short-term wind power prediction method considering massive loss of data | |
CN113360848A (en) | Time sequence data prediction method and device | |
CN117494906B (en) | Natural gas daily load prediction method based on multivariate time series | |
Liao et al. | Scenario prediction for power loads using a pixel convolutional neural network and an optimization strategy | |
CN115713044B (en) | Method and device for analyzing residual life of electromechanical equipment under multi-condition switching | |
WO2024012735A1 (en) | Training of a machine learning model for predictive maintenance tasks | |
CN116911442A (en) | Wind power generation amount prediction method based on improved transducer model | |
Rodriguez et al. | Multi-step forecasting strategies for wind speed time series | |
CN116127325A (en) | Method and system for detecting abnormal flow of graph neural network business based on multi-attribute graph | |
Wang et al. | Grid load forecasting based on dual attention BiGRU and DILATE loss function | |
CN116128082A (en) | Highway traffic flow prediction method and electronic equipment | |
Rathnayaka et al. | Specialist vs generalist: A transformer architecture for global forecasting energy time series | |
CN113780377A (en) | Rainfall level prediction method and system based on Internet of things data online learning | |
Chen et al. | Multi-Objective Spiking Neural Network for Optimal Wind Power Prediction Interval | |
Han et al. | Online aware synapse weighted autoencoder for recovering random missing data in wastewater treatment process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||