CN113379164B

CN113379164B - Load prediction method and system based on deep self-attention network

Info

Publication number: CN113379164B
Application number: CN202110807996.7A
Authority: CN
Inventors: 田江; 苏大威; 赵家庆; 吴海伟; 吕洋; 赵奇; 丁宏恩; 俞瑜; 赵慧
Original assignee: State Grid Jiangsu Electric Power Co Ltd; Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Current assignee: State Grid Jiangsu Electric Power Co Ltd; Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2021-07-16
Filing date: 2021-07-16
Publication date: 2024-03-26
Anticipated expiration: 2041-07-16
Also published as: CN113379164A

Abstract

The power load prediction method and system based on the depth self-attention network takes sample data as input and takes a power load predicted value as output, and comprises a self-attention encoder, a history score calculation unit, a position encoder, a Query sequence unit, a spatial attention unit and a temporal attention unit; and the accurate prediction of the load change trend and the load change size in the power system is realized based on the deep learning and the self-attention model. The system establishes a non-autoregressive self-attention neural network for prediction, overcomes the problems of time lag and accumulated errors in the traditional deep learning model, and simultaneously establishes a attention mechanics learning mechanism among multiple variables, so as to realize the time sequence prediction based on multiple variable aggregation and effectively improve the prediction precision. The method and the system can fully utilize mass data acquired in the running process of the power grid to accurately predict the system load, and provide basis for the subsequent scheduling control of the power grid.

Description

Load prediction method and system based on deep self-attention network

Technical Field

The invention relates to the technical field of load prediction of power systems, in particular to a load prediction method and system based on a deep self-attention network.

Background

With the rapid development of power grid construction, the power system is developed to be intelligent and informationized. Power load prediction is an important one of these, and its consequences will have a great impact on the deployment, planning and operation of the power system. Short-term load prediction has important influence on important decisions such as daily operation, scheduling planning and the like of a power grid. Therefore, in order to ensure economic benefit and social benefit, the ability of accurately predicting the power load is important, so that the safety of a power system can be ensured, and meanwhile, a power supply enterprise can be ensured to economically and efficiently make a power generation plan.

In the prior art, short-term power load prediction mainly carries out power consumption prediction from several hours to one day in the future of a power system, because of randomness and nonlinearity of the power load, the difficulty of short-term load prediction is improved, and meanwhile, the load is subjected to multiple influences of environmental factors such as temperature, illumination, wind speed and the like which change in real time and subjective factors of users, so that the complexity of short-term load prediction is increased, the accuracy of short-term load prediction is reduced, and therefore, accurate and rapid short-term load prediction becomes a challenging task.

At present, a great deal of related studies have been conducted in the field of load prediction, wherein classical prediction methods include time series methods, regression analysis methods, and the like. The method has simple implementation principle and high operation speed, and is suitable for processing the data set with simple structure and small scale. However, as infrastructure is continuously perfected and the degree of informatization of the power grid is improved, the scale of users is continuously enlarged, the power data is rapidly increased, and meanwhile, the classical method is not ideal in large-scale data due to the nonlinear characteristics of part of the power data.

Whereas machine learning methods are of interest because of their strong adaptability and non-linear processing capabilities. The deep neural network can approach any function in theory, and has good effects in the fields of image recognition, natural language processing and the like. The self-attention mechanism can calculate the attention of each data in the sequence data to other data and assign different weights to the data, so that the sequence data processed by the self-attention mechanism becomes the sequence data containing the weight information, and the self-attention mechanism has stronger capability of processing the sequence data. The Chinese patent application (CN 110909919A) discloses a photovoltaic power prediction method of a deep neural network model fused with an attention mechanism, which is characterized in that a deep learning algorithm is utilized to model and predict the photovoltaic electric field power, and the attention mechanism is utilized to carry out weighted summation on the depth characteristics extracted from the neural network model, so that the selection of high-quality characteristic information with heavier weight of a prediction result is realized, the accuracy and the stability of the photovoltaic power prediction model are improved, the interference of useless information on the model is reduced, and the calculation time is shortened. The Chinese patent (CN 110355608B) discloses a cutter abrasion loss prediction method based on a self-attention mechanism and deep learning, which utilizes characteristic information related to cutter abrasion in measured data of a self-attention mechanism and a bidirectional long-short-time memory network combined excavation sensor, extracts the dependency relationship among measured data of three special sensors of cutting force, vibration signals and sound signals at different moments, effectively improves the real-time prediction effect of cutter abrasion loss of a numerical control machine tool, and can be applied to the prediction of cutter abrasion loss of the numerical control machine tool in industrial production. The Chinese patent application (CN 112052977A) discloses a reservoir reserves prediction method based on a deep space-time attention network, and the adverse effect of data fluctuation on a prediction result is relieved by combining a cyclic neural network and an attention mechanism, so that the prediction precision is improved. Chinese patent application (CN 110413844 a) discloses a "dynamic link prediction method based on a spatiotemporal attention depth model", in which attention coefficients at each time are calculated, and the normalized attention coefficients are used as weights in the calculation of prediction results.

Disclosure of Invention

In order to solve the defects existing in the prior art, the invention aims to provide a load prediction method and system based on a deep self-attention network.

The invention adopts the following technical scheme.

A load prediction system based on a depth self-attention network takes sample data as input and takes power load predicted values as output, and comprises an encoder and a decoder.

The encoder is a self-attention encoder, the decoder comprising: a spatial attention unit, a temporal attention unit;

the system further comprises: the system comprises a history score calculating unit, a position encoder and a Query sequence unit;

sample data is input into a self-attention encoder; the history score calculating unit obtains a history score according to the output of the self-attention encoder, and the position score calculating unit obtains a position score according to the output of the position encoder; the sample data, the historical score and the position score are input into a Query sequence unit, and a Query sequence is generated by the Query sequence unit;

the Query sequence is input to the spatial attention unit to obtain a spatial attention sequence, the spatial attention sequence and the output of the self-attention unit are input to the time attention unit together, and the time attention unit outputs the power load predicted value.

Preferably, the self-attention unit and the history score calculation unit are connected by adopting a full connection layer; the position encoder and the position score calculating unit are connected by adopting a full-connection layer; and obtaining the power load predicted value after the output value of the time attention unit passes through the full connection layer.

Preferably, the Query sequence unit generates all the Query sequences required by the spatial attention unit and the temporal attention unit in a single step time, and the system performs parallel prediction.

The load prediction method based on the deep self-attention network comprises the following steps:

step 1, collecting time-related original data and space-related original data of power load prediction, and constructing an input data set;

step 2, processing the input data set by adopting a regularization method to obtain a sample data set scaled to a unit norm;

step 3, constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the time attention unit based on the improved transducer network architecture; generating all Query sequences required by a space attention unit and a time attention unit by using a Query sequence unit according to the sample data set, the history score and the position score; the historical score is the time influence of the time-related original data on the Query sequence, and the position score is the position influence of the space-related original data on the Query sequence;

step 4, obtaining a joint attention sequence by using a spatial attention unit and a time attention unit for an input data set in a power load prediction period;

step 5, establishing a deep self-attention network model, taking Adam as an optimizer, and training the deep self-attention network based on sample data;

and 6, inputting the test data into a network and outputting the power load predicted value.

Preferably, in step 1, the time-related raw data includes load history data, regional illumination data, regional wind speed data, and the space-related raw data includes station position data.

Preferably, in step 2, the input data set isWherein the ith data sample satisfies +.>Processing the input data set by adopting a regularization method comprises the following steps:

step 2.1, calculating the ith data sample according to the following relationL of (2) _P Norms->

Wherein, the value range of p is [0, + ] infinity;

step 2.2, based on L _P Norm numberFor the i-th data sample->Regularization is carried out, and the following relation is satisfied:

in the method, in the process of the invention,for the ith data sample in class D, where d=1, 2, …, D represents the total number of classes.

Preferably, in step 3, in a single step time, the nth object O _n Generating a Query sequence, comprising:

step 3.1, calculating a history score H according to the following relation ⁽ⁿ⁾ ：

In the method, in the process of the invention,for the output of the self-attention encoder, W _H As a learnable parameter of historical score, b _H Network learning rate for historical scores;

step 3.2, calculating the position score P according to the following relation ⁽ⁿ⁾ ：

In the method, in the process of the invention,w, which is the output of the position encoder _p A learnable parameter, b, being a location score _p Network learning rate for location score;

step 3.3, integrating the history score and the location score as a Query sequence Q ⁽ⁿ⁾ Is based on the sample data set as an input sequenceAt the time, query sequence Q ⁽ⁿ⁾ The following relation is satisfied:

wherein P is ⁽ⁿ⁾ For position score, H ⁽ⁿ⁾ Is a historical score.

Preferably, step 4 comprises:

step 4.1, for the nth object O during the power load prediction period _n Dividing an input data set into time-dependent data setsAnd spatially dependent data set->Wherein the time-dependent dataset +.>Comprising power consumption which varies with time in various electric equipment and spatial related data set>The power consumption of the movable electric equipment along with the change of the position is contained;

step 4.2 for time dependent data setsCalculating the time attention weight +.>

In the method, in the process of the invention,

softmax (·) is the activation function, mapping the network output into (0, 1) intervals,

wherein,for feature vectors defined based on input features, defined as +.>And the output of the position encoder->The linear superposition is obtained, and the following relation is satisfied:

and->For the inherent calculation intermediate variable of deep learning attention mechanism, all decoding results are obtained by linear transformation combination, and the same self-attention mechanism is based on +.>And->The values are the same, and the following relational expression is satisfied:

W _tQ characterization of first time network learning parameters for an attention networkAnd->Is used for the degree of similarity of (c) to (c),

W _tK characterization for second time network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),

d _model dimension for the input variable;

step 4.3 for spatially correlated data setsCalculating the spatial attention weight +.>

In the method, in the process of the invention,

for a feature vector defined based on input features, it is defined as a spatially dependent dataset +.>And>and->The same value is taken, and the following relation is satisfied:

W _sQ characterization for first spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),

W _sK characterization for second spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),

d _model dimension for the input variable;

step 4.4, utilizing the time attention weightAnd spatial attention weight->The joint attention weight is calculated by the following relation>

Step 4.5, orderObtaining a power grid load prediction output value according to the following relation:

the resulting predicted output value is a fixed length joint attention sequence determined from the prediction period.

Preferably, step 5 comprises:

and 5.1, smoothing the L1 loss function, wherein the following relation is satisfied:

in the formula g ^(t，i) To train the corresponding true value of the sample, z ^(t，i) The predicted value is output for the depth network; the superscript t represents a time step, and i represents a predicted data value number at a corresponding time;for the true value corresponding to the training sample under the bounding box coordinate set, +.>A predicted value output by the depth network under the boundary frame coordinate set; l1 _s (. Cndot.) is to smooth the L1 loss function;

step 5.2, constructing the objective function L by the following relation _o ：

Step 5.3, using the objective function L _o With Adam as an optimizer, the deep self-attention network is trained based on sample data.

Preferably, step 6 further includes calculating a root mean square error for the output power load prediction value, and using the root mean square error as an evaluation value of the power load prediction accuracy.

Compared with the prior art, the method has the advantages that the time lag and accumulated error problems are overcome, meanwhile, the system constructs a mechanism of attention and mechanics among multiple variables, the time sequence prediction based on multiple variables is realized, and the prediction precision is effectively improved.

The beneficial effects of the invention also include:

1. the improved transducer network is used in a load prediction system based on a deep self-attention network to realize the prediction of a multidimensional time sequence by introducing a Query generation unit; only one network model is used, so that the structure of the prediction system is simplified, and the calculation speed is improved;

2. the Query generation unit is inserted between the self-attention unit and the time-attention unit and between the time-attention unit and the space-attention unit, all the required Query sequences can be generated in a single step time, so that the system can realize parallel prediction and has higher prediction efficiency;

3. meanwhile, the time correlation attention and the space correlation attention are calculated, so that load data used for load prediction not only have time characteristics but also have geographic characteristics, and a prediction result is more accurate and reliable;

4. the attention sequence with the fixed length is calculated according to the power load prediction demand, so that the traditional method for calculating the attention data at a certain moment is broken through, and the application range of the prediction result is wider.

Drawings

FIG. 1 is a schematic diagram of a load prediction system based on a deep self-attention network according to the present invention;

wherein reference numerals are as follows:

1-self-attention encoder

2-a history score calculating unit;

a 3-position score calculation unit;

4-position encoder;

5-Query sequence unit

6-a spatial attention unit;

7-a time attention unit;

8-Query sequence;

FC-full connection layer units;

FIG. 2 is a flow chart of a deep self-attention network based load prediction method of the present invention;

FIG. 3 is a diagram of a load prediction process based on a deep self-attention network in accordance with an embodiment of the present invention;

FIG. 4 is a graph of winter load prediction interval waveforms based on deep self-attention network according to an embodiment of the present invention;

fig. 5 is a waveform diagram of a summer load prediction interval based on a deep self-attention network according to an embodiment of the present invention.

Detailed Description

The present application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical solutions of the present invention and are not intended to limit the scope of protection of the present application.

As shown in fig. 1, a load prediction system based on a deep self-attention network, which takes sample data as input and electric load predicted value as output, includes: a self-attention encoder 1, a history score calculating unit 2, a position score calculating unit 3, a position encoder 4, a Query sequence unit 5, a spatial attention unit 6, and a temporal attention unit 7.

Sample data is input into the self-attention encoder 1; the history score calculating unit 2 obtains a history score from the output of the self-attention encoder 1, and the position score calculating unit 3 obtains a position score from the output of the position encoder 4; the sample data, the history score, and the position score are input to a Query sequence unit 5, from which a Query sequence is generated.

The Query sequence is input to the spatial attention unit 6 to obtain a spatial attention sequence, the spatial attention sequence is input to the temporal attention unit 7 together with the output from the attention unit 1, and the temporal attention unit 7 outputs the power load predicted value.

The self-attention unit 1 and the history score calculating unit 2 are connected by adopting a full connection layer FC; the position encoder 4 and the position score calculating unit 3 are connected by adopting a full connection layer FC; the output value of the time attention unit 7 passes through the full connection layer FC to obtain the power load predicted value.

The Query sequence unit 5 generates all the Query sequences 8 required by the spatial attention unit 6 and the temporal attention unit 7 in a single step time, enabling parallel prediction by the system.

As shown in fig. 2, the load prediction method based on the deep self-attention network includes:

and step 1, collecting time-related original data and space-related original data of power load prediction, and constructing an input data set.

And 2, processing the input data set by adopting a regularization method to obtain a sample data set scaled to a unit norm.

Wherein, the value range of p is [0, + ] infinity;

Step 3, constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the time attention unit based on the improved transducer network architecture; generating all Query sequences required by a space attention unit and a time attention unit by using a Query sequence unit according to the sample data set, the history score and the position score; the historical score is the time influence of the time-related original data on the Query sequence, and the position score is the position influence of the space-related original data on the Query sequence.

wherein P is ⁽ⁿ⁾ For position score, H ⁽ⁿ⁾ Is a historical score.

Step 4, obtaining a joint attention sequence by using the spatial attention unit and the temporal attention unit for the input data set in the power load prediction period.

Preferably, step 4 comprises:

step 4.1, for the nth object O during the power load prediction period _n Dividing an input data set into time-dependent data setsAnd spatially dependent data set->Wherein the time-dependent dataset +.>Comprising power consumption which varies with time in various electric equipment and spatial related data set>Comprising the power consumption of the movable electric equipment along with the change of the position.

Step 4.2 for time dependent data setsCalculating the time attention weight +.>

In the method, in the process of the invention,

and->Computing intermediate variables inherent to deep learning attention mechanisms, allThe decoding result is obtained by linear transformation combination and is based on the same self-attention mechanism>And->The values are the same, and the following relational expression is satisfied:

d _model is the dimension of the input variable.

In the method, in the process of the invention,

d _model is the dimension of the input variable.

And 5, establishing a deep self-attention network model, and training the deep self-attention network based on sample data by taking Adam as an optimizer.

Preferably, step 5 comprises:

in the formula g ^(t，i) To train the corresponding true value of the sample, z ^(t，i) The predicted value is output for the depth network; the superscript t represents a time step, and i represents a predicted data value number at a corresponding time;for the true value corresponding to the training sample under the bounding box coordinate set, +.>A predicted value output by the depth network under the boundary frame coordinate set; l1 _s (. Cndot.) is to smooth the L1 penalty function.

Example 1.

The pytorch programming is used to implement the operation of the deep self-attention network based load prediction system. The training process and the change in the loss function are shown in fig. 3 using the winter and summer week load data of a region in north america in 1992 as training and verification data.

In fig. 3, the values of the predicted root mean square error and the loss function gradually decrease with the training process and finally stabilize, and it is seen that the load prediction system based on the deep self-attention network can quickly converge after training. The root mean square error over the data set for the different network structures is detailed in table 1.

Table 1 root mean square error over data set for different prediction methods

As can be seen from table 1, the root mean square error value of the self-attention network proposed by the present invention is smaller than that of other network structures, so that the power load can be predicted more accurately.

The prediction results of the deep self-attention network based load prediction system are shown in fig. 4 and 5. In fig. 4 and 5, the prediction section result output by the load prediction system based on the deep self-attention network includes an actual value, and the power load can be effectively predicted.

The beneficial effects of the invention also include:

While the applicant has described and illustrated the embodiments of the present invention in detail with reference to the drawings, it should be understood by those skilled in the art that the above embodiments are only preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not to limit the scope of the present invention, but any improvements or modifications based on the spirit of the present invention should fall within the scope of the present invention.

Claims

1. A load prediction system based on a deep self-attention network, the system taking sample data as input and power load predicted value as output, comprising an encoder and a decoder, characterized in that,

the sample data is input into a self-attention encoder, and in a single step time, the sample data is input into the self-attention encoder for the nth object O _h Generating a Query sequence;

the history score calculating unit obtains a history score from the output of the self-attention encoder and calculates a history score H according to the following relation ⁽ⁿ⁾ ：

the position score calculating unit obtains the position score according to the output of the position encoder and calculates the position score P according to the following relation ⁽ⁿ⁾ ：

the sample data, the historical score and the position score are input into a Query sequence unit, and a Query sequence is generated by the Query sequence unit; wherein the history score and the position score are used as a Query sequence Q ⁽ⁿ⁾ Is based on the sample data set as an input sequenceAt the time, query sequence Q ⁽ⁿ⁾ The following relation is satisfied:

wherein P is ⁽ⁿ⁾ For position score, H ⁽ⁿ⁾ Is a historical score;

the Query sequence is input into a spatial attention unit to obtain a spatial attention sequence, the spatial attention sequence and the output of a self-attention unit are input into a time attention unit together, and the time attention unit outputs an electric load predicted value;

the system collects time-related original data and space-related original data of power load prediction, and an input data set is constructed, wherein the time-related original data comprises load historical data, regional illumination data and regional wind speed data, and the space-related original data comprises station position data;

processing the input data set by adopting a regularization method to obtain a sample data set scaled to a unit norm; the input data set isWherein the ith data sample satisfiesProcessing the input data set by adopting a regularization method comprises the following steps: calculating the i-th data sample +.>L of (2) _P Norms->

Wherein, the value range of p is [0t+ ] infinity;

based on L _P Norm numberFor the i-th data sample->Regularization is carried out, and the following relation is satisfied:

in the method, in the process of the invention,for the ith data sample in the D-th class, where d=1t2t … tD, D represents the total number of classes;

constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the temporal attention unit based on the improved transducer network architecture; for the nth object O in a single step time _n Generating a Query sequence, comprising:

the history score H is calculated in the following relation ⁽ⁿ⁾ ：

calculating a position score P in the following relation ⁽ⁿ⁾ ：

integrating the historical score and the location score as a Query sequence Q ⁽ⁿ⁾ Is based on the sample data set as an input sequenceAt the time, query sequence Q ⁽ⁿ⁾ The following relation is satisfied:

wherein P is ⁽ⁿ⁾ For position score, H ⁽ⁿ⁾ Is a historical score;

generating all Query sequences required by a space attention unit and a time attention unit by using a Query sequence unit according to the sample data set, the history score and the position score; the historical score is the time influence of the time-related original data on the Query sequence, and the position score is the position influence of the space-related original data on the Query sequence; for the nth object O during the power load prediction period _n Dividing an input data set into time-dependent data setsAnd spatially dependent data set->Wherein the time-dependent dataset +.>Comprising the power consumption which changes with time in various electric equipment, wherein the space related data set is +.>The power consumption of the movable electric equipment along with the change of the position is contained; for time-dependent data sets->Calculating the time attention weight +.>

Where softmax (·) is the activation function, mapping the network output into the (0 t 1) interval,

W _tQ characterization of first time network learning parameters for an attention networkAnd->Is used for the degree of similarity of (c) to (c),W _tK for the second time network learning parameters of the attention network, characterize +.>And->Similarity of->d _model Dimension for the input variable;

for spatially correlated data setsCalculating the spatial attention weight +.>

for a feature vector defined based on input features, it is defined as a spatially dependent dataset +.>And (2) linear mapping ofAnd->The same value is taken, and the following relation is satisfied:

W _sQ characterization for first spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),W _sK for the second spatial network learning parameter of the attention network, characterize +.>And->Similarity of->d _model Dimension for the input variable;

using temporal attention weightsAnd spatial attention weight->The joint attention weight is calculated by the following relation>

Order theObtaining a power grid load prediction output value according to the following relation:

the obtained predicted output value is a fixed-length joint attention sequence determined according to the prediction period;

obtaining a joint attention sequence for the input data set using the spatial attention unit and the temporal attention unit during the power load prediction period; establishing a deep self-attention network model, and training the deep self-attention network based on sample data by taking Adam as an optimizer; inputting the test data into a network and outputting a power load predicted value; and smoothing the L1 loss function to satisfy the following relation:

in the formula g ^(t，i) For training samplesCorresponding true value, z ^(t，i) The predicted value is output for the depth network; the superscript t represents a time step, and i represents a predicted data value number at a corresponding time;for the true value corresponding to the training sample under the bounding box coordinate set, +.>A predicted value output by the depth network under the boundary frame coordinate set; l1 _s (. Cndot.) is to smooth the L1 loss function;

constructing the objective function L in the following relation _o ：

Using an objective function L _o With Adam as an optimizer, the deep self-attention network is trained based on sample data.

2. The deep self-attention network based load prediction system of claim 1 wherein,

the self-attention unit and the history score calculation unit are connected by adopting a full-connection layer;

the position encoder and the position score calculating unit are connected by adopting a full-connection layer;

and the output value of the time attention unit is subjected to a full connection layer to obtain a power load predicted value.

3. The deep self-attention network based load prediction system of claim 1 wherein,

the Query sequence unit generates all the Query sequences required by the spatial attention unit and the temporal attention unit in a single step time, and the system performs parallel prediction.

4. The deep self-attention network based load prediction method of claim 1, wherein,

the method further includes calculating a root mean square error for the output power load prediction value, and using the root mean square error as an evaluation value of the power load prediction accuracy.