CN113379164B - Load prediction method and system based on deep self-attention network

Load prediction method and system based on deep self-attention network

Info

Publication number
CN113379164B
CN113379164B (application CN202110807996.7A)
Authority
CN
China
Prior art keywords
attention
unit
score
network
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110807996.7A
Other languages
Chinese (zh)
Other versions
CN113379164A (en)
Inventor
田江
苏大威
赵家庆
吴海伟
吕洋
赵奇
丁宏恩
俞瑜
赵慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd, Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co Ltd
Priority to CN202110807996.7A priority Critical patent/CN113379164B/en
Publication of CN113379164A publication Critical patent/CN113379164A/en
Application granted granted Critical
Publication of CN113379164B publication Critical patent/CN113379164B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 - Energy or water supply
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 - INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S - SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 - Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 - Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A power load prediction method and system based on a deep self-attention network takes sample data as input and a power load prediction value as output, and comprises a self-attention encoder, a history score calculation unit, a position encoder, a Query sequence unit, a spatial attention unit and a temporal attention unit; accurate prediction of both the trend and the magnitude of load changes in the power system is achieved based on deep learning and the self-attention model. The system establishes a non-autoregressive self-attention neural network for prediction, overcoming the time-lag and accumulated-error problems of traditional deep learning models, and at the same time establishes an attention learning mechanism among multiple variables, realizing time-series prediction based on multi-variable aggregation and effectively improving prediction accuracy. The method and system can make full use of the massive data acquired during power grid operation to predict the system load accurately, providing a basis for subsequent power grid dispatching control.

Description

Load prediction method and system based on deep self-attention network
Technical Field
The invention relates to the technical field of power system load prediction, and in particular to a load prediction method and system based on a deep self-attention network.
Background
With the rapid development of power grid construction, power systems are becoming increasingly intelligent and information-driven. Power load prediction is an important part of this development, and its results have a great impact on the dispatching, planning and operation of the power system. Short-term load prediction strongly influences important decisions such as the daily operation and scheduling plans of the power grid. Therefore, to secure both economic and social benefits, the ability to predict the power load accurately is essential: it safeguards the security of the power system while enabling power supply enterprises to draw up generation plans economically and efficiently.
In the prior art, short-term power load prediction mainly forecasts the electricity consumption of a power system from several hours to one day ahead. The randomness and nonlinearity of the power load make short-term load prediction difficult; at the same time, the load is subject to environmental factors that change in real time, such as temperature, illumination and wind speed, as well as to the subjective behavior of users, which increases the complexity and reduces the accuracy of short-term load prediction. Accurate and fast short-term load prediction is therefore a challenging task.
At present, a great deal of related research has been conducted in the field of load prediction. Classical prediction methods include time series methods, regression analysis methods and the like. These methods are simple in principle and fast to run, and are suitable for data sets with a simple structure and a small scale. However, as infrastructure is continuously improved and the informatization of the power grid advances, the user base keeps growing and power data increase rapidly; at the same time, because part of the power data is nonlinear, the classical methods perform poorly on large-scale data.
Machine learning methods, by contrast, have attracted attention for their strong adaptability and nonlinear processing capability. A deep neural network can in theory approximate any function and has performed well in fields such as image recognition and natural language processing. A self-attention mechanism can compute the attention of each element of a sequence to every other element and assign them different weights, so that the processed sequence carries weight information; this gives self-attention a strong ability to handle sequence data. The Chinese patent application (CN110909919A) discloses a photovoltaic power prediction method using a deep neural network model fused with an attention mechanism: a deep learning algorithm models and predicts photovoltaic plant power, and the attention mechanism performs a weighted summation of the deep features extracted by the neural network model, so that high-quality feature information carrying more weight in the prediction result is selected, improving the accuracy and stability of the photovoltaic power prediction model, reducing the interference of useless information and shortening computation time. The Chinese patent (CN110355608B) discloses a tool wear prediction method based on a self-attention mechanism and deep learning, which combines a self-attention mechanism with a bidirectional long short-term memory network to mine wear-related feature information from sensor measurements, extracting the dependencies among the cutting force, vibration and sound signals of three dedicated sensors at different times; it effectively improves real-time tool wear prediction for CNC machine tools and can be applied to tool wear prediction in industrial production. The Chinese patent application (CN112052977A) discloses a reservoir reserves prediction method based on a deep spatio-temporal attention network, which combines a recurrent neural network with an attention mechanism to mitigate the adverse effect of data fluctuation on the prediction result and thus improve prediction accuracy. The Chinese patent application (CN110413844A) discloses a dynamic link prediction method based on a spatio-temporal attention depth model, in which attention coefficients at each time are computed and the normalized coefficients are used as weights when computing the prediction result.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention aims to provide a load prediction method and system based on a deep self-attention network.
The invention adopts the following technical scheme.
A load prediction system based on a deep self-attention network takes sample data as input and a power load prediction value as output, and comprises an encoder and a decoder.
The encoder is a self-attention encoder, and the decoder comprises a spatial attention unit and a temporal attention unit;
the system further comprises: a history score calculating unit, a position score calculating unit, a position encoder and a Query sequence unit;
sample data are input into the self-attention encoder; the history score calculating unit obtains a history score from the output of the self-attention encoder, and the position score calculating unit obtains a position score from the output of the position encoder; the sample data, the history score and the position score are input into the Query sequence unit, which generates a Query sequence;
the Query sequence is input into the spatial attention unit to obtain a spatial attention sequence; the spatial attention sequence, together with the output of the self-attention encoder, is input into the temporal attention unit, and the temporal attention unit outputs the power load prediction value.
Preferably, the self-attention encoder and the history score calculation unit are connected by a fully connected layer; the position encoder and the position score calculation unit are connected by a fully connected layer; and the output value of the temporal attention unit passes through a fully connected layer to obtain the power load prediction value.
Preferably, the Query sequence unit generates, within a single time step, all the Query sequences required by the spatial attention unit and the temporal attention unit, and the system performs parallel prediction, as sketched below.
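For illustration only, the data flow described above can be sketched in PyTorch (the framework used in Example 1 below). All module choices here are assumptions of the sketch, not the patented implementation: nn.TransformerEncoder stands in for the self-attention encoder, the position encoding is passed in as a tensor, and d_model, the head count and all sizes are arbitrary.

```python
import torch
import torch.nn as nn

class DeepSelfAttentionForecaster(nn.Module):
    """Sketch of the wiring: encoder -> history/position scores -> Query
    -> spatial attention -> temporal attention -> fully connected output."""
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.self_attn_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.history_score = nn.Linear(d_model, d_model)    # FC after the encoder
        self.position_score = nn.Linear(d_model, d_model)   # FC after the position encoder
        self.spatial_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, 1)                    # FC producing the load value

    def forward(self, x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        enc = self.self_attn_encoder(x)            # self-attention encoder output
        h = self.history_score(enc)                # history score
        p = self.position_score(pos)               # position score
        query = x + h + p                          # Query sequence, built in one step
        s, _ = self.spatial_attn(query, x, x)      # spatial attention sequence
        t, _ = self.temporal_attn(s, enc, enc)     # temporal attention over encoder output
        return self.out(t)                         # predicted load per time step

model = DeepSelfAttentionForecaster()
x, pos = torch.randn(8, 96, 64), torch.randn(8, 96, 64)
print(model(x, pos).shape)  # torch.Size([8, 96, 1])
```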
The load prediction method based on the deep self-attention network comprises the following steps:
step 1, collecting time-related raw data and space-related raw data for power load prediction, and constructing an input data set;
step 2, processing the input data set with a regularization method to obtain a sample data set scaled to unit norm;
step 3, constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the temporal attention unit based on an improved Transformer network architecture; generating, with the Query sequence unit, all Query sequences required by the spatial attention unit and the temporal attention unit from the sample data set, the history score and the position score; the history score is the temporal influence of the time-related raw data on the Query sequence, and the position score is the positional influence of the space-related raw data on the Query sequence;
step 4, obtaining a joint attention sequence for the input data set within the power load prediction period using the spatial attention unit and the temporal attention unit;
step 5, establishing a deep self-attention network model and training the deep self-attention network on the sample data with Adam as the optimizer;
step 6, inputting the test data into the network and outputting the power load prediction value.
Preferably, in step 1, the time-related raw data include load history data, regional illumination data and regional wind speed data, and the space-related raw data include station position data.
Preferably, in step 2, the input data set is $X = \{x_1, x_2, \dots, x_N\}$, where the $i$-th data sample satisfies $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(D)})$. Processing the input data set with the regularization method comprises:

Step 2.1, calculating the $L_p$ norm of the $i$-th data sample $x_i$ according to the following relation:

$\|x_i\|_p = \left( \sum_{d=1}^{D} \left|x_i^{(d)}\right|^p \right)^{1/p}$

where the value of $p$ ranges over $[0, +\infty)$;

Step 2.2, regularizing the $i$-th data sample $x_i$ based on the $L_p$ norm $\|x_i\|_p$, satisfying the following relation:

$\hat{x}_i^{(d)} = \dfrac{x_i^{(d)}}{\|x_i\|_p}$

where $x_i^{(d)}$ is the $d$-th class component of the $i$-th data sample, $d = 1, 2, \dots, D$, and $D$ is the total number of classes.
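A minimal sketch of this preprocessing step, assuming $p = 2$ and row-wise samples (both assumptions of the sketch, not fixed by the text):

```python
import torch

def lp_regularize(X: torch.Tensor, p: float = 2.0, eps: float = 1e-12) -> torch.Tensor:
    """Scale each row (data sample) of X to unit L_p norm."""
    norms = X.abs().pow(p).sum(dim=1).pow(1.0 / p)   # ||x_i||_p per sample
    return X / norms.clamp_min(eps).unsqueeze(1)     # x_i^(d) / ||x_i||_p

X = torch.tensor([[3.0, 4.0], [1.0, 1.0]])
print(lp_regularize(X).norm(dim=1))  # tensor([1., 1.]) -- unit norm per sample
```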
Preferably, in step 3, within a single time step, a Query sequence is generated for the $n$-th object $O_n$, comprising:

Step 3.1, calculating the history score $H^{(n)}$ according to the following relation:

$H^{(n)} = W_H \cdot E_{SA}^{(n)} + b_H$

where $E_{SA}^{(n)}$ is the output of the self-attention encoder, $W_H$ is the learnable parameter of the history score, and $b_H$ is the network learning rate of the history score;

Step 3.2, calculating the position score $P^{(n)}$ according to the following relation:

$P^{(n)} = W_P \cdot E_{PE}^{(n)} + b_P$

where $E_{PE}^{(n)}$ is the output of the position encoder, $W_P$ is the learnable parameter of the position score, and $b_P$ is the network learning rate of the position score;

Step 3.3, integrating the history score and the position score as the initial value of the Query sequence $Q^{(n)}$: with the sample data set as the input sequence $X^{(n)}$, the Query sequence $Q^{(n)}$ satisfies the following relation:

$Q^{(n)} = X^{(n)} + H^{(n)} + P^{(n)}$

where $P^{(n)}$ is the position score and $H^{(n)}$ is the history score.
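A minimal sketch of the single-step Query generation, assuming the scores are affine maps (nn.Linear, with b_H and b_P as the layer biases) and that integrating the scores means addition; both readings are assumptions of this sketch:

```python
import torch
import torch.nn as nn

d_model = 64
score_H = nn.Linear(d_model, d_model)  # history-score parameters (W_H, b_H)
score_P = nn.Linear(d_model, d_model)  # position-score parameters (W_P, b_P)

def make_query(x: torch.Tensor, enc_out: torch.Tensor, pos_out: torch.Tensor) -> torch.Tensor:
    H = score_H(enc_out)   # history score from the self-attention encoder output
    P = score_P(pos_out)   # position score from the position encoder output
    return x + H + P       # Query sequence, generated in a single step

x = torch.randn(8, 96, d_model)
Q = make_query(x, torch.randn_like(x), torch.randn_like(x))
print(Q.shape)  # torch.Size([8, 96, 64])
```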
Preferably, step 4 comprises:

Step 4.1, for the $n$-th object $O_n$ within the power load prediction period, dividing the input data set into a time-related data set $X_t^{(n)}$ and a space-related data set $X_s^{(n)}$, where the time-related data set $X_t^{(n)}$ contains the power consumption of each type of electric equipment as it varies with time, and the space-related data set $X_s^{(n)}$ contains the power consumption of movable electric equipment as it varies with position;

Step 4.2, for the time-related data set $X_t^{(n)}$, calculating the temporal attention weight $\alpha_t^{(n)}$:

$\alpha_t^{(n)} = \mathrm{softmax}\!\left( \dfrac{Q_t^{(n)} \left(K_t^{(n)}\right)^{\top}}{\sqrt{d_{model}}} \right)$

where softmax(·) is the activation function, mapping the network output into the (0, 1) interval;

$F_t^{(n)}$ is the feature vector defined on the input features, obtained as the linear superposition of the time-related data set $X_t^{(n)}$ and the output of the position encoder $E_{PE}^{(n)}$, satisfying $F_t^{(n)} = X_t^{(n)} + E_{PE}^{(n)}$;

$Q_t^{(n)}$ and $K_t^{(n)}$ are the intermediate variables inherent to the computation of the deep learning attention mechanism; all decoding results are obtained by combining them through linear transformations, and since they are based on the same self-attention mechanism they take the same input value, satisfying $Q_t^{(n)} = W_{tQ} F_t^{(n)}$ and $K_t^{(n)} = W_{tK} F_t^{(n)}$;

$W_{tQ}$ is the first temporal network learning parameter of the attention network, characterizing the similarity between $Q_t^{(n)}$ and $F_t^{(n)}$;

$W_{tK}$ is the second temporal network learning parameter of the attention network, characterizing the similarity between $K_t^{(n)}$ and $F_t^{(n)}$;

$d_{model}$ is the dimension of the input variable;

Step 4.3, for the space-related data set $X_s^{(n)}$, calculating the spatial attention weight $\alpha_s^{(n)}$:

$\alpha_s^{(n)} = \mathrm{softmax}\!\left( \dfrac{Q_s^{(n)} \left(K_s^{(n)}\right)^{\top}}{\sqrt{d_{model}}} \right)$

where softmax(·) is the activation function, mapping the network output into the (0, 1) interval;

$F_s^{(n)}$ is the feature vector defined on the input features, defined as the linear mapping of the space-related data set $X_s^{(n)}$; $Q_s^{(n)}$ and $K_s^{(n)}$ take the same input value, satisfying $Q_s^{(n)} = W_{sQ} F_s^{(n)}$ and $K_s^{(n)} = W_{sK} F_s^{(n)}$;

$W_{sQ}$ is the first spatial network learning parameter of the attention network, characterizing the similarity between $Q_s^{(n)}$ and $F_s^{(n)}$;

$W_{sK}$ is the second spatial network learning parameter of the attention network, characterizing the similarity between $K_s^{(n)}$ and $F_s^{(n)}$;

$d_{model}$ is the dimension of the input variable;

Step 4.4, using the temporal attention weight $\alpha_t^{(n)}$ and the spatial attention weight $\alpha_s^{(n)}$, calculating the joint attention weight $\alpha^{(n)}$;

Step 4.5, letting $V^{(n)}$ denote the value sequence, obtaining the grid load prediction output value from the joint attention weight and $V^{(n)}$;

the resulting prediction output value is a fixed-length joint attention sequence determined by the prediction period.
Preferably, step 5 comprises:

Step 5.1, smoothing the L1 loss function, satisfying the following relation:

$\mathrm{L1}_s\!\left(g^{(t,i)}, z^{(t,i)}\right) = \begin{cases} 0.5\,\left(g^{(t,i)} - z^{(t,i)}\right)^2, & \left|g^{(t,i)} - z^{(t,i)}\right| < 1 \\ \left|g^{(t,i)} - z^{(t,i)}\right| - 0.5, & \text{otherwise} \end{cases}$

where $g^{(t,i)}$ is the true value corresponding to a training sample and $z^{(t,i)}$ is the prediction value output by the deep network; the superscript $t$ denotes the time step and $i$ the index of the predicted data value at that time; $\hat g^{(t,i)}$ is the true value corresponding to the training sample under the bounding-box coordinate set and $\hat z^{(t,i)}$ is the prediction value output by the deep network under the bounding-box coordinate set; $\mathrm{L1}_s(\cdot)$ is the smoothed L1 loss function;

Step 5.2, constructing the objective function $L_o$ by accumulating the smoothed L1 loss over all time steps $t$ and prediction indices $i$, for both the sample values and the bounding-box coordinate set;

Step 5.3, using the objective function $L_o$ and Adam as the optimizer, training the deep self-attention network on the sample data.
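A minimal sketch of step 5 with the named ingredients, smooth L1 loss and the Adam optimizer; the stand-in linear model and the random data are placeholders of the sketch:

```python
import torch

model = torch.nn.Linear(64, 1)                    # stand-in for the full network
criterion = torch.nn.SmoothL1Loss()               # smoothed L1 loss L1_s
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(256, 64)                          # regularized sample data
g = torch.randn(256, 1)                           # true values g^(t,i)

for epoch in range(50):
    z = model(x)                                  # network outputs z^(t,i)
    loss = criterion(z, g)                        # objective built from L1_s
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```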
Preferably, step 6 further comprises calculating the root mean square error of the output power load prediction values and using it as the evaluation value of the power load prediction accuracy.
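The evaluation metric of step 6 is the standard root mean square error, which can be sketched directly:

```python
import torch

def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Root mean square error between predicted and actual load."""
    return torch.sqrt(torch.mean((pred - target) ** 2))

print(rmse(torch.tensor([1.0, 2.0]), torch.tensor([1.5, 2.5])))  # tensor(0.5000)
```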
Compared with the prior art, the invention overcomes the time-lag and accumulated-error problems; at the same time, the system builds an attention learning mechanism among multiple variables, realizing multi-variable time-series prediction and effectively improving prediction accuracy.
The beneficial effects of the invention also include:
1. An improved Transformer network is used in the load prediction system based on the deep self-attention network, and prediction of multidimensional time series is realized by introducing the Query generation unit; only one network model is used, which simplifies the structure of the prediction system and increases the computation speed;
2. The Query generation unit is inserted between the self-attention unit and the temporal attention unit and between the temporal attention unit and the spatial attention unit; all required Query sequences can be generated within a single time step, so the system achieves parallel prediction with higher prediction efficiency;
3. The time-related attention and the space-related attention are calculated simultaneously, so that the load data used for load prediction carry both temporal and geographic characteristics, making the prediction result more accurate and reliable;
4. A fixed-length attention sequence is calculated according to the power load prediction demand, which breaks through the traditional approach of calculating attention data at a single moment and widens the application range of the prediction result.
Drawings
FIG. 1 is a schematic diagram of a load prediction system based on a deep self-attention network according to the present invention;
wherein reference numerals are as follows:
1 - self-attention encoder;
2 - history score calculating unit;
3 - position score calculating unit;
4 - position encoder;
5 - Query sequence unit;
6 - spatial attention unit;
7 - temporal attention unit;
8 - Query sequence;
FC - fully connected layer unit;
FIG. 2 is a flow chart of a deep self-attention network based load prediction method of the present invention;
FIG. 3 is a diagram of a load prediction process based on a deep self-attention network in accordance with an embodiment of the present invention;
FIG. 4 is a graph of winter load prediction interval waveforms based on deep self-attention network according to an embodiment of the present invention;
fig. 5 is a waveform diagram of a summer load prediction interval based on a deep self-attention network according to an embodiment of the present invention.
Detailed Description
The application is further described below with reference to the accompanying drawings. The following embodiments are only intended to illustrate the technical solutions of the invention more clearly and do not limit the scope of protection of the application.
As shown in fig. 1, a load prediction system based on a deep self-attention network, which takes sample data as input and electric load predicted value as output, includes: a self-attention encoder 1, a history score calculating unit 2, a position score calculating unit 3, a position encoder 4, a Query sequence unit 5, a spatial attention unit 6, and a temporal attention unit 7.
Sample data is input into the self-attention encoder 1; the history score calculating unit 2 obtains a history score from the output of the self-attention encoder 1, and the position score calculating unit 3 obtains a position score from the output of the position encoder 4; the sample data, the history score, and the position score are input to a Query sequence unit 5, from which a Query sequence is generated.
The Query sequence is input into the spatial attention unit 6 to obtain a spatial attention sequence; the spatial attention sequence, together with the output of the self-attention encoder 1, is input into the temporal attention unit 7, and the temporal attention unit 7 outputs the power load prediction value.
The self-attention encoder 1 and the history score calculating unit 2 are connected by a fully connected layer FC; the position encoder 4 and the position score calculating unit 3 are connected by a fully connected layer FC; the output value of the temporal attention unit 7 passes through a fully connected layer FC to obtain the power load prediction value.
The Query sequence unit 5 generates, within a single time step, all the Query sequences 8 required by the spatial attention unit 6 and the temporal attention unit 7, enabling parallel prediction by the system.
As shown in fig. 2, the load prediction method based on the deep self-attention network includes:
and step 1, collecting time-related original data and space-related original data of power load prediction, and constructing an input data set.
Preferably, in step 1, the time-related raw data includes load history data, regional illumination data, regional wind speed data, and the space-related raw data includes station position data.
Step 2, processing the input data set with a regularization method to obtain a sample data set scaled to unit norm.
Preferably, in step 2, the input data set is $X = \{x_1, x_2, \dots, x_N\}$, where the $i$-th data sample satisfies $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(D)})$. Processing the input data set with the regularization method comprises:

Step 2.1, calculating the $L_p$ norm of the $i$-th data sample $x_i$ according to the following relation:

$\|x_i\|_p = \left( \sum_{d=1}^{D} \left|x_i^{(d)}\right|^p \right)^{1/p}$

where the value of $p$ ranges over $[0, +\infty)$;

Step 2.2, regularizing the $i$-th data sample $x_i$ based on the $L_p$ norm $\|x_i\|_p$, satisfying the following relation:

$\hat{x}_i^{(d)} = \dfrac{x_i^{(d)}}{\|x_i\|_p}$

where $x_i^{(d)}$ is the $d$-th class component of the $i$-th data sample, $d = 1, 2, \dots, D$, and $D$ is the total number of classes.
Step 3, constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the temporal attention unit based on an improved Transformer network architecture; generating, with the Query sequence unit, all Query sequences required by the spatial attention unit and the temporal attention unit from the sample data set, the history score and the position score; the history score is the temporal influence of the time-related raw data on the Query sequence, and the position score is the positional influence of the space-related raw data on the Query sequence.
Preferably, in step 3, within a single time step, a Query sequence is generated for the $n$-th object $O_n$, comprising:

Step 3.1, calculating the history score $H^{(n)}$ according to the following relation:

$H^{(n)} = W_H \cdot E_{SA}^{(n)} + b_H$

where $E_{SA}^{(n)}$ is the output of the self-attention encoder, $W_H$ is the learnable parameter of the history score, and $b_H$ is the network learning rate of the history score;

Step 3.2, calculating the position score $P^{(n)}$ according to the following relation:

$P^{(n)} = W_P \cdot E_{PE}^{(n)} + b_P$

where $E_{PE}^{(n)}$ is the output of the position encoder, $W_P$ is the learnable parameter of the position score, and $b_P$ is the network learning rate of the position score;

Step 3.3, integrating the history score and the position score as the initial value of the Query sequence $Q^{(n)}$: with the sample data set as the input sequence $X^{(n)}$, the Query sequence $Q^{(n)}$ satisfies the following relation:

$Q^{(n)} = X^{(n)} + H^{(n)} + P^{(n)}$

where $P^{(n)}$ is the position score and $H^{(n)}$ is the history score.
Step 4, obtaining a joint attention sequence for the input data set within the power load prediction period using the spatial attention unit and the temporal attention unit.
Preferably, step 4 comprises:

Step 4.1, for the $n$-th object $O_n$ within the power load prediction period, dividing the input data set into a time-related data set $X_t^{(n)}$ and a space-related data set $X_s^{(n)}$, where the time-related data set $X_t^{(n)}$ contains the power consumption of each type of electric equipment as it varies with time, and the space-related data set $X_s^{(n)}$ contains the power consumption of movable electric equipment as it varies with position.

Step 4.2, for the time-related data set $X_t^{(n)}$, calculating the temporal attention weight $\alpha_t^{(n)}$:

$\alpha_t^{(n)} = \mathrm{softmax}\!\left( \dfrac{Q_t^{(n)} \left(K_t^{(n)}\right)^{\top}}{\sqrt{d_{model}}} \right)$

where softmax(·) is the activation function, mapping the network output into the (0, 1) interval;

$F_t^{(n)}$ is the feature vector defined on the input features, obtained as the linear superposition of the time-related data set $X_t^{(n)}$ and the output of the position encoder $E_{PE}^{(n)}$, satisfying $F_t^{(n)} = X_t^{(n)} + E_{PE}^{(n)}$;

$Q_t^{(n)}$ and $K_t^{(n)}$ are the intermediate variables inherent to the computation of the deep learning attention mechanism; all decoding results are obtained by combining them through linear transformations, and since they are based on the same self-attention mechanism they take the same input value, satisfying $Q_t^{(n)} = W_{tQ} F_t^{(n)}$ and $K_t^{(n)} = W_{tK} F_t^{(n)}$;

$W_{tQ}$ is the first temporal network learning parameter of the attention network, characterizing the similarity between $Q_t^{(n)}$ and $F_t^{(n)}$;

$W_{tK}$ is the second temporal network learning parameter of the attention network, characterizing the similarity between $K_t^{(n)}$ and $F_t^{(n)}$;

$d_{model}$ is the dimension of the input variable.

Step 4.3, for the space-related data set $X_s^{(n)}$, calculating the spatial attention weight $\alpha_s^{(n)}$:

$\alpha_s^{(n)} = \mathrm{softmax}\!\left( \dfrac{Q_s^{(n)} \left(K_s^{(n)}\right)^{\top}}{\sqrt{d_{model}}} \right)$

where softmax(·) is the activation function, mapping the network output into the (0, 1) interval;

$F_s^{(n)}$ is the feature vector defined on the input features, defined as the linear mapping of the space-related data set $X_s^{(n)}$; $Q_s^{(n)}$ and $K_s^{(n)}$ take the same input value, satisfying $Q_s^{(n)} = W_{sQ} F_s^{(n)}$ and $K_s^{(n)} = W_{sK} F_s^{(n)}$;

$W_{sQ}$ is the first spatial network learning parameter of the attention network, characterizing the similarity between $Q_s^{(n)}$ and $F_s^{(n)}$;

$W_{sK}$ is the second spatial network learning parameter of the attention network, characterizing the similarity between $K_s^{(n)}$ and $F_s^{(n)}$;

$d_{model}$ is the dimension of the input variable.

Step 4.4, using the temporal attention weight $\alpha_t^{(n)}$ and the spatial attention weight $\alpha_s^{(n)}$, calculating the joint attention weight $\alpha^{(n)}$.

Step 4.5, letting $V^{(n)}$ denote the value sequence, obtaining the grid load prediction output value from the joint attention weight and $V^{(n)}$.

The resulting prediction output value is a fixed-length joint attention sequence determined by the prediction period.
Step 5, establishing the deep self-attention network model and training the deep self-attention network on the sample data with Adam as the optimizer.
Preferably, step 5 comprises:

Step 5.1, smoothing the L1 loss function, satisfying the following relation:

$\mathrm{L1}_s\!\left(g^{(t,i)}, z^{(t,i)}\right) = \begin{cases} 0.5\,\left(g^{(t,i)} - z^{(t,i)}\right)^2, & \left|g^{(t,i)} - z^{(t,i)}\right| < 1 \\ \left|g^{(t,i)} - z^{(t,i)}\right| - 0.5, & \text{otherwise} \end{cases}$

where $g^{(t,i)}$ is the true value corresponding to a training sample and $z^{(t,i)}$ is the prediction value output by the deep network; the superscript $t$ denotes the time step and $i$ the index of the predicted data value at that time; $\hat g^{(t,i)}$ is the true value corresponding to the training sample under the bounding-box coordinate set and $\hat z^{(t,i)}$ is the prediction value output by the deep network under the bounding-box coordinate set; $\mathrm{L1}_s(\cdot)$ is the smoothed L1 loss function.

Step 5.2, constructing the objective function $L_o$ by accumulating the smoothed L1 loss over all time steps $t$ and prediction indices $i$, for both the sample values and the bounding-box coordinate set.

Step 5.3, using the objective function $L_o$ and Adam as the optimizer, training the deep self-attention network on the sample data.
Step 6, inputting the test data into the network and outputting the power load prediction value.
Preferably, step 6 further comprises calculating the root mean square error of the output power load prediction values and using it as the evaluation value of the power load prediction accuracy.
Example 1.
The load prediction system based on the deep self-attention network is implemented in PyTorch. Winter and summer weekly load data from a North American region in 1992 are used as training and validation data; the training process and the change of the loss function are shown in fig. 3.
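A hedged sketch of such an experiment loop, logging the training loss and validation root mean square error per epoch as in fig. 3; the stand-in model, the random tensors and the 80/20 split are placeholders of the sketch, not the data set described above:

```python
import torch

model = torch.nn.Linear(64, 1)                    # stand-in for the full network
criterion = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X, y = torch.randn(1000, 64), torch.randn(1000, 1)
X_tr, y_tr, X_va, y_va = X[:800], y[:800], X[800:], y[800:]

for epoch in range(100):
    model.train()
    loss = criterion(model(X_tr), y_tr)           # training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    model.eval()
    with torch.no_grad():                         # validation RMSE per epoch
        val_rmse = torch.sqrt(torch.mean((model(X_va) - y_va) ** 2))
    if epoch % 10 == 0:
        print(f"epoch {epoch}: loss={loss.item():.4f} val_rmse={val_rmse.item():.4f}")
```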
In fig. 3, the values of the predicted root mean square error and of the loss function decrease gradually as training proceeds and finally stabilize; it can be seen that the load prediction system based on the deep self-attention network converges quickly during training. The root mean square errors of different network structures on the data set are detailed in Table 1.
Table 1. Root mean square error on the data set for different prediction methods
As can be seen from Table 1, the root mean square error of the self-attention network proposed by the invention is smaller than that of the other network structures, so the power load can be predicted more accurately.
The prediction results of the load prediction system based on the deep self-attention network are shown in fig. 4 and fig. 5. In both figures, the prediction interval output by the system contains the actual values, showing that the power load can be predicted effectively.
While the applicant has described and illustrated the embodiments of the invention in detail with reference to the drawings, those skilled in the art should understand that the above embodiments are only preferred embodiments of the invention; the detailed description is intended to help the reader better understand the spirit of the invention, not to limit its scope of protection, and any improvement or modification based on the spirit of the invention shall fall within the scope of protection of the invention.

Claims (4)

1. A load prediction system based on a deep self-attention network, the system taking sample data as input and power load predicted value as output, comprising an encoder and a decoder, characterized in that,
the encoder is a self-attention encoder, the decoder comprising: a spatial attention unit, a temporal attention unit;
the system further comprises: a history score calculating unit, a position score calculating unit, a position encoder and a Query sequence unit;
the sample data is input into the self-attention encoder, and within a single time step a Query sequence is generated for the nth object $O_n$;
the history score calculating unit obtains the history score from the output of the self-attention encoder and calculates the history score $H^{(n)}$ according to the following relation:

$H^{(n)} = W_H \cdot E_{SA}^{(n)} + b_H$

where $E_{SA}^{(n)}$ is the output of the self-attention encoder, $W_H$ is the learnable parameter of the history score, and $b_H$ is the network learning rate of the history score;

the position score calculating unit obtains the position score from the output of the position encoder and calculates the position score $P^{(n)}$ according to the following relation:

$P^{(n)} = W_P \cdot E_{PE}^{(n)} + b_P$

where $E_{PE}^{(n)}$ is the output of the position encoder, $W_P$ is the learnable parameter of the position score, and $b_P$ is the network learning rate of the position score;

the sample data, the history score and the position score are input into the Query sequence unit, which generates the Query sequence; the history score and the position score are integrated as the initial value of the Query sequence $Q^{(n)}$: with the sample data set as the input sequence $X^{(n)}$, the Query sequence $Q^{(n)}$ satisfies the following relation:

$Q^{(n)} = X^{(n)} + H^{(n)} + P^{(n)}$

where $P^{(n)}$ is the position score and $H^{(n)}$ is the history score;
the Query sequence is input into the spatial attention unit to obtain a spatial attention sequence; the spatial attention sequence, together with the output of the self-attention encoder, is input into the temporal attention unit, and the temporal attention unit outputs the power load prediction value;
the system collects time-related raw data and space-related raw data for power load prediction and constructs an input data set, wherein the time-related raw data comprise load history data, regional illumination data and regional wind speed data, and the space-related raw data comprise station position data;
the input data set is processed with a regularization method to obtain a sample data set scaled to unit norm; the input data set is $X = \{x_1, x_2, \dots, x_N\}$, where the $i$-th data sample satisfies $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(D)})$; processing the input data set with the regularization method comprises: calculating the $L_p$ norm of the $i$-th data sample $x_i$:

$\|x_i\|_p = \left( \sum_{d=1}^{D} \left|x_i^{(d)}\right|^p \right)^{1/p}$

where the value of $p$ ranges over $[0, +\infty)$;

regularizing the $i$-th data sample $x_i$ based on the $L_p$ norm $\|x_i\|_p$, satisfying the following relation:

$\hat{x}_i^{(d)} = \dfrac{x_i^{(d)}}{\|x_i\|_p}$

where $x_i^{(d)}$ is the $d$-th class component of the $i$-th data sample, $d = 1, 2, \dots, D$, and $D$ is the total number of classes;
a Query sequence unit is constructed among the self-attention encoder, the position encoder, the spatial attention unit and the temporal attention unit based on an improved Transformer network architecture; within a single time step, a Query sequence is generated for the nth object $O_n$, comprising:

calculating the history score $H^{(n)}$ according to the following relation:

$H^{(n)} = W_H \cdot E_{SA}^{(n)} + b_H$

where $E_{SA}^{(n)}$ is the output of the self-attention encoder, $W_H$ is the learnable parameter of the history score, and $b_H$ is the network learning rate of the history score;

calculating the position score $P^{(n)}$ according to the following relation:

$P^{(n)} = W_P \cdot E_{PE}^{(n)} + b_P$

where $E_{PE}^{(n)}$ is the output of the position encoder, $W_P$ is the learnable parameter of the position score, and $b_P$ is the network learning rate of the position score;

integrating the history score and the position score as the initial value of the Query sequence $Q^{(n)}$: with the sample data set as the input sequence $X^{(n)}$, the Query sequence $Q^{(n)}$ satisfies the following relation:

$Q^{(n)} = X^{(n)} + H^{(n)} + P^{(n)}$

where $P^{(n)}$ is the position score and $H^{(n)}$ is the history score;
the Query sequence unit generates, from the sample data set, the history score and the position score, all the Query sequences required by the spatial attention unit and the temporal attention unit; the history score is the temporal influence of the time-related raw data on the Query sequence, and the position score is the positional influence of the space-related raw data on the Query sequence; for the nth object $O_n$ within the power load prediction period, the input data set is divided into a time-related data set $X_t^{(n)}$ and a space-related data set $X_s^{(n)}$, where the time-related data set $X_t^{(n)}$ contains the power consumption of each type of electric equipment as it varies with time, and the space-related data set $X_s^{(n)}$ contains the power consumption of movable electric equipment as it varies with position; for the time-related data set $X_t^{(n)}$, the temporal attention weight $\alpha_t^{(n)}$ is calculated:

$\alpha_t^{(n)} = \mathrm{softmax}\!\left( \dfrac{Q_t^{(n)} \left(K_t^{(n)}\right)^{\top}}{\sqrt{d_{model}}} \right)$

where softmax(·) is the activation function, mapping the network output into the (0, 1) interval;

$F_t^{(n)}$ is the feature vector defined on the input features, obtained as the linear superposition of the time-related data set $X_t^{(n)}$ and the output of the position encoder $E_{PE}^{(n)}$, satisfying $F_t^{(n)} = X_t^{(n)} + E_{PE}^{(n)}$;

$Q_t^{(n)}$ and $K_t^{(n)}$ are the intermediate variables inherent to the computation of the deep learning attention mechanism; all decoding results are obtained by combining them through linear transformations, and since they are based on the same self-attention mechanism they take the same input value, satisfying $Q_t^{(n)} = W_{tQ} F_t^{(n)}$ and $K_t^{(n)} = W_{tK} F_t^{(n)}$;

$W_{tQ}$ is the first temporal network learning parameter of the attention network, characterizing the similarity between $Q_t^{(n)}$ and $F_t^{(n)}$; $W_{tK}$ is the second temporal network learning parameter of the attention network, characterizing the similarity between $K_t^{(n)}$ and $F_t^{(n)}$; $d_{model}$ is the dimension of the input variable;

for the space-related data set $X_s^{(n)}$, the spatial attention weight $\alpha_s^{(n)}$ is calculated:

$\alpha_s^{(n)} = \mathrm{softmax}\!\left( \dfrac{Q_s^{(n)} \left(K_s^{(n)}\right)^{\top}}{\sqrt{d_{model}}} \right)$

where softmax(·) is the activation function, mapping the network output into the (0, 1) interval;

$F_s^{(n)}$ is the feature vector defined on the input features, defined as the linear mapping of the space-related data set $X_s^{(n)}$; $Q_s^{(n)}$ and $K_s^{(n)}$ take the same input value, satisfying $Q_s^{(n)} = W_{sQ} F_s^{(n)}$ and $K_s^{(n)} = W_{sK} F_s^{(n)}$;

$W_{sQ}$ is the first spatial network learning parameter of the attention network, characterizing the similarity between $Q_s^{(n)}$ and $F_s^{(n)}$; $W_{sK}$ is the second spatial network learning parameter of the attention network, characterizing the similarity between $K_s^{(n)}$ and $F_s^{(n)}$; $d_{model}$ is the dimension of the input variable;

using the temporal attention weight $\alpha_t^{(n)}$ and the spatial attention weight $\alpha_s^{(n)}$, the joint attention weight $\alpha^{(n)}$ is calculated;

letting $V^{(n)}$ denote the value sequence, the grid load prediction output value is obtained from the joint attention weight and $V^{(n)}$;

the resulting prediction output value is a fixed-length joint attention sequence determined by the prediction period;
a joint attention sequence is obtained for the input data set within the power load prediction period using the spatial attention unit and the temporal attention unit; the deep self-attention network model is established and trained on the sample data with Adam as the optimizer; the test data are input into the network and the power load prediction value is output; the L1 loss function is smoothed, satisfying the following relation:

$\mathrm{L1}_s\!\left(g^{(t,i)}, z^{(t,i)}\right) = \begin{cases} 0.5\,\left(g^{(t,i)} - z^{(t,i)}\right)^2, & \left|g^{(t,i)} - z^{(t,i)}\right| < 1 \\ \left|g^{(t,i)} - z^{(t,i)}\right| - 0.5, & \text{otherwise} \end{cases}$

where $g^{(t,i)}$ is the true value corresponding to a training sample and $z^{(t,i)}$ is the prediction value output by the deep network; the superscript $t$ denotes the time step and $i$ the index of the predicted data value at that time; $\hat g^{(t,i)}$ is the true value corresponding to the training sample under the bounding-box coordinate set and $\hat z^{(t,i)}$ is the prediction value output by the deep network under the bounding-box coordinate set; $\mathrm{L1}_s(\cdot)$ is the smoothed L1 loss function;

the objective function $L_o$ is constructed by accumulating the smoothed L1 loss over all time steps $t$ and prediction indices $i$, for both the sample values and the bounding-box coordinate set;

using the objective function $L_o$ and Adam as the optimizer, the deep self-attention network is trained on the sample data.
2. The load prediction system based on the deep self-attention network of claim 1, wherein
the self-attention encoder and the history score calculation unit are connected by a fully connected layer;
the position encoder and the position score calculation unit are connected by a fully connected layer;
and the output value of the temporal attention unit passes through a fully connected layer to obtain the power load prediction value.
3. The load prediction system based on the deep self-attention network of claim 1, wherein
the Query sequence unit generates, within a single time step, all the Query sequences required by the spatial attention unit and the temporal attention unit, and the system performs parallel prediction.
4. The load prediction system based on the deep self-attention network of claim 1, wherein
the root mean square error of the output power load prediction values is further calculated and used as the evaluation value of the power load prediction accuracy.
CN202110807996.7A 2021-07-16 2021-07-16 Load prediction method and system based on deep self-attention network Active CN113379164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110807996.7A CN113379164B (en) 2021-07-16 2021-07-16 Load prediction method and system based on deep self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110807996.7A CN113379164B (en) 2021-07-16 2021-07-16 Load prediction method and system based on deep self-attention network

Publications (2)

Publication Number Publication Date
CN113379164A CN113379164A (en) 2021-09-10
CN113379164B true CN113379164B (en) 2024-03-26

Family

ID=77582233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110807996.7A Active CN113379164B (en) 2021-07-16 2021-07-16 Load prediction method and system based on deep self-attention network

Country Status (1)

Country Link
CN (1) CN113379164B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113988449B (en) * 2021-11-05 2024-04-12 国家电网有限公司西北分部 Wind power prediction method based on transducer model
CN115081586B (en) * 2022-05-19 2023-03-31 中国科学院计算机网络信息中心 Photovoltaic power generation time sequence prediction method and system based on time and space attention
CN116831581A (en) * 2023-06-15 2023-10-03 中南大学 Remote physiological sign extraction-based driver state monitoring method and system
CN117175588B (en) * 2023-11-03 2024-01-16 邯郸欣和电力建设有限公司 Space-time correlation-based electricity load prediction method and device

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009089594A (en) * 2007-09-28 2009-04-23 Kankoku Denryoku Kosha Temporal-spatial load analysis system of power facility utilizing inspection data and calculation method of load
CN104598986A (en) * 2014-12-12 2015-05-06 国家电网公司 Big data based power load prediction method
CN110619430A (en) * 2019-09-03 2019-12-27 大连理工大学 Space-time attention mechanism method for traffic prediction
CN110633867A (en) * 2019-09-23 2019-12-31 国家电网有限公司 Ultra-short-term load prediction model based on GRU and attention mechanism
CN110889545A (en) * 2019-11-20 2020-03-17 国网重庆市电力公司电力科学研究院 Power load prediction method and device and readable storage medium
CN111080032A (en) * 2019-12-30 2020-04-28 成都数之联科技有限公司 Load prediction method based on Transformer structure
CN111507521A (en) * 2020-04-15 2020-08-07 北京智芯微电子科技有限公司 Method and device for predicting power load of transformer area
CN111651504A (en) * 2020-06-03 2020-09-11 湖南大学 Multi-element time sequence multilayer space-time dependence modeling method based on deep learning
CN111931989A (en) * 2020-07-10 2020-11-13 国网浙江省电力有限公司绍兴供电公司 Power system short-term load prediction method based on deep learning neural network
CN112052977A (en) * 2019-12-24 2020-12-08 中国石油大学(华东) Oil reservoir reserve prediction method based on deep space-time attention network
CN112163689A (en) * 2020-08-18 2021-01-01 国网浙江省电力有限公司绍兴供电公司 Short-term load quantile probability prediction method based on depth Attention-LSTM
CN112330215A (en) * 2020-11-26 2021-02-05 长沙理工大学 Urban vehicle demand prediction method, equipment and storage medium
CN112653142A (en) * 2020-12-18 2021-04-13 武汉大学 Wind power prediction method and system for optimizing depth transform network
CN112949930A (en) * 2021-03-17 2021-06-11 中国科学院合肥物质科学研究院 PA-LSTM network-based road motor vehicle exhaust high-emission early warning method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4156032A1 (en) * 2017-05-23 2023-03-29 Google LLC Attention-based sequence transduction neural networks
US10700523B2 (en) * 2017-08-28 2020-06-30 General Electric Company System and method for distribution load forecasting in a power grid
US10940863B2 (en) * 2018-11-01 2021-03-09 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009089594A (en) * 2007-09-28 2009-04-23 Kankoku Denryoku Kosha Temporal-spatial load analysis system of power facility utilizing inspection data and calculation method of load
CN104598986A (en) * 2014-12-12 2015-05-06 国家电网公司 Big data based power load prediction method
CN110619430A (en) * 2019-09-03 2019-12-27 大连理工大学 Space-time attention mechanism method for traffic prediction
CN110633867A (en) * 2019-09-23 2019-12-31 国家电网有限公司 Ultra-short-term load prediction model based on GRU and attention mechanism
CN110889545A (en) * 2019-11-20 2020-03-17 国网重庆市电力公司电力科学研究院 Power load prediction method and device and readable storage medium
CN112052977A (en) * 2019-12-24 2020-12-08 中国石油大学(华东) Oil reservoir reserve prediction method based on deep space-time attention network
CN111080032A (en) * 2019-12-30 2020-04-28 成都数之联科技有限公司 Load prediction method based on Transformer structure
CN111507521A (en) * 2020-04-15 2020-08-07 北京智芯微电子科技有限公司 Method and device for predicting power load of transformer area
CN111651504A (en) * 2020-06-03 2020-09-11 湖南大学 Multi-element time sequence multilayer space-time dependence modeling method based on deep learning
CN111931989A (en) * 2020-07-10 2020-11-13 国网浙江省电力有限公司绍兴供电公司 Power system short-term load prediction method based on deep learning neural network
CN112163689A (en) * 2020-08-18 2021-01-01 国网浙江省电力有限公司绍兴供电公司 Short-term load quantile probability prediction method based on depth Attention-LSTM
CN112330215A (en) * 2020-11-26 2021-02-05 长沙理工大学 Urban vehicle demand prediction method, equipment and storage medium
CN112653142A (en) * 2020-12-18 2021-04-13 武汉大学 Wind power prediction method and system for optimizing depth transform network
CN112949930A (en) * 2021-03-17 2021-06-11 中国科学院合肥物质科学研究院 PA-LSTM network-based road motor vehicle exhaust high-emission early warning method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on load forecasting for active distribution networks based on deep learning algorithms; Ma Feng et al.; Computer Engineering and Applications (《计算机工程与应用》); pp. 71-75, 114 *

Also Published As

Publication number Publication date
CN113379164A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113379164B (en) Load prediction method and system based on deep self-attention network
De Giorgi et al. Assessment of the benefits of numerical weather predictions in wind power forecasting based on statistical methods
CN104951836A (en) Posting predication system based on nerual network technique
CN115622047B (en) Power Transformer load prediction method based on Transformer model
CN110443417A (en) Multiple-model integration load forecasting method based on wavelet transformation
CN106548270A (en) A kind of photovoltaic plant power anomalous data identification method and device
Li et al. Deep spatio-temporal wind power forecasting
CN105303268A (en) Wind power generation output power prediction method based on similarity theory
CN109087215A (en) More Power Output for Wind Power Field joint probability density prediction techniques
CN103279030B (en) Dynamic soft measuring modeling method and device based on Bayesian frame
CN115169742A (en) Short-term wind power generation power prediction method
Siddarameshwara et al. Electricity short term load forecasting using elman recurrent neural network
Johannesen et al. Comparing recurrent neural networks using principal component analysis for electrical load predictions
CN108830405B (en) Real-time power load prediction system and method based on multi-index dynamic matching
CN106682312A (en) Industrial process soft-measurement modeling method of local weighing extreme learning machine model
CN110222910A (en) A kind of active power distribution network Tendency Prediction method and forecasting system
CN109270917B (en) Intelligent power plant steam turbine bearing-oriented closed-loop control system fault degradation state prediction method
CN116703644A (en) Attention-RNN-based short-term power load prediction method
Chen et al. Air quality prediction based on Kohonen Clustering and ReliefF feature selection
CN106709570A (en) Time dimension expansion and local weighting extreme learning machine-based soft measurement modeling method
CN106773697A (en) A kind of time dimension expands the industrial process soft-measuring modeling method of extreme learning machine model
CN113705887A (en) Data-driven photovoltaic power generation power prediction method and system
Li et al. Mutual information variational autoencoders and its application to feature extraction of multivariate time series
Alomoush et al. Residential Power Load Prediction in Smart Cities using Machine Learning Approaches
Guan et al. Multiple wind power time series modeling method considering correlation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant