CN113379164B - Load prediction method and system based on deep self-attention network - Google Patents
Load prediction method and system based on deep self-attention network Download PDFInfo
- Publication number
- CN113379164B CN113379164B CN202110807996.7A CN202110807996A CN113379164B CN 113379164 B CN113379164 B CN 113379164B CN 202110807996 A CN202110807996 A CN 202110807996A CN 113379164 B CN113379164 B CN 113379164B
- Authority
- CN
- China
- Prior art keywords
- attention
- unit
- score
- network
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 65
- 230000008569 process Effects 0.000 claims abstract description 19
- 230000007246 mechanism Effects 0.000 claims abstract description 17
- 230000002123 temporal effect Effects 0.000 claims abstract description 13
- 238000004364 calculation method Methods 0.000 claims abstract description 10
- 230000008859 change Effects 0.000 claims abstract description 7
- 238000013135 deep learning Methods 0.000 claims abstract description 6
- 230000006870 function Effects 0.000 claims description 21
- 238000012549 training Methods 0.000 claims description 11
- 238000012512 characterization method Methods 0.000 claims description 10
- 238000012545 processing Methods 0.000 claims description 9
- 230000036962 time dependent Effects 0.000 claims description 9
- 230000000875 corresponding effect Effects 0.000 claims description 8
- 238000013507 mapping Methods 0.000 claims description 7
- ORILYTVJVMAKLC-UHFFFAOYSA-N Adamantane Natural products C1C(C2)CC3CC1CC2C3 ORILYTVJVMAKLC-UHFFFAOYSA-N 0.000 claims description 6
- 230000004913 activation Effects 0.000 claims description 6
- 230000001419 dependent effect Effects 0.000 claims description 6
- 239000013598 vector Substances 0.000 claims description 6
- 238000005286 illumination Methods 0.000 claims description 4
- 230000002596 correlated effect Effects 0.000 claims description 3
- 238000011156 evaluation Methods 0.000 claims description 3
- 238000009499 grossing Methods 0.000 claims description 3
- 238000012360 testing method Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000013528 artificial neural network Methods 0.000 abstract description 3
- 230000002776 aggregation Effects 0.000 abstract 1
- 238000004220 aggregation Methods 0.000 abstract 1
- 238000013136 deep learning model Methods 0.000 abstract 1
- 238000005299 abrasion Methods 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000003062 neural network model Methods 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000009412 basement excavation Methods 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000005684 electric field Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000009776 industrial production Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000010248 power generation Methods 0.000 description 1
- 238000000611 regression analysis Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Software Systems (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Entrepreneurship & Innovation (AREA)
- Evolutionary Computation (AREA)
- Operations Research (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The power load prediction method and system based on the depth self-attention network takes sample data as input and takes a power load predicted value as output, and comprises a self-attention encoder, a history score calculation unit, a position encoder, a Query sequence unit, a spatial attention unit and a temporal attention unit; and the accurate prediction of the load change trend and the load change size in the power system is realized based on the deep learning and the self-attention model. The system establishes a non-autoregressive self-attention neural network for prediction, overcomes the problems of time lag and accumulated errors in the traditional deep learning model, and simultaneously establishes a attention mechanics learning mechanism among multiple variables, so as to realize the time sequence prediction based on multiple variable aggregation and effectively improve the prediction precision. The method and the system can fully utilize mass data acquired in the running process of the power grid to accurately predict the system load, and provide basis for the subsequent scheduling control of the power grid.
Description
Technical Field
The invention relates to the technical field of load prediction of power systems, in particular to a load prediction method and system based on a deep self-attention network.
Background
With the rapid development of power grid construction, the power system is developed to be intelligent and informationized. Power load prediction is an important one of these, and its consequences will have a great impact on the deployment, planning and operation of the power system. Short-term load prediction has important influence on important decisions such as daily operation, scheduling planning and the like of a power grid. Therefore, in order to ensure economic benefit and social benefit, the ability of accurately predicting the power load is important, so that the safety of a power system can be ensured, and meanwhile, a power supply enterprise can be ensured to economically and efficiently make a power generation plan.
In the prior art, short-term power load prediction mainly carries out power consumption prediction from several hours to one day in the future of a power system, because of randomness and nonlinearity of the power load, the difficulty of short-term load prediction is improved, and meanwhile, the load is subjected to multiple influences of environmental factors such as temperature, illumination, wind speed and the like which change in real time and subjective factors of users, so that the complexity of short-term load prediction is increased, the accuracy of short-term load prediction is reduced, and therefore, accurate and rapid short-term load prediction becomes a challenging task.
At present, a great deal of related studies have been conducted in the field of load prediction, wherein classical prediction methods include time series methods, regression analysis methods, and the like. The method has simple implementation principle and high operation speed, and is suitable for processing the data set with simple structure and small scale. However, as infrastructure is continuously perfected and the degree of informatization of the power grid is improved, the scale of users is continuously enlarged, the power data is rapidly increased, and meanwhile, the classical method is not ideal in large-scale data due to the nonlinear characteristics of part of the power data.
Whereas machine learning methods are of interest because of their strong adaptability and non-linear processing capabilities. The deep neural network can approach any function in theory, and has good effects in the fields of image recognition, natural language processing and the like. The self-attention mechanism can calculate the attention of each data in the sequence data to other data and assign different weights to the data, so that the sequence data processed by the self-attention mechanism becomes the sequence data containing the weight information, and the self-attention mechanism has stronger capability of processing the sequence data. The Chinese patent application (CN 110909919A) discloses a photovoltaic power prediction method of a deep neural network model fused with an attention mechanism, which is characterized in that a deep learning algorithm is utilized to model and predict the photovoltaic electric field power, and the attention mechanism is utilized to carry out weighted summation on the depth characteristics extracted from the neural network model, so that the selection of high-quality characteristic information with heavier weight of a prediction result is realized, the accuracy and the stability of the photovoltaic power prediction model are improved, the interference of useless information on the model is reduced, and the calculation time is shortened. The Chinese patent (CN 110355608B) discloses a cutter abrasion loss prediction method based on a self-attention mechanism and deep learning, which utilizes characteristic information related to cutter abrasion in measured data of a self-attention mechanism and a bidirectional long-short-time memory network combined excavation sensor, extracts the dependency relationship among measured data of three special sensors of cutting force, vibration signals and sound signals at different moments, effectively improves the real-time prediction effect of cutter abrasion loss of a numerical control machine tool, and can be applied to the prediction of cutter abrasion loss of the numerical control machine tool in industrial production. The Chinese patent application (CN 112052977A) discloses a reservoir reserves prediction method based on a deep space-time attention network, and the adverse effect of data fluctuation on a prediction result is relieved by combining a cyclic neural network and an attention mechanism, so that the prediction precision is improved. Chinese patent application (CN 110413844 a) discloses a "dynamic link prediction method based on a spatiotemporal attention depth model", in which attention coefficients at each time are calculated, and the normalized attention coefficients are used as weights in the calculation of prediction results.
Disclosure of Invention
In order to solve the defects existing in the prior art, the invention aims to provide a load prediction method and system based on a deep self-attention network.
The invention adopts the following technical scheme.
A load prediction system based on a depth self-attention network takes sample data as input and takes power load predicted values as output, and comprises an encoder and a decoder.
The encoder is a self-attention encoder, the decoder comprising: a spatial attention unit, a temporal attention unit;
the system further comprises: the system comprises a history score calculating unit, a position encoder and a Query sequence unit;
sample data is input into a self-attention encoder; the history score calculating unit obtains a history score according to the output of the self-attention encoder, and the position score calculating unit obtains a position score according to the output of the position encoder; the sample data, the historical score and the position score are input into a Query sequence unit, and a Query sequence is generated by the Query sequence unit;
the Query sequence is input to the spatial attention unit to obtain a spatial attention sequence, the spatial attention sequence and the output of the self-attention unit are input to the time attention unit together, and the time attention unit outputs the power load predicted value.
Preferably, the self-attention unit and the history score calculation unit are connected by adopting a full connection layer; the position encoder and the position score calculating unit are connected by adopting a full-connection layer; and obtaining the power load predicted value after the output value of the time attention unit passes through the full connection layer.
Preferably, the Query sequence unit generates all the Query sequences required by the spatial attention unit and the temporal attention unit in a single step time, and the system performs parallel prediction.
The load prediction method based on the deep self-attention network comprises the following steps:
step 1, collecting time-related original data and space-related original data of power load prediction, and constructing an input data set;
step 2, processing the input data set by adopting a regularization method to obtain a sample data set scaled to a unit norm;
step 3, constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the time attention unit based on the improved transducer network architecture; generating all Query sequences required by a space attention unit and a time attention unit by using a Query sequence unit according to the sample data set, the history score and the position score; the historical score is the time influence of the time-related original data on the Query sequence, and the position score is the position influence of the space-related original data on the Query sequence;
step 4, obtaining a joint attention sequence by using a spatial attention unit and a time attention unit for an input data set in a power load prediction period;
step 5, establishing a deep self-attention network model, taking Adam as an optimizer, and training the deep self-attention network based on sample data;
and 6, inputting the test data into a network and outputting the power load predicted value.
Preferably, in step 1, the time-related raw data includes load history data, regional illumination data, regional wind speed data, and the space-related raw data includes station position data.
Preferably, in step 2, the input data set isWherein the ith data sample satisfies +.>Processing the input data set by adopting a regularization method comprises the following steps:
step 2.1, calculating the ith data sample according to the following relationL of (2) P Norms->
Wherein, the value range of p is [0, + ] infinity;
step 2.2, based on L P Norm numberFor the i-th data sample->Regularization is carried out, and the following relation is satisfied:
in the method, in the process of the invention,for the ith data sample in class D, where d=1, 2, …, D represents the total number of classes.
Preferably, in step 3, in a single step time, the nth object O n Generating a Query sequence, comprising:
step 3.1, calculating a history score H according to the following relation (n) :
In the method, in the process of the invention,for the output of the self-attention encoder, W H As a learnable parameter of historical score, b H Network learning rate for historical scores;
step 3.2, calculating the position score P according to the following relation (n) :
In the method, in the process of the invention,w, which is the output of the position encoder p A learnable parameter, b, being a location score p Network learning rate for location score;
step 3.3, integrating the history score and the location score as a Query sequence Q (n) Is based on the sample data set as an input sequenceAt the time, query sequence Q (n) The following relation is satisfied:
wherein P is (n) For position score, H (n) Is a historical score.
Preferably, step 4 comprises:
step 4.1, for the nth object O during the power load prediction period n Dividing an input data set into time-dependent data setsAnd spatially dependent data set->Wherein the time-dependent dataset +.>Comprising power consumption which varies with time in various electric equipment and spatial related data set>The power consumption of the movable electric equipment along with the change of the position is contained;
step 4.2 for time dependent data setsCalculating the time attention weight +.>
In the method, in the process of the invention,
softmax (·) is the activation function, mapping the network output into (0, 1) intervals,
wherein,for feature vectors defined based on input features, defined as +.>And the output of the position encoder->The linear superposition is obtained, and the following relation is satisfied:
and->For the inherent calculation intermediate variable of deep learning attention mechanism, all decoding results are obtained by linear transformation combination, and the same self-attention mechanism is based on +.>And->The values are the same, and the following relational expression is satisfied:
W tQ characterization of first time network learning parameters for an attention networkAnd->Is used for the degree of similarity of (c) to (c),
W tK characterization for second time network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),
d model dimension for the input variable;
step 4.3 for spatially correlated data setsCalculating the spatial attention weight +.>
In the method, in the process of the invention,
softmax (·) is the activation function, mapping the network output into (0, 1) intervals,
for a feature vector defined based on input features, it is defined as a spatially dependent dataset +.>And>and->The same value is taken, and the following relation is satisfied:
W sQ characterization for first spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),
W sK characterization for second spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),
d model dimension for the input variable;
step 4.4, utilizing the time attention weightAnd spatial attention weight->The joint attention weight is calculated by the following relation>
Step 4.5, orderObtaining a power grid load prediction output value according to the following relation:
the resulting predicted output value is a fixed length joint attention sequence determined from the prediction period.
Preferably, step 5 comprises:
and 5.1, smoothing the L1 loss function, wherein the following relation is satisfied:
in the formula g (t,i) To train the corresponding true value of the sample, z (t,i) The predicted value is output for the depth network; the superscript t represents a time step, and i represents a predicted data value number at a corresponding time;for the true value corresponding to the training sample under the bounding box coordinate set, +.>A predicted value output by the depth network under the boundary frame coordinate set; l1 s (. Cndot.) is to smooth the L1 loss function;
step 5.2, constructing the objective function L by the following relation o :
Step 5.3, using the objective function L o With Adam as an optimizer, the deep self-attention network is trained based on sample data.
Preferably, step 6 further includes calculating a root mean square error for the output power load prediction value, and using the root mean square error as an evaluation value of the power load prediction accuracy.
Compared with the prior art, the method has the advantages that the time lag and accumulated error problems are overcome, meanwhile, the system constructs a mechanism of attention and mechanics among multiple variables, the time sequence prediction based on multiple variables is realized, and the prediction precision is effectively improved.
The beneficial effects of the invention also include:
1. the improved transducer network is used in a load prediction system based on a deep self-attention network to realize the prediction of a multidimensional time sequence by introducing a Query generation unit; only one network model is used, so that the structure of the prediction system is simplified, and the calculation speed is improved;
2. the Query generation unit is inserted between the self-attention unit and the time-attention unit and between the time-attention unit and the space-attention unit, all the required Query sequences can be generated in a single step time, so that the system can realize parallel prediction and has higher prediction efficiency;
3. meanwhile, the time correlation attention and the space correlation attention are calculated, so that load data used for load prediction not only have time characteristics but also have geographic characteristics, and a prediction result is more accurate and reliable;
4. the attention sequence with the fixed length is calculated according to the power load prediction demand, so that the traditional method for calculating the attention data at a certain moment is broken through, and the application range of the prediction result is wider.
Drawings
FIG. 1 is a schematic diagram of a load prediction system based on a deep self-attention network according to the present invention;
wherein reference numerals are as follows:
1-self-attention encoder
2-a history score calculating unit;
a 3-position score calculation unit;
4-position encoder;
5-Query sequence unit
6-a spatial attention unit;
7-a time attention unit;
8-Query sequence;
FC-full connection layer units;
FIG. 2 is a flow chart of a deep self-attention network based load prediction method of the present invention;
FIG. 3 is a diagram of a load prediction process based on a deep self-attention network in accordance with an embodiment of the present invention;
FIG. 4 is a graph of winter load prediction interval waveforms based on deep self-attention network according to an embodiment of the present invention;
fig. 5 is a waveform diagram of a summer load prediction interval based on a deep self-attention network according to an embodiment of the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical solutions of the present invention and are not intended to limit the scope of protection of the present application.
As shown in fig. 1, a load prediction system based on a deep self-attention network, which takes sample data as input and electric load predicted value as output, includes: a self-attention encoder 1, a history score calculating unit 2, a position score calculating unit 3, a position encoder 4, a Query sequence unit 5, a spatial attention unit 6, and a temporal attention unit 7.
Sample data is input into the self-attention encoder 1; the history score calculating unit 2 obtains a history score from the output of the self-attention encoder 1, and the position score calculating unit 3 obtains a position score from the output of the position encoder 4; the sample data, the history score, and the position score are input to a Query sequence unit 5, from which a Query sequence is generated.
The Query sequence is input to the spatial attention unit 6 to obtain a spatial attention sequence, the spatial attention sequence is input to the temporal attention unit 7 together with the output from the attention unit 1, and the temporal attention unit 7 outputs the power load predicted value.
The self-attention unit 1 and the history score calculating unit 2 are connected by adopting a full connection layer FC; the position encoder 4 and the position score calculating unit 3 are connected by adopting a full connection layer FC; the output value of the time attention unit 7 passes through the full connection layer FC to obtain the power load predicted value.
The Query sequence unit 5 generates all the Query sequences 8 required by the spatial attention unit 6 and the temporal attention unit 7 in a single step time, enabling parallel prediction by the system.
As shown in fig. 2, the load prediction method based on the deep self-attention network includes:
and step 1, collecting time-related original data and space-related original data of power load prediction, and constructing an input data set.
Preferably, in step 1, the time-related raw data includes load history data, regional illumination data, regional wind speed data, and the space-related raw data includes station position data.
And 2, processing the input data set by adopting a regularization method to obtain a sample data set scaled to a unit norm.
Preferably, in step 2, the input data set isWherein the ith data sample satisfies +.>Processing the input data set by adopting a regularization method comprises the following steps:
step 2.1, calculating the ith data sample according to the following relationL of (2) P Norms->
Wherein, the value range of p is [0, + ] infinity;
step 2.2, based on L P Norm numberFor the i-th data sample->Regularization is carried out, and the following relation is satisfied:
in the method, in the process of the invention,for the ith data sample in class D, where d=1, 2, …, D represents the total number of classes.
Step 3, constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the time attention unit based on the improved transducer network architecture; generating all Query sequences required by a space attention unit and a time attention unit by using a Query sequence unit according to the sample data set, the history score and the position score; the historical score is the time influence of the time-related original data on the Query sequence, and the position score is the position influence of the space-related original data on the Query sequence.
Preferably, in step 3, in a single step time, the nth object O n Generating a Query sequence, comprising:
step 3.1, calculating a history score H according to the following relation (n) :
In the method, in the process of the invention,for the output of the self-attention encoder, W H As a learnable parameter of historical score, b H Network learning rate for historical scores;
step 3.2, calculating the position score P according to the following relation (n) :
In the method, in the process of the invention,w, which is the output of the position encoder p A learnable parameter, b, being a location score p Network learning rate for location score;
step 3.3, integrating the history score and the location score as a Query sequence Q (n) Is based on the sample data set as an input sequenceAt the time, query sequence Q (n) The following relation is satisfied:
wherein P is (n) For position score, H (n) Is a historical score.
Step 4, obtaining a joint attention sequence by using the spatial attention unit and the temporal attention unit for the input data set in the power load prediction period.
Preferably, step 4 comprises:
step 4.1, for the nth object O during the power load prediction period n Dividing an input data set into time-dependent data setsAnd spatially dependent data set->Wherein the time-dependent dataset +.>Comprising power consumption which varies with time in various electric equipment and spatial related data set>Comprising the power consumption of the movable electric equipment along with the change of the position.
Step 4.2 for time dependent data setsCalculating the time attention weight +.>
In the method, in the process of the invention,
softmax (·) is the activation function, mapping the network output into (0, 1) intervals,
wherein,for feature vectors defined based on input features, defined as +.>And the output of the position encoder->The linear superposition is obtained, and the following relation is satisfied:
and->Computing intermediate variables inherent to deep learning attention mechanisms, allThe decoding result is obtained by linear transformation combination and is based on the same self-attention mechanism>And->The values are the same, and the following relational expression is satisfied:
W tQ characterization of first time network learning parameters for an attention networkAnd->Is used for the degree of similarity of (c) to (c),
W tK characterization for second time network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),
d model is the dimension of the input variable.
Step 4.3 for spatially correlated data setsCalculating the spatial attention weight +.>
In the method, in the process of the invention,
softmax (·) is the activation function, mapping the network output into (0, 1) intervals,
for a feature vector defined based on input features, it is defined as a spatially dependent dataset +.>And>and->The same value is taken, and the following relation is satisfied:
W sQ characterization for first spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),
W sK characterization for second spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),
d model is the dimension of the input variable.
Step 4.4, utilizing the time attention weightAnd spatial attention weight->The joint attention weight is calculated by the following relation>
Step 4.5, orderObtaining a power grid load prediction output value according to the following relation:
the resulting predicted output value is a fixed length joint attention sequence determined from the prediction period.
And 5, establishing a deep self-attention network model, and training the deep self-attention network based on sample data by taking Adam as an optimizer.
Preferably, step 5 comprises:
and 5.1, smoothing the L1 loss function, wherein the following relation is satisfied:
in the formula g (t,i) To train the corresponding true value of the sample, z (t,i) The predicted value is output for the depth network; the superscript t represents a time step, and i represents a predicted data value number at a corresponding time;for the true value corresponding to the training sample under the bounding box coordinate set, +.>A predicted value output by the depth network under the boundary frame coordinate set; l1 s (. Cndot.) is to smooth the L1 penalty function.
Step 5.2, constructing the objective function L by the following relation o :
Step 5.3, using the objective function L o With Adam as an optimizer, the deep self-attention network is trained based on sample data.
And 6, inputting the test data into a network and outputting the power load predicted value.
Preferably, step 6 further includes calculating a root mean square error for the output power load prediction value, and using the root mean square error as an evaluation value of the power load prediction accuracy.
Example 1.
The pytorch programming is used to implement the operation of the deep self-attention network based load prediction system. The training process and the change in the loss function are shown in fig. 3 using the winter and summer week load data of a region in north america in 1992 as training and verification data.
In fig. 3, the values of the predicted root mean square error and the loss function gradually decrease with the training process and finally stabilize, and it is seen that the load prediction system based on the deep self-attention network can quickly converge after training. The root mean square error over the data set for the different network structures is detailed in table 1.
Table 1 root mean square error over data set for different prediction methods
As can be seen from table 1, the root mean square error value of the self-attention network proposed by the present invention is smaller than that of other network structures, so that the power load can be predicted more accurately.
The prediction results of the deep self-attention network based load prediction system are shown in fig. 4 and 5. In fig. 4 and 5, the prediction section result output by the load prediction system based on the deep self-attention network includes an actual value, and the power load can be effectively predicted.
Compared with the prior art, the method has the advantages that the time lag and accumulated error problems are overcome, meanwhile, the system constructs a mechanism of attention and mechanics among multiple variables, the time sequence prediction based on multiple variables is realized, and the prediction precision is effectively improved.
The beneficial effects of the invention also include:
1. the improved transducer network is used in a load prediction system based on a deep self-attention network to realize the prediction of a multidimensional time sequence by introducing a Query generation unit; only one network model is used, so that the structure of the prediction system is simplified, and the calculation speed is improved;
2. the Query generation unit is inserted between the self-attention unit and the time-attention unit and between the time-attention unit and the space-attention unit, all the required Query sequences can be generated in a single step time, so that the system can realize parallel prediction and has higher prediction efficiency;
3. meanwhile, the time correlation attention and the space correlation attention are calculated, so that load data used for load prediction not only have time characteristics but also have geographic characteristics, and a prediction result is more accurate and reliable;
4. the attention sequence with the fixed length is calculated according to the power load prediction demand, so that the traditional method for calculating the attention data at a certain moment is broken through, and the application range of the prediction result is wider.
While the applicant has described and illustrated the embodiments of the present invention in detail with reference to the drawings, it should be understood by those skilled in the art that the above embodiments are only preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not to limit the scope of the present invention, but any improvements or modifications based on the spirit of the present invention should fall within the scope of the present invention.
Claims (4)
1. A load prediction system based on a deep self-attention network, the system taking sample data as input and power load predicted value as output, comprising an encoder and a decoder, characterized in that,
the encoder is a self-attention encoder, the decoder comprising: a spatial attention unit, a temporal attention unit;
the system further comprises: the system comprises a history score calculating unit, a position encoder and a Query sequence unit;
the sample data is input into a self-attention encoder, and in a single step time, the sample data is input into the self-attention encoder for the nth object O h Generating a Query sequence;
the history score calculating unit obtains a history score from the output of the self-attention encoder and calculates a history score H according to the following relation (n) :
In the method, in the process of the invention,for the output of the self-attention encoder, W H As a learnable parameter of historical score, b H Network learning rate for historical scores;
the position score calculating unit obtains the position score according to the output of the position encoder and calculates the position score P according to the following relation (n) :
In the method, in the process of the invention,w, which is the output of the position encoder p A learnable parameter, b, being a location score p Network learning rate for location score;
the sample data, the historical score and the position score are input into a Query sequence unit, and a Query sequence is generated by the Query sequence unit; wherein the history score and the position score are used as a Query sequence Q (n) Is based on the sample data set as an input sequenceAt the time, query sequence Q (n) The following relation is satisfied:
wherein P is (n) For position score, H (n) Is a historical score;
the Query sequence is input into a spatial attention unit to obtain a spatial attention sequence, the spatial attention sequence and the output of a self-attention unit are input into a time attention unit together, and the time attention unit outputs an electric load predicted value;
the system collects time-related original data and space-related original data of power load prediction, and an input data set is constructed, wherein the time-related original data comprises load historical data, regional illumination data and regional wind speed data, and the space-related original data comprises station position data;
processing the input data set by adopting a regularization method to obtain a sample data set scaled to a unit norm; the input data set isWherein the ith data sample satisfiesProcessing the input data set by adopting a regularization method comprises the following steps: calculating the i-th data sample +.>L of (2) P Norms->
Wherein, the value range of p is [0t+ ] infinity;
based on L P Norm numberFor the i-th data sample->Regularization is carried out, and the following relation is satisfied:
in the method, in the process of the invention,for the ith data sample in the D-th class, where d=1t2t … tD, D represents the total number of classes;
constructing a Query sequence unit among the self-attention encoder, the position encoder, the spatial attention unit and the temporal attention unit based on the improved transducer network architecture; for the nth object O in a single step time n Generating a Query sequence, comprising:
the history score H is calculated in the following relation (n) :
In the method, in the process of the invention,for the output of the self-attention encoder, W H As a learnable parameter of historical score, b H Network learning rate for historical scores;
calculating a position score P in the following relation (n) :
In the method, in the process of the invention,w, which is the output of the position encoder p A learnable parameter, b, being a location score p Network learning rate for location score;
integrating the historical score and the location score as a Query sequence Q (n) Is based on the sample data set as an input sequenceAt the time, query sequence Q (n) The following relation is satisfied:
wherein P is (n) For position score, H (n) Is a historical score;
generating all Query sequences required by a space attention unit and a time attention unit by using a Query sequence unit according to the sample data set, the history score and the position score; the historical score is the time influence of the time-related original data on the Query sequence, and the position score is the position influence of the space-related original data on the Query sequence; for the nth object O during the power load prediction period n Dividing an input data set into time-dependent data setsAnd spatially dependent data set->Wherein the time-dependent dataset +.>Comprising the power consumption which changes with time in various electric equipment, wherein the space related data set is +.>The power consumption of the movable electric equipment along with the change of the position is contained; for time-dependent data sets->Calculating the time attention weight +.>
Where softmax (·) is the activation function, mapping the network output into the (0 t 1) interval,
wherein,for feature vectors defined based on input features, defined as +.>And the output of the position encoder->The linear superposition is obtained, and the following relation is satisfied:
and->For the inherent calculation intermediate variable of deep learning attention mechanism, all decoding results are obtained by linear transformation combination, and the same self-attention mechanism is based on +.>And->The values are the same, and the following relational expression is satisfied:
W tQ characterization of first time network learning parameters for an attention networkAnd->Is used for the degree of similarity of (c) to (c),W tK for the second time network learning parameters of the attention network, characterize +.>And->Similarity of->d model Dimension for the input variable;
for spatially correlated data setsCalculating the spatial attention weight +.>
Where softmax (·) is the activation function, mapping the network output into the (0 t 1) interval,
for a feature vector defined based on input features, it is defined as a spatially dependent dataset +.>And (2) linear mapping ofAnd->The same value is taken, and the following relation is satisfied:
W sQ characterization for first spatial network learning parameters of an attention networkAnd->Is used for the degree of similarity of (c) to (c),W sK for the second spatial network learning parameter of the attention network, characterize +.>And->Similarity of->d model Dimension for the input variable;
using temporal attention weightsAnd spatial attention weight->The joint attention weight is calculated by the following relation>
Order theObtaining a power grid load prediction output value according to the following relation:
the obtained predicted output value is a fixed-length joint attention sequence determined according to the prediction period;
obtaining a joint attention sequence for the input data set using the spatial attention unit and the temporal attention unit during the power load prediction period; establishing a deep self-attention network model, and training the deep self-attention network based on sample data by taking Adam as an optimizer; inputting the test data into a network and outputting a power load predicted value; and smoothing the L1 loss function to satisfy the following relation:
in the formula g (t,i) For training samplesCorresponding true value, z (t,i) The predicted value is output for the depth network; the superscript t represents a time step, and i represents a predicted data value number at a corresponding time;for the true value corresponding to the training sample under the bounding box coordinate set, +.>A predicted value output by the depth network under the boundary frame coordinate set; l1 s (. Cndot.) is to smooth the L1 loss function;
constructing the objective function L in the following relation o :
Using an objective function L o With Adam as an optimizer, the deep self-attention network is trained based on sample data.
2. The deep self-attention network based load prediction system of claim 1 wherein,
the self-attention unit and the history score calculation unit are connected by adopting a full-connection layer;
the position encoder and the position score calculating unit are connected by adopting a full-connection layer;
and the output value of the time attention unit is subjected to a full connection layer to obtain a power load predicted value.
3. The deep self-attention network based load prediction system of claim 1 wherein,
the Query sequence unit generates all the Query sequences required by the spatial attention unit and the temporal attention unit in a single step time, and the system performs parallel prediction.
4. The deep self-attention network based load prediction method of claim 1, wherein,
the method further includes calculating a root mean square error for the output power load prediction value, and using the root mean square error as an evaluation value of the power load prediction accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110807996.7A CN113379164B (en) | 2021-07-16 | 2021-07-16 | Load prediction method and system based on deep self-attention network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110807996.7A CN113379164B (en) | 2021-07-16 | 2021-07-16 | Load prediction method and system based on deep self-attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113379164A CN113379164A (en) | 2021-09-10 |
CN113379164B true CN113379164B (en) | 2024-03-26 |
Family
ID=77582233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110807996.7A Active CN113379164B (en) | 2021-07-16 | 2021-07-16 | Load prediction method and system based on deep self-attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113379164B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113988449B (en) * | 2021-11-05 | 2024-04-12 | 国家电网有限公司西北分部 | Wind power prediction method based on transducer model |
CN115081586B (en) * | 2022-05-19 | 2023-03-31 | 中国科学院计算机网络信息中心 | Photovoltaic power generation time sequence prediction method and system based on time and space attention |
CN116831581A (en) * | 2023-06-15 | 2023-10-03 | 中南大学 | Remote physiological sign extraction-based driver state monitoring method and system |
CN117175588B (en) * | 2023-11-03 | 2024-01-16 | 邯郸欣和电力建设有限公司 | Space-time correlation-based electricity load prediction method and device |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009089594A (en) * | 2007-09-28 | 2009-04-23 | Kankoku Denryoku Kosha | Temporal-spatial load analysis system of power facility utilizing inspection data and calculation method of load |
CN104598986A (en) * | 2014-12-12 | 2015-05-06 | 国家电网公司 | Big data based power load prediction method |
CN110619430A (en) * | 2019-09-03 | 2019-12-27 | 大连理工大学 | Space-time attention mechanism method for traffic prediction |
CN110633867A (en) * | 2019-09-23 | 2019-12-31 | 国家电网有限公司 | Ultra-short-term load prediction model based on GRU and attention mechanism |
CN110889545A (en) * | 2019-11-20 | 2020-03-17 | 国网重庆市电力公司电力科学研究院 | Power load prediction method and device and readable storage medium |
CN111080032A (en) * | 2019-12-30 | 2020-04-28 | 成都数之联科技有限公司 | Load prediction method based on Transformer structure |
CN111507521A (en) * | 2020-04-15 | 2020-08-07 | 北京智芯微电子科技有限公司 | Method and device for predicting power load of transformer area |
CN111651504A (en) * | 2020-06-03 | 2020-09-11 | 湖南大学 | Multi-element time sequence multilayer space-time dependence modeling method based on deep learning |
CN111931989A (en) * | 2020-07-10 | 2020-11-13 | 国网浙江省电力有限公司绍兴供电公司 | Power system short-term load prediction method based on deep learning neural network |
CN112052977A (en) * | 2019-12-24 | 2020-12-08 | 中国石油大学(华东) | Oil reservoir reserve prediction method based on deep space-time attention network |
CN112163689A (en) * | 2020-08-18 | 2021-01-01 | 国网浙江省电力有限公司绍兴供电公司 | Short-term load quantile probability prediction method based on depth Attention-LSTM |
CN112330215A (en) * | 2020-11-26 | 2021-02-05 | 长沙理工大学 | Urban vehicle demand prediction method, equipment and storage medium |
CN112653142A (en) * | 2020-12-18 | 2021-04-13 | 武汉大学 | Wind power prediction method and system for optimizing depth transform network |
CN112949930A (en) * | 2021-03-17 | 2021-06-11 | 中国科学院合肥物质科学研究院 | PA-LSTM network-based road motor vehicle exhaust high-emission early warning method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4156032A1 (en) * | 2017-05-23 | 2023-03-29 | Google LLC | Attention-based sequence transduction neural networks |
US10700523B2 (en) * | 2017-08-28 | 2020-06-30 | General Electric Company | System and method for distribution load forecasting in a power grid |
US10940863B2 (en) * | 2018-11-01 | 2021-03-09 | GM Global Technology Operations LLC | Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle |
-
2021
- 2021-07-16 CN CN202110807996.7A patent/CN113379164B/en active Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009089594A (en) * | 2007-09-28 | 2009-04-23 | Kankoku Denryoku Kosha | Temporal-spatial load analysis system of power facility utilizing inspection data and calculation method of load |
CN104598986A (en) * | 2014-12-12 | 2015-05-06 | 国家电网公司 | Big data based power load prediction method |
CN110619430A (en) * | 2019-09-03 | 2019-12-27 | 大连理工大学 | Space-time attention mechanism method for traffic prediction |
CN110633867A (en) * | 2019-09-23 | 2019-12-31 | 国家电网有限公司 | Ultra-short-term load prediction model based on GRU and attention mechanism |
CN110889545A (en) * | 2019-11-20 | 2020-03-17 | 国网重庆市电力公司电力科学研究院 | Power load prediction method and device and readable storage medium |
CN112052977A (en) * | 2019-12-24 | 2020-12-08 | 中国石油大学(华东) | Oil reservoir reserve prediction method based on deep space-time attention network |
CN111080032A (en) * | 2019-12-30 | 2020-04-28 | 成都数之联科技有限公司 | Load prediction method based on Transformer structure |
CN111507521A (en) * | 2020-04-15 | 2020-08-07 | 北京智芯微电子科技有限公司 | Method and device for predicting power load of transformer area |
CN111651504A (en) * | 2020-06-03 | 2020-09-11 | 湖南大学 | Multi-element time sequence multilayer space-time dependence modeling method based on deep learning |
CN111931989A (en) * | 2020-07-10 | 2020-11-13 | 国网浙江省电力有限公司绍兴供电公司 | Power system short-term load prediction method based on deep learning neural network |
CN112163689A (en) * | 2020-08-18 | 2021-01-01 | 国网浙江省电力有限公司绍兴供电公司 | Short-term load quantile probability prediction method based on depth Attention-LSTM |
CN112330215A (en) * | 2020-11-26 | 2021-02-05 | 长沙理工大学 | Urban vehicle demand prediction method, equipment and storage medium |
CN112653142A (en) * | 2020-12-18 | 2021-04-13 | 武汉大学 | Wind power prediction method and system for optimizing depth transform network |
CN112949930A (en) * | 2021-03-17 | 2021-06-11 | 中国科学院合肥物质科学研究院 | PA-LSTM network-based road motor vehicle exhaust high-emission early warning method |
Non-Patent Citations (1)
Title |
---|
基于深度学习算法的主动配电网负荷预测研究;马锋 等;《 计算机工程与应用 》;71-75, 114 * |
Also Published As
Publication number | Publication date |
---|---|
CN113379164A (en) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113379164B (en) | Load prediction method and system based on deep self-attention network | |
De Giorgi et al. | Assessment of the benefits of numerical weather predictions in wind power forecasting based on statistical methods | |
CN104951836A (en) | Posting predication system based on nerual network technique | |
CN115622047B (en) | Power Transformer load prediction method based on Transformer model | |
CN110443417A (en) | Multiple-model integration load forecasting method based on wavelet transformation | |
CN106548270A (en) | A kind of photovoltaic plant power anomalous data identification method and device | |
Li et al. | Deep spatio-temporal wind power forecasting | |
CN105303268A (en) | Wind power generation output power prediction method based on similarity theory | |
CN109087215A (en) | More Power Output for Wind Power Field joint probability density prediction techniques | |
CN103279030B (en) | Dynamic soft measuring modeling method and device based on Bayesian frame | |
CN115169742A (en) | Short-term wind power generation power prediction method | |
Siddarameshwara et al. | Electricity short term load forecasting using elman recurrent neural network | |
Johannesen et al. | Comparing recurrent neural networks using principal component analysis for electrical load predictions | |
CN108830405B (en) | Real-time power load prediction system and method based on multi-index dynamic matching | |
CN106682312A (en) | Industrial process soft-measurement modeling method of local weighing extreme learning machine model | |
CN110222910A (en) | A kind of active power distribution network Tendency Prediction method and forecasting system | |
CN109270917B (en) | Intelligent power plant steam turbine bearing-oriented closed-loop control system fault degradation state prediction method | |
CN116703644A (en) | Attention-RNN-based short-term power load prediction method | |
Chen et al. | Air quality prediction based on Kohonen Clustering and ReliefF feature selection | |
CN106709570A (en) | Time dimension expansion and local weighting extreme learning machine-based soft measurement modeling method | |
CN106773697A (en) | A kind of time dimension expands the industrial process soft-measuring modeling method of extreme learning machine model | |
CN113705887A (en) | Data-driven photovoltaic power generation power prediction method and system | |
Li et al. | Mutual information variational autoencoders and its application to feature extraction of multivariate time series | |
Alomoush et al. | Residential Power Load Prediction in Smart Cities using Machine Learning Approaches | |
Guan et al. | Multiple wind power time series modeling method considering correlation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |