CN114529051A - Long-term power load prediction method based on hierarchical residual self-attention neural network - Google Patents
- Publication number
- CN114529051A CN114529051A CN202210048738.XA CN202210048738A CN114529051A CN 114529051 A CN114529051 A CN 114529051A CN 202210048738 A CN202210048738 A CN 202210048738A CN 114529051 A CN114529051 A CN 114529051A
- Authority
- CN
- China
- Prior art keywords
- sequence
- data
- neural network
- load
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a long-term power load prediction method based on a hierarchical residual self-attention neural network. The method comprises three parts. First, mixed feature data — the trend term, period term, holiday term and weather term in the historical load data — are adaptively extracted and fused with the historical load sequence data. Second, the fused sequence data are recursively decomposed into time components, and the time components are encoded by hierarchical residual self-attention network blocks. Third, the time components are reconstructed and decoded generatively to predict the power load fluctuation over a future period. By hierarchically decomposing, reconstructing and predicting the load sequence, the method effectively captures both the long-term and short-term characteristics of the sequence and improves the prediction accuracy of the model in long-sequence load prediction scenarios.
Description
Technical Field
The invention relates to the technical field of load prediction for power energy systems, and in particular to a long-term power load prediction method based on a hierarchical residual self-attention neural network.
Background
Power load prediction is an indispensable service in the composition of smart grid systems and is actively applied in many scenarios; how to effectively control the power load to balance supply and demand has become an important research direction in the operation and management of modern power systems. The core problem of load prediction is obtaining the historical change rule of the prediction target and its relation to various influencing factors — the prediction model is, in effect, a mathematical function expressing that change rule. The challenge of load prediction is that the load is influenced by many external factors, including power trading markets, national policy, weather, and residential electricity-consumption habits, all of which remain problems to be solved.
Models for load prediction are essentially mathematical models for time-series prediction, and common methods fall into four groups: traditional statistical methods, machine-learning methods, deep-learning methods, and third-party tool prediction methods. (1) Traditional statistical methods: the common time-series models, including the autoregressive model (AR) and the autoregressive moving-average model (ARMA), have simple principles and suit the analysis of stationary series and simple non-stationary series at small orders, but are not suited to nonlinear prediction scenarios. (2) Machine-learning methods: machine learning is a very broad field containing many models suited to nonlinear prediction, such as support vector machines (SVM), decision tree models, k-nearest-neighbor models, and even ensemble models with stronger predictive ability (XGBoost, LightGBM). Machine-learning models handle the nonlinear problem well, but their feature-mining ability is limited in large-scale, high-dimensional prediction scenarios, so data features often must be processed manually before building a machine-learning prediction model.
(3) Deep-learning methods: owing to strong fitting ability, deep-learning models can adaptively mine and learn data features and are well suited to nonlinear prediction. Common methods include convolutional neural networks (CNN), long short-term memory networks (LSTM), and gated recurrent units (GRU). The recurrent neural networks represented by LSTM and GRU are widely used in sequence modeling and have good sequence-modeling ability; however, because of their serial learning, they gradually lose the ability to learn long-range historical features during training and suffer error accumulation, so they are often combined with other deep-learning models. (4) Third-party tool prediction methods: in recent years, some large domestic and foreign companies have also open-sourced their own time-series prediction methods. Facebook released the Prophet model in 2017, which jointly considers the trend, period and holiday terms of a time series, is simple to use, and predicts stably; Amazon released the DeepAR model in 2018, which uses a probability-based autoregressive inference mode to reduce uncertainty in the prediction process. The prediction accuracy of these tools is remarkable, but they achieve only short-term prediction and are unsuitable for energy-load scenarios demanding high real-time performance and strong stability.
Disclosure of Invention
The invention aims to combine and improve the prior art to optimize the modeling effect of a load prediction model in power load prediction scenarios. Specifically, the invention models with a neural network and provides a network structure based on a hierarchical residual self-attention mechanism for long-sequence prediction of stable, highly periodic power load data.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a long-term power load prediction method based on a hierarchical residual self-attention neural network comprises the following steps:
Step 2. Cleaning the source data, extracting features from the cleaned historical load data and weather data — namely the trend-term, period-term, holiday-term and weather-term features of load fluctuation — and fusing the historical load sequence data with the feature data to obtain a fusion vector, which serves as the input for the subsequent neural network modeling.
Step 4. Generatively encoding the features extracted from the source historical load data to be predicted, and predicting the load sequence over the next time-step range.
The invention has the following beneficial effects: the proposed model is based on the Transformer neural network and uses a self-attention mechanism in its structure, giving it the ability — unlike traditional recurrent neural networks — to capture global features. Compared with traditional methods, it is more capable and flexible in feature mining and model generalization. Load prediction realized by this method can markedly reduce the error of medium- and long-term load prediction, provide feedback guidance for the operation and dispatch of power units, and ensure the stable operation of the power system.
Drawings
Fig. 1 is a schematic flowchart of a long-term power load prediction method based on a hierarchical residual error self-attention neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall framework of a hierarchical residual error-based self-attention neural network prediction model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the framework of the Transformer neural network model;
FIG. 4 is a block diagram of a residual neural network model;
FIG. 5 is a framework diagram of each layer of the modified residual self-attention block according to an embodiment of the present invention;
FIG. 6 is a block diagram of prediction using generative decoding according to an embodiment of the present invention;
Detailed Description
The invention is further explained with reference to the attached drawings; the flow chart of the implementation of the invention is shown in FIG. 1.
Step 2-1. Extracting weather data features.
The weather data collected by sensors — comprising at least temperature data, weather-state data and timestamp data — are encoded and analyzed, and abnormal data with excessive deviation are eliminated. The temperature data X_weather(T) undergo max-min normalization, where the normalization function is expressed as:

X'_weather(T) = (X_weather(T) − min(X_weather(T))) / (max(X_weather(T)) − min(X_weather(T)))
Similarly, other numerical weather-related data can be feature-normalized in this way, which effectively helps the subsequent feature fusion. Weather-state data X_weather(S) are usually category labels of the form [sunny, cloudy, light rain, heavy rain, light snow, …]. For such data, a one-hot encoding method is adopted to convert them into numerical data; specifically, each label is encoded into a unique numerical value, expressed as follows:
State | Sunny | Cloudy | Light rain | Heavy rain | Light snow | ……
---|---|---|---|---|---|---
Encoded value | 0 | 1 | 2 | 3 | 4 | ……
The characteristic processing of the weather data can be realized through the method.
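As an illustrative sketch only (the function names and the category table below are assumptions for illustration, not the patented implementation), the weather-feature processing of step 2-1 can be expressed in NumPy as:

```python
import numpy as np

# Illustrative category table mirroring the encoding table above (assumed labels).
WEATHER_CODES = {"sunny": 0, "cloudy": 1, "light rain": 2,
                 "heavy rain": 3, "light snow": 4}

def min_max_normalize(x):
    """Max-min normalization of a numeric series to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def encode_weather_state(states):
    """Map categorical weather-state labels to unique numeric codes."""
    return np.array([WEATHER_CODES[s] for s in states])

scaled = min_max_normalize([12.0, 18.0, 24.0])         # -> [0.0, 0.5, 1.0]
codes = encode_weather_state(["sunny", "heavy rain"])  # -> [0, 3]
```

In practice the code table would be built from the observed categories rather than fixed in advance.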
Step 2-2. Extracting the trend-term and period-term features of the historical load sequence.
The historical load sequence features are the main factor influencing the trend of the future sequence. Aiming at the nonlinear, time-varying characteristics of the time series, the series undergoes feature decomposition through a shallow neural network. Data cleaning is required before decomposition: the data are analyzed and values with excessive offset are eliminated. Specifically, a mixed sequence-decomposition-layer neural network is defined; with original input X_input, the trend-term and period-term features are generated according to the following process:
X_trend = MovingAvg(X_input)

X_period = X_input − X_trend
where MovingAvg is a moving-average function, realized by an average-pooling operation of a one-dimensional convolution. It yields the trend term of the whole sequence fluctuation, and the period term is then obtained by subtracting the trend term from the original sequence.
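A minimal NumPy sketch of this moving-average decomposition (the kernel size and edge-padding choice are assumptions; the patent realizes MovingAvg as 1-D average pooling):

```python
import numpy as np

def moving_avg(x, kernel_size=3):
    """MovingAvg sketch: 1-D average pooling with edge padding, stride 1."""
    pad = kernel_size // 2
    xp = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(xp, kernel, mode="valid")

def decompose(x, kernel_size=3):
    """X_trend = MovingAvg(X_input); X_period = X_input - X_trend."""
    x = np.asarray(x, dtype=float)
    trend = moving_avg(x, kernel_size)
    period = x - trend
    return trend, period

trend, period = decompose([1.0, 2.0, 3.0, 4.0])
```

By construction the two components sum back to the original sequence.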
Step 2-3. Extracting the holiday-term features of the historical load sequence.
In load prediction, the presence of important holidays also affects the load trend to some extent. Specifically, the timestamps X_timestamp of the original load data extracted in step 1 are analyzed with the pandas and numpy libraries of the Python language, and the expanded features of the date of each timestamp are computed, including the month X_month, day X_day, hour X_hour, minute X_minute, day of week X_weekday, whether it is a workday X_iswork, whether it is a holiday X_isholiday, and whether it is a weekend X_isweekend. The time is analyzed with the pandas DataFrame, and the finer-grained features are expressed as follows:
X_month, X_day, X_hour, X_minute, ... = Extend(X_timestamp)

X_timestamp = Linear(Extend(X_timestamp))
where Extend is the feature-extension function; the extended multidimensional features are converted through a nonlinear conversion layer into a data form with the same dimension as the source sequence for subsequent feature fusion.
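For illustration, the timestamp-expansion part of Extend might look as follows in pandas; the column names are the feature symbols above, and the workday/weekend flags here ignore real holiday calendars (an assumption of this sketch — the patent also derives a holiday flag X_isholiday):

```python
import pandas as pd

def extend_timestamp(ts):
    """Sketch of Extend: expand timestamps into calendar features."""
    idx = pd.DatetimeIndex(ts)
    return pd.DataFrame({
        "month": idx.month,
        "day": idx.day,
        "hour": idx.hour,
        "minute": idx.minute,
        "weekday": idx.weekday,                        # Monday = 0
        "is_weekend": (idx.weekday >= 5).astype(int),
        "is_workday": (idx.weekday < 5).astype(int),   # ignores holidays
    })

feats = extend_timestamp(["2022-01-17 08:30", "2022-01-22 12:00"])
```

The nonlinear conversion layer (Linear) would then project these columns to the source-sequence dimension.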
Step 2-4. Feature embedding and fusion.
Through the first three steps, 2-1, 2-2 and 2-3, the feature data set X_weather, X_trend, X_period, X_timestamp is obtained. These features are then fused — here using an additive model — for the subsequent hierarchical residual neural network input, expressed as:
where Dropout is a common neuron-deactivation function in neural network modeling, used to prevent overfitting, and ReLU is a common activation function; the fused features are finally obtained through the additive model.
Step 3-1. Decomposing the sequence features.
A major innovation of the invention is replacing the traditional linear modeling process with a hierarchically decomposed sequence modeling process: the feature sequence is recursively decomposed according to the number of layers, a residual self-attention network then models the decomposed features of each layer, and a better feature expression is ultimately trained at deeper layers. Specifically, the decomposition algorithms provided by the invention include parity (odd-even) decomposition and dichotomy (binary) decomposition, where the pseudocode of the algorithm is as follows:
where the input is the mixed feature sequence of the source, Level is the preset number of layers, and SplitSeries is the sequence-decomposition function. The default algorithm provided by the invention adopts dichotomy decomposition; the two decomposed feature components X_left and X_right are respectively input into the residual block to be updated, then Algorithm 1 continues the recursive decomposition until the layer-number limit is reached, and finally the Merge function returns the combined sequence.
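The recursive dichotomy decomposition described above can be sketched as follows; the residual self-attention update is replaced by an identity stand-in here, so this shows only the split/update/merge control flow, not the patented network:

```python
import numpy as np

def residual_block(x):
    """Stand-in for the residual self-attention update of one component;
    the real model applies ResidualAttentionBlock here (identity in this sketch)."""
    return x

def split_series(x):
    """Dichotomy (binary) split of a feature sequence into two halves."""
    mid = len(x) // 2
    return x[:mid], x[mid:]

def hierarchical_decompose(x, level):
    """Recursively split, update each component, and Merge back in order."""
    if level == 0 or len(x) < 2:
        return residual_block(x)
    x_left, x_right = split_series(x)
    x_left = hierarchical_decompose(residual_block(x_left), level - 1)
    x_right = hierarchical_decompose(residual_block(x_right), level - 1)
    return np.concatenate([x_left, x_right])   # Merge preserves relative order

out = hierarchical_decompose(np.arange(8.0), level=2)
```

With the identity stand-in, Merge reconstructs the sequence in its original relative order, which is the invariant the reconstruction step (3-2-6) relies on.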
Step 3-2. Extracting information from the feature components using the hierarchical residual self-attention neural network.
The prototype of the hierarchical residual self-attention neural network provided by the invention is the Transformer network, whose architecture is shown in FIG. 3. Specifically, the invention uses a self-attention mechanism; compared with LSTM and GRU, the network has greater potential for mining dependencies between time steps, and the self-attention mechanism emphasizes the global state and better prevents information loss. Considering training time and prediction accuracy, the original Transformer is modified: the feedforward neural network in the original Transformer encoder is replaced with a convolutional network with fewer parameters; aiming at the hierarchical structure proposed in this design, more cross-layer residual connections are added to stabilize gradient change during model training; and the Transformer decoder layer is simplified, its basic structure replaced by a combination of a fully connected layer and a Gaussian error function. The overall modified framework is shown in FIG. 5.
At each layer, the feature component X_input is input into the model to obtain time-sequence feature information X_dep with temporal dependency, expressed as:
X_dep = ResidualAttentionBlock(X_input)
the step 3-2 specifically comprises the following steps:
Step 3-2-1. At each layer, the single time feature component X_input is input into a multi-head residual self-attention block to obtain the encoded feature X_embed. The multi-head residual self-attention mechanism is expressed as:
ResidualMultiHead(H) = Concat(head_1, head_2, ..., head_n) W_o
where ResidualMultiHead denotes the multi-head residual self-attention layer, H denotes the number of attention heads, and W_o denotes the weight vector, i.e., the fused feature vectors of the multiple heads are nonlinearly transformed and mapped to a specified length; head_1, head_2, ..., head_n denote the output of each head's self-attention layer, and Concat is the tensor-splicing function. The computation of each head is expressed as follows:
where Q_i, K_i, V_i are obtained by nonlinear conversion after encoding the input data in each head, and Prev_i is the probability matrix calculated by the previous layer's multi-head self-attention layer, passed on to the next layer so that stable and excellent performance is still obtained under a deep network structure. The final fusion feature X_attn is obtained using multiple heads; these variables are represented as follows:
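Under the assumption that Prev_i is added to the raw attention scores before the softmax (the surrounding text says the previous layer's probability matrix is carried forward; the exact injection point is this sketch's assumption), a minimal NumPy version of the multi-head residual self-attention could read:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def residual_multi_head(x, Wq, Wk, Wv, Wo, prev_scores=None):
    """Multi-head self-attention where each head adds the previous layer's
    raw score matrix (Prev_i) before the softmax."""
    heads, scores_out = [], []
    for i in range(len(Wq)):
        Q, K, V = x @ Wq[i], x @ Wk[i], x @ Wv[i]
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        if prev_scores is not None:
            scores = scores + prev_scores[i]   # residual on attention scores
        heads.append(softmax(scores) @ V)
        scores_out.append(scores)              # passed on as Prev_i
    return np.concatenate(heads, axis=-1) @ Wo, scores_out

rng = np.random.default_rng(0)
L, d_model, n_heads, d_k = 4, 6, 2, 3
x = rng.normal(size=(L, d_model))
Wq = [rng.normal(size=(d_model, d_k)) for _ in range(n_heads)]
Wk = [rng.normal(size=(d_model, d_k)) for _ in range(n_heads)]
Wv = [rng.normal(size=(d_model, d_k)) for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_k, d_model))
x_attn, prev = residual_multi_head(x, Wq, Wk, Wv, Wo)                   # layer 1
x_attn2, _ = residual_multi_head(x, Wq, Wk, Wv, Wo, prev_scores=prev)   # layer 2
```

The second call shows how a deeper layer reuses the first layer's scores.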
Step 3-2-2. The output features of the multi-head self-attention layer are input into a first regularization layer, Layer1, generating a feature vector X_norm1 and a copy of it, X_norm2. X_norm1 is input into a second-layer one-dimensional convolutional network to obtain the coded vector X_conv; X_conv and X_norm2 are connected and passed through a second regularization layer, Layer2, to generate the encoded time feature component Z, which is transmitted to the next self-attention layer. Meanwhile, the probability matrix Prev_i calculated in step 3-2-1 is also passed to the next layer. The relevant expressions are as follows:
X_norm1 = NormalizationLayer1(X_attn)

X_norm2 = X_norm1

X_conv = Dropout(ReLU(Conv1d(X_norm1)))

Z = NormalizationLayer2(X_conv + X_norm2)
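The four expressions above can be sketched as a single encoder sub-layer; the convolution kernel is an illustrative choice and dropout is omitted (inference-time behavior), so this is a shape-faithful sketch rather than the trained layer:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Per-position layer normalization over the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def conv1d_same(x, kernel):
    """Per-channel 1-D convolution over time with edge padding (same length)."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.convolve(xp[:, j], kernel, mode="valid")
                     for j in range(x.shape[1])], axis=1)

def encoder_sublayer(x_attn, kernel=(0.25, 0.5, 0.25)):
    """Norm1 -> Conv1d -> ReLU -> residual add -> Norm2 (dropout omitted)."""
    x_norm1 = layer_norm(x_attn)                     # NormalizationLayer1
    x_norm2 = x_norm1                                # the retained copy
    x_conv = np.maximum(conv1d_same(x_norm1, np.array(kernel)), 0.0)
    return layer_norm(x_conv + x_norm2)              # NormalizationLayer2

z = encoder_sublayer(np.random.default_rng(1).normal(size=(5, 4)))
```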
Step 3-2-3. Steps 3-2-1 and 3-2-2 are repeated, using the same operation in the residual attention unit of each layer stacked in the encoder section of the hierarchical residual block.
Step 3-2-4. The vector Z finally encoded by the encoder is input into a decoder for decoding. The decoder is improved from the traditional Transformer structure and suitably simplified, expressed as:
Z=Gelu(Linear(Dropout(Z)))
where Dropout is governed by a hyperparameter representing the neuron-deactivation rate in the neural network and serves to prevent overfitting; Linear is a fully connected linear transformation; and GELU is the Gaussian error linear unit, which performs well in sequence modeling and has the best comprehensive performance in many scenarios. It is expressed as:

GELU(x) = x · Φ(x) ≈ 0.5x(1 + tanh(√(2/π)(x + 0.044715x³)))

where Φ is the cumulative distribution function of the standard normal distribution.
The time component Z decoded by the decoder is a time-component feature with very good context expression ability and is transmitted to the next-layer residual self-attention block.
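A compact sketch of this simplified decoder, Z = GELU(Linear(Dropout(Z))), using the tanh approximation of GELU given above (the weight shapes are illustrative):

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def simplified_decoder(z, W, b, drop_rate=0.1, training=False, rng=None):
    """Dropout -> Linear -> GELU, as in step 3-2-4."""
    if training:                                   # inverted dropout at train time
        rng = rng or np.random.default_rng()
        mask = (rng.random(z.shape) >= drop_rate).astype(float)
        z = z * mask / (1.0 - drop_rate)
    return gelu(z @ W + b)

out = simplified_decoder(np.ones((2, 3)), np.eye(3), np.zeros(3))
```

At inference time (training=False) dropout is a no-op, matching standard practice.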
Step 3-2-5. Steps 3-2-1 through 3-2-4 are cycled until the sequence can no longer be divided (the required number of layers is reached).
Step 3-2-6. Time-series reconstruction. Through steps 3-2-1 to 3-2-5, the original time-component features have been segmented into several time components of the same length; they are restored according to the relative position order of the original features. The following are, respectively, the segmentation-and-reconstruction algorithm flows adopting the odd-even segmentation strategy and the binary segmentation strategy:
The reconstructed sequence is compressed in the above way, and the mean square error between the compressed sequence values and the real sequence values serves as the loss function used to update the parameters of the neural network, thereby training the network. The compression length is set to embed_len, and the compressed vector is denoted X_embed:
X_embed = Embed(X_T, embed_len)
Finally, the model parameters are updated with the mean square error (MSE) as the loss function:

MSE = (1/n) Σ_{t=1}^{n} (Ŷ_t − Y_t)²

where the predicted value Ŷ_T of the training phase is represented by X_embed, and the true value Y_T of the training phase is represented by X_true.
After obtaining the compressed sequence, the invention performs long-sequence prediction through the proposed generative decoding. The prediction length predict_len is set, and an all-zero tensor X_zero with the same dimension as the prediction length is initialized; X_embed and X_zero are horizontally spliced and compressed again, with compression length predict_len, generating the load prediction X_pred of the historical sequence:
X_pred = Embed(Concat(X_embed, X_zero), predict_len)
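The compression and generative-decoding steps above can be sketched as follows; here Embed is a fixed random linear map standing in for the learned compression layer (an assumption of the sketch — in the patent this layer is trained):

```python
import numpy as np

def embed(x, out_len):
    """Stand-in for the learned Embed compression layer: a fixed random
    linear map from len(x) to out_len."""
    rng = np.random.default_rng(0)
    W = rng.normal(size=(len(x), out_len)) / np.sqrt(len(x))
    return x @ W

def generative_decode(history, embed_len, predict_len):
    """Compress the encoded history, splice with zeros of the prediction
    length, and compress again to length predict_len."""
    x_embed = embed(np.asarray(history, dtype=float), embed_len)  # Embed(X_T, embed_len)
    x_zero = np.zeros(predict_len)                                # X_zero
    concat = np.concatenate([x_embed, x_zero])                    # Concat(X_embed, X_zero)
    return embed(concat, predict_len)                             # X_pred

x_pred = generative_decode(np.arange(16.0), embed_len=8, predict_len=4)
```

The zero padding reserves the slots for the future values, which the second compression fills in one shot rather than step by step.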
The above is the preferred implementation of the present invention; all changes made according to the technique of the present invention whose functional effects do not exceed the scope of the technical solution of the present invention fall within the protection scope of the present invention.
Claims (5)
1. A long-term power load prediction method based on a hierarchical residual self-attention neural network, characterized by comprising the following steps:
step 1, acquiring source data of a unit load sequence and weather data monitored by a sensor from a time sequence database;
step 2, performing data cleaning on the source data, performing feature extraction from the cleaned historical load data and weather data, and respectively extracting four major features of a trend item, a period item, a holiday item and a weather item of load fluctuation;
performing data fusion on the historical load sequence data and the weather characteristic data to obtain a fusion vector for the input of the next neural network modeling;
step 3, encoding the input sequence by using a hierarchical residual self-attention neural network, extracting and mining important features in the input sequence, and performing model training;
step 4, carrying out generative coding on the characteristics extracted from the source historical load data to be predicted, and predicting a load sequence in the next time step range;
extracting the integral trend-term and period-term features from the original load sequence by using a convolutional neural network; performing feature extraction on the holiday term and the weather term by using a one-hot encoding mode; horizontally splicing the source load sequence with all extracted feature data using the additive idea; and converting through a fully connected layer to obtain the fused time-sequence feature vector;
in step 3, a recursive idea is adopted: the time-sequence feature vector is hierarchically down-sampled and decomposed; a residual self-attention network performs feature mining on the decomposed time-sequence component of each layer; upon reaching the decomposition depth, the mined features are recombined according to their original relative positions and converted into a prediction result through a one-dimensional convolution layer; iteration continues in this manner, using the Adam algorithm as the optimization algorithm and the mean square error between the predicted value and the true value as the loss function for model training;
in step 4, specifically, the source load data to be predicted undergo feature conversion through steps 2 and 3; the converted features are spliced with an all-zero vector initialized to the prediction length; the spliced vector is generatively encoded through the model trained in step 3; and the load sequence fluctuation of the whole future period is predicted.
2. The long-term power load prediction method based on hierarchical residual self-attention neural network according to claim 1, characterized in that: the period term is obtained by subtracting the trend term from the original load sequence.
3. The long-term power load prediction method based on the hierarchical residual self-attention neural network of claim 1, characterized in that: the hierarchical residual self-attention neural network takes the Transformer network as its prototype; the feed-forward network of the encoder in the original Transformer is replaced with a convolutional network; additional cross-layer residual connections are added to stabilize gradient changes during model training; and the decoder layer of the Transformer is simplified and replaced with a combination of a fully connected layer and a Gaussian error function.
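A single residual self-attention block of the kind claimed can be sketched in NumPy; this is a single-head version without layer normalization, and the dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def residual_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a cross-layer
    residual connection added to stabilize gradient flow."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return x + attn @ v          # residual connection

rng = np.random.default_rng(0)
T, d = 8, 4
x = rng.standard_normal((T, d))
Wq, Wk, Wv = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
y = residual_self_attention(x, Wq, Wk, Wv)
print(y.shape)   # (8, 4)
```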
4. The long-term power load prediction method based on the hierarchical residual self-attention neural network according to claim 1, characterized in that: the splicing in step 4 adopts horizontal splicing followed by compression.
5. The long-term power load prediction method based on the hierarchical residual self-attention neural network of claim 4, wherein: the compressed length equals the prediction length.
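Claims 4 and 5 together say the spliced vector is compressed so that its length equals the prediction length; a hedged sketch using a fully connected compression layer (the layer itself is an assumption; the claims do not name the compression mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)
T_hist, T_pred = 96, 24

hist = rng.random(T_hist)
spliced = np.concatenate([hist, np.zeros(T_pred)])   # horizontal splicing

# Compression through a fully connected layer whose output length is the
# prediction length, per claims 4 and 5.
W = rng.standard_normal((spliced.size, T_pred)) * 0.01
compressed = spliced @ W
print(compressed.shape)   # (24,)
```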
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210048738.XA CN114529051A (en) | 2022-01-17 | 2022-01-17 | Long-term power load prediction method based on hierarchical residual self-attention neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114529051A true CN114529051A (en) | 2022-05-24 |
Family
ID=81620165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210048738.XA Pending CN114529051A (en) | 2022-01-17 | 2022-01-17 | Long-term power load prediction method based on hierarchical residual self-attention neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114529051A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114707772A (en) * | 2022-06-06 | 2022-07-05 | 山东大学 | Power load prediction method and system based on multi-feature decomposition and fusion |
CN114707772B (en) * | 2022-06-06 | 2022-08-23 | 山东大学 | Power load prediction method and system based on multi-feature decomposition and fusion |
CN115204529A (en) * | 2022-09-15 | 2022-10-18 | 之江实验室 | Non-invasive load monitoring method and device based on time attention mechanism |
CN115204529B (en) * | 2022-09-15 | 2022-12-20 | 之江实验室 | Non-invasive load monitoring method and device based on time attention mechanism |
CN115440390A (en) * | 2022-11-09 | 2022-12-06 | 山东大学 | Method, system, equipment and storage medium for predicting number of cases of infectious diseases |
CN115440390B (en) * | 2022-11-09 | 2023-03-24 | 山东大学 | Infectious disease case quantity prediction method, system, equipment and storage medium |
CN116029201A (en) * | 2022-12-23 | 2023-04-28 | 浙江苍南仪表集团股份有限公司 | Gas flow prediction method and system based on clustering and cyclic neural network |
CN116029201B (en) * | 2022-12-23 | 2023-10-27 | 浙江苍南仪表集团股份有限公司 | Gas flow prediction method and system based on clustering and cyclic neural network |
CN116776228A (en) * | 2023-08-17 | 2023-09-19 | 合肥工业大学 | Power grid time sequence data decoupling self-supervision pre-training method and system |
CN116776228B (en) * | 2023-08-17 | 2023-10-20 | 合肥工业大学 | Power grid time sequence data decoupling self-supervision pre-training method and system |
CN117114056A (en) * | 2023-10-25 | 2023-11-24 | 城云科技(中国)有限公司 | Power load prediction model, construction method and device thereof and application |
CN117114056B (en) * | 2023-10-25 | 2024-01-09 | 城云科技(中国)有限公司 | Power load prediction model, construction method and device thereof and application |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114529051A (en) | Long-term power load prediction method based on hierarchical residual self-attention neural network | |
CN113592185B (en) | Power load prediction method based on Transformer | |
CN112364975B (en) | Terminal running state prediction method and system based on graph neural network | |
CN111079989B (en) | DWT-PCA-LSTM-based water supply amount prediction device for water supply company | |
CN115169703A (en) | Short-term power load prediction method based on long-term and short-term memory network combination | |
CN114493014A (en) | Multivariate time series prediction method, multivariate time series prediction system, computer product and storage medium | |
CN115587454A (en) | Traffic flow long-term prediction method and system based on improved Transformer model | |
CN113128113A (en) | Poor information building load prediction method based on deep learning and transfer learning | |
CN114817773A (en) | Time sequence prediction system and method based on multi-stage decomposition and fusion | |
CN114519471A (en) | Electric load prediction method based on time sequence data periodicity | |
CN116702831A (en) | Hybrid short-term wind power prediction method considering massive loss of data | |
CN113360848A (en) | Time sequence data prediction method and device | |
CN117494906B (en) | Natural gas daily load prediction method based on multivariate time series | |
Liao et al. | Scenario prediction for power loads using a pixel convolutional neural network and an optimization strategy | |
CN115713044B (en) | Method and device for analyzing residual life of electromechanical equipment under multi-condition switching | |
WO2024012735A1 (en) | Training of a machine learning model for predictive maintenance tasks | |
CN116911442A (en) | Wind power generation amount prediction method based on improved transducer model | |
Rodriguez et al. | Multi-step forecasting strategies for wind speed time series | |
CN116127325A (en) | Method and system for detecting abnormal flow of graph neural network business based on multi-attribute graph | |
Wang et al. | Grid load forecasting based on dual attention BiGRU and DILATE loss function | |
CN116128082A (en) | Highway traffic flow prediction method and electronic equipment | |
Rathnayaka et al. | Specialist vs generalist: A transformer architecture for global forecasting energy time series | |
CN113780377A (en) | Rainfall level prediction method and system based on Internet of things data online learning | |
Chen et al. | Multi-Objective Spiking Neural Network for Optimal Wind Power Prediction Interval | |
Han et al. | Online aware synapse weighted autoencoder for recovering random missing data in wastewater treatment process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||