CN115622047A - Power Transformer load prediction method based on Transformer model - Google Patents
- Publication number
- CN115622047A (application CN202211379043.6A)
- Authority
- CN
- China
- Prior art keywords
- layer
- load
- model
- transformer
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a power transformer load prediction method based on a Transformer model, comprising the following steps: acquiring load data of a power transformer and arranging the acquired data in time order to obtain a sequence sample data set; dividing the data set into a training set, a test set and a verification set while ensuring that the sampling period of each subset can represent feature-change samples within the same time period; defining and establishing an interactive multi-head-attention Transformer model and initializing the network's internal parameters and learning rate; and constructing a three-layer decoder from a multi-head attention layer and a multi-head attention interaction layer. The prediction method provided by the invention can better capture the dependency relationships within long-sequence data, thereby achieving accurate prediction of the power transformer load, and has practical value in smart-grid construction.
Description
Technical Field
The invention belongs to the technical field of power metering data processing, and particularly relates to a method for predicting the load of a power transformer.
Background
The smart grid realizes reliable, safe, economical, efficient and environmentally friendly operation of the power grid through advanced sensing and measurement technologies and advanced control systems. The power transformer is an important device in power-grid construction, and accurate long-term prediction of its load from historical operating data is an important condition for building a smart grid. Power transformer load prediction takes historical time-series data as its data source, establishes a mathematical load-prediction model using technologies such as data mining and deep learning, and predicts the transformer load from the established model, thereby enabling reasonable power distribution and reducing power waste.
With the continuous growth of installed wind-power capacity, the technical and economic influence of wind-power integration on the main grid becomes ever larger, posing greater challenges for transformer data processing. Because grid-connected operation of a wind farm negatively affects the power quality, voltage stability and safety of the grid, only accurate prediction of the power transformer load can effectively improve power quality and voltage stability. Reasonable estimation of the transformer load can therefore effectively reduce unnecessary power waste and give full play to the auxiliary decision-making role of the smart grid.
The power transformer has a complex structure and nonlinear changes in material parameters, so during power distribution the transformer can often only be adjusted conservatively. In practice the transformer load is difficult to predict because it is influenced by factors such as weather, temperature, season and environment, and thus exhibits complicated variation characteristics. The load prediction methods currently proposed for power transformers fall roughly into two types: statistical models represented by ARIMA and Prophet, and autoregressive models represented by RNNs. These methods usually perform short-term prediction from single or multiple variables; their prediction horizon is short, their precision is low, and they have difficulty handling the large volumes of high-dimensional data and complex temporal relationships found in real applications, so they are unsuitable for practical use.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a power transformer load prediction method based on an interactive multi-head-attention Transformer model. Built on the encoder-decoder framework of the Transformer model, the method uses depthwise separable convolution to realize information interaction between the different subspaces of conventional multi-head attention, improving the model's data-fitting ability; meanwhile, it distills the time-series data with a maximum pooling layer, reducing memory overhead during model training and achieving accurate prediction of the power transformer load.
A second object of the invention is to propose an application using the above prediction method.
A third object of the invention is to propose a device using the above prediction method.
The technical scheme for realizing the above purpose of the invention is as follows:
a method for predicting the load of a power Transformer based on a Transformer model comprises the following steps:
S1, collecting load data of a power transformer and arranging the collected data in time order to obtain a sequence sample data set X = {x_1, x_2, ..., x_{L_x}}, where x_i denotes the values of the observed variables at time i, L_x denotes the length of the observed time series, and d_x denotes the number of observed variables;
normalizing the sequence of sample data sets to enable the sample data values to be in the range of [0,1], and obtaining a data set serving as a sample for supervised learning;
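As a concrete illustration of the normalization step, the following sketch applies column-wise min-max scaling so that every observed variable lies in [0, 1]. The function name and the toy data are illustrative, not taken from the patent.

```python
# Hypothetical min-max normalization of a load time series to [0, 1],
# applied column-wise over the d_x observed variables.
def min_max_normalize(rows):
    """rows: list of lists with shape (L_x, d_x). Returns a normalized copy."""
    n_vars = len(rows[0])
    mins = [min(r[i] for r in rows) for i in range(n_vars)]
    maxs = [max(r[i] for r in rows) for i in range(n_vars)]
    out = []
    for r in rows:
        out.append([(r[i] - mins[i]) / (maxs[i] - mins[i]) if maxs[i] > mins[i] else 0.0
                    for i in range(n_vars)])
    return out

data = [[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]]  # 3 timesteps, 2 variables
norm = min_max_normalize(data)
print(norm[2])  # [0.5, 0.5]
```

The per-variable statistics would normally be computed on the training set only and reused for the test and verification sets.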
S2, dividing the normalized data set into a training set, a test set and a verification set, ensuring that the sampling period (the sampling interval) of each subset can represent feature-change samples within the same time period;
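The chronological split of step S2 can be sketched as follows; the 7:2:1 train/test/verification ratio is the one given in the preferred embodiment, and the function name is illustrative.

```python
# A minimal chronological (non-shuffled) split sketch for step S2,
# assuming the 7:2:1 ratio stated in the preferred embodiment.
def split_dataset(samples, ratios=(0.7, 0.2, 0.1)):
    n = len(samples)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = samples[:n_train]                 # earliest 70%
    test = samples[n_train:n_train + n_test]  # next 20%
    verify = samples[n_train + n_test:]       # final 10%
    return train, test, verify

train, test, verify = split_dataset(list(range(100)))
print(len(train), len(test), len(verify))  # 70 20 10
```

Splitting in time order (rather than randomly) preserves the requirement that each subset covers contiguous feature-change samples.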
S3: defining and establishing an interactive multi-head-attention Transformer model and initializing the network's internal parameters and learning rate; the original data pass through an embedding layer and a position-coding layer and are converted into feature vectors carrying position information. The time-sequence coding comprises a global time-sequence coding and a local time-sequence coding; the global coding consists of the year, month and week information in the data timestamp, and the local coding formula is as follows:
PE(pos, 2j) = sin(pos / 10000^(2j/d_model)), PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model)); in the formula, PE represents the position encoding, pos represents the position, j represents the dimension index, and d_model is the feature dimension.
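Assuming the standard sinusoidal Transformer position encoding, which matches the PE/pos/j notation above, a minimal sketch of the local time-sequence encoding is (d_model is an assumed hyperparameter):

```python
import math

# Sketch of the sinusoidal local position encoding: even dimensions use sin,
# odd dimensions use cos, with wavelengths growing geometrically with j.
def position_encoding(pos, d_model):
    pe = []
    for j in range(d_model // 2):
        angle = pos / (10000 ** (2 * j / d_model))
        pe.append(math.sin(angle))  # dimension 2j
        pe.append(math.cos(angle))  # dimension 2j + 1
    return pe

pe0 = position_encoding(0, 8)
print(pe0)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

This vector is added to the embedding so that attention can distinguish time steps.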
S4, the Transformer model consists of an encoder and a decoder. In the encoder, a multi-head attention layer and a multi-head attention interaction layer are adopted for feature extraction, as follows: the above vector with timing information is input into the multi-head attention layer to obtain an intermediate value:

O_i = Attention(QW_i^Q, KW_i^K, VW_i^V), where Attention(Q, K, V) = Softmax(QK^T / sqrt(d_k)) V,

in which W^Q, W^K, W^V are weight matrices, Q, K and V are the input vectors, and d_k is the dimension of the key vectors;
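A minimal numerical sketch of the intermediate value computed by the multi-head attention layer follows (a single head for brevity; the random W_Q, W_K, W_V are illustrative stand-ins for learned weight matrices):

```python
import numpy as np

# Scaled dot-product attention producing the "intermediate value" of the
# multi-head attention layer; the sqrt(d_k) scaling is the standard one.
def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # contribution-degree scores
    return softmax(scores) @ V       # weighted sum of the value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))          # L_x = 6 timesteps, d_model = 8
W_Q, W_K, W_V = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention(X @ W_Q, X @ W_K, X @ W_V)
print(out.shape)  # (6, 8)
```

In the multi-head case this computation runs once per head on a subspace slice, and the interaction layer described below then mixes the heads.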
using depthwise separable convolution to realize information interaction between the different subspaces;
where Conv1 and Conv2 represent depthwise convolution and pointwise convolution respectively, and ELU represents the activation function;
then, a linear transformation layer is used for feature dimension conversion, and finally, downsampling is carried out through a pooling layer to obtain output:
s5: constructing a three-layer decoder by adopting a multi-head attention layer and a multi-head attention interaction layer; first using the features f from a multi-head attention interaction layer 1 And features f from residual concatenation 2 Calculating a weight ratioWhereinRepresenting a weight matrix, b g Indicating the bias and Sigmoid the activation function. Then based on the ratio, for the two features f 1 And f 2 Perform weighted summation
Fusion(f1,f2)=g⊙f 1 +(1-g)f 2
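The weighted fusion Fusion(f1, f2) = g ⊙ f1 + (1 - g) ⊙ f2 can be sketched numerically as follows; the gate form g = Sigmoid(W_g[f1; f2] + b_g) is an assumption consistent with the weight matrix W_g and bias b_g named in the text, and the zero-initialized parameters are chosen only to make the expected gate value obvious.

```python
import numpy as np

# Sketch of the S5 gated fusion: a sigmoid gate g in (0, 1) interpolates
# element-wise between feature f1 and feature f2.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion(f1, f2, W_g, b_g):
    g = sigmoid(np.concatenate([f1, f2]) @ W_g + b_g)  # weight ratio
    return g * f1 + (1.0 - g) * f2

d = 4
f1, f2 = np.ones(d), np.zeros(d)
W_g = np.zeros((2 * d, d))   # zero weights -> g = sigmoid(0) = 0.5 everywhere
out = fusion(f1, f2, W_g, np.zeros(d))
print(out)  # [0.5 0.5 0.5 0.5]
```

With learned W_g and b_g, the gate adaptively decides how much of each branch to keep per feature dimension.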
S6: a decoder is constructed using a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer performs an inner-product operation between the Query and Key matrices to obtain contribution-degree scores, which are then multiplied with the Value matrix to obtain feature vectors. The multi-head attention interaction layer performs subspace information interaction on the resulting feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
The data points in S1 are arranged in time order; sampling can be performed at 1-hour intervals, or at 15-minute or 1-minute intervals, and the shorter the interval, the finer the data. In the conventional multi-head attention mechanism of S4, the features are divided into several blocks and information interaction between the different subspaces is not considered, which limits the model's ability to extract features from time-series data. The invention improves the attention mechanism in the model: through convolution processing the blocks become interrelated, and longer-horizon data can be predicted. On the basis of the multi-head attention mechanism, a multi-head attention interaction layer is introduced, and depthwise separable convolution realizes information interaction between the different subspaces. This reduces memory overhead during model training; features can be selected adaptively and redundant information filtered out.
The method comprises collecting data related to the power transformer load using temperature-measuring elements, ammeters, voltmeters and sensors, the data comprising one or more of load, oil temperature, location, climate and demand.
Further, in the step S4:
Information interaction is performed through the multi-head attention interaction layer, which consists of a depthwise separable convolution layer, a linear transformation layer and a maximum pooling layer. For the output tensor formed by the multi-head self-attention mechanism, information aggregation is first performed on the channel dimension using a 1x1 pointwise convolution; after an ELU activation function, a depthwise convolution performs information interaction on the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a maximum pooling layer with stride 2 realizes the distillation operation on the time series. This operation halves the sequence length in the time dimension after each encoder layer and filters out redundant information, thereby reducing memory consumption during training.
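The interaction-layer pipeline (pointwise convolution, ELU, depthwise convolution, stride-2 max pooling) can be sketched at the shape level as follows; kernel size 3 with zero 'same' padding is an assumption for illustration, not a value from the patent.

```python
import numpy as np

# Shape-level sketch of the interaction layer with distillation:
# 1x1 pointwise conv (channel aggregation) -> ELU -> depthwise conv
# (per-channel temporal interaction) -> max pool with stride 2.
def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def pointwise_conv(x, w):            # x: (L, C_in), w: (C_in, C_out)
    return x @ w                     # a 1x1 conv is a per-step matmul

def depthwise_conv(x, k):            # x: (L, C), k: (K, C), zero 'same' padding
    K = k.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[i:i + K] * k).sum(axis=0) for i in range(x.shape[0])])

def max_pool(x, stride=2):           # distillation: halves sequence length
    L = x.shape[0] // stride * stride
    return x[:L].reshape(-1, stride, x.shape[1]).max(axis=1)

L, C = 8, 4
x = np.arange(L * C, dtype=float).reshape(L, C)
y = max_pool(elu(depthwise_conv(pointwise_conv(x, np.eye(C)), np.ones((3, C)))))
print(x.shape, "->", y.shape)  # (8, 4) -> (4, 4)
```

The key property being shown is the halving of the time dimension per encoder layer, which is what reduces memory consumption.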
Further, in step S2, the preprocessed data set is divided into a training set, a test set and a verification set in the ratio 7:2:1, and the sampling period of each subset can represent feature-change samples within the same time interval (the time interval being the acquisition interval).
Further, in step S4:
the input part of the decoder is represented asWherein,the values of the next k time steps from the Encoder input,placeholders (filled with 0) as target sequences to be predicted; finally, the fully-connected layer is used to output a prediction value whose dimensionality depends on the number of variables that need to be predicted.
In step S4, during network convergence, a mean square error (MSE) loss function and the Adam stochastic-gradient-descent algorithm are used.
In this method, on one hand the learning rate of each parameter is modified dynamically, and on the other hand a momentum method is introduced, so that parameter updates have more opportunities to jump out of local optima, accelerating and optimizing network convergence.
The training process consists of feeding samples into the model and iterating by gradient descent to reduce the error.
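The Adam optimizer mentioned above combines exactly the two ideas in the text: a per-parameter adaptive learning rate and momentum. A single-parameter sketch with the common default hyperparameters (not values taken from the patent):

```python
import math

# One Adam update step for a scalar parameter.
def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # momentum (first moment)
    v = b2 * v + (1 - b2) * grad * grad  # per-parameter scale (second moment)
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(w) = w^2 starting from w = 1.0
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(0.0 < w < 0.7)  # True: w has moved toward the minimum at 0
```

Dividing by sqrt(v_hat) is what adapts the effective learning rate per parameter, while m carries the momentum.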
The power transformer prediction method based on the Transformer model further comprises step S7: evaluating model overfitting, wherein early stopping is used to prevent overfitting during training. After each training round the model is verified on the verification set obtained in step S2, and if the validation error is found to rise as the training rounds increase, training is stopped; the weights at the stopping point are taken as the final parameters of the network.
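The early-stopping rule of step S7 can be sketched as follows; the patience value and the stand-in per-round validation errors are illustrative (a real implementation would compute them by running the model on the verification set).

```python
# Minimal early-stopping sketch: stop when the validation error has not
# improved for `patience` consecutive rounds, and keep the best weights.
def train_with_early_stopping(val_errors, patience=2):
    """val_errors: per-round validation errors (stand-ins for real training)."""
    best_err, best_round, waited = float("inf"), -1, 0
    for rnd, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_round, waited = err, rnd, 0
        else:
            waited += 1
            if waited >= patience:    # error kept rising: stop training
                break
    return best_round, best_err       # weights from best_round become final

best_round, best_err = train_with_early_stopping([0.9, 0.5, 0.4, 0.45, 0.47, 0.3])
print(best_round, best_err)  # 2 0.4  (training stops before ever seeing 0.3)
```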
Application of the power transformer prediction method based on the Transformer model, using the model to predict: after model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values.
The method can be used for transformer load prediction in wind farms or other facilities with similar characteristics, preferably in wind farms.
The power transformer load prediction model based on the interactive multi-head-attention Transformer receives a historical load sequence as input and predicts load values for multiple future time steps; by realizing information interaction among the attention heads, the model's ability to extract features from long-sequence data is improved, thereby achieving high-precision long-term prediction of the power transformer load.
An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method steps when executing the program.
The invention has the beneficial effects that:
Compared with existing prediction methods, the power transformer load prediction method based on the interactive multi-head-attention Transformer model has the following advantages: traditional time-series prediction methods cannot accurately predict long-sequence data, while this method introduces interactive multi-head attention on the basis of the Transformer to enhance the model's ability to extract features from sequence data, and at the same time uses a maximum pooling layer to perform a distillation operation on the sequence data in order to reduce memory overhead during model training.
The power transformer load prediction method provided by the invention can better capture the dependency relationship between the long sequence data, thereby realizing accurate prediction of the power transformer load and having certain practicability in the construction of an intelligent power grid.
The prediction method utilizes the maximum pooling layer to distill the time sequence data, reduces the memory overhead in the model training process, and realizes accurate prediction of the load of the power transformer.
Drawings
FIG. 1 is a flow chart of the load prediction of a power Transformer based on an interactive multi-head attention Transformer model according to the present invention;
FIG. 2 is a model diagram of a power Transformer load prediction based on an interactive multi-head attention Transformer model according to the present invention;
fig. 3 shows the prediction effect of the prediction method IMAHN compared to real data.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Unless otherwise specified, all technical means used in the specification are technical means known in the art.
The invention is further described in detail below with reference to the accompanying drawings and embodiments, in which the invention provides a power Transformer load prediction method based on an interactive multi-head attention Transformer model.
The training data set used in the examples collects the load conditions of power transformers in two different areas of the same province in China from 2016 to 2018. In one variant each data point was recorded every 15 minutes (marked with m) and designated ETT-small-m1; this data set contains 2 years × 365 days × 24 hours × 4 = 70,080 data points. In addition, variants at one-hour granularity (marked with h) are provided, namely ETT-small-h1 and ETT-small-h2. Each data point contains 8-dimensional features, including the recording date, the predicted value "oil temperature", and 6 different types of external load values: High UseFul Load (HUFL), High UseLess Load (HULL), Middle UseFul Load (MUFL), Middle UseLess Load (MULL), Low UseFul Load (LUFL) and Low UseLess Load (LULL).
Example 1:
fig. 1 is a flowchart illustrating a power Transformer load prediction method based on an interactive multi-head attention Transformer model according to the present invention. The method specifically comprises the following steps:
S1, collecting load data of a power transformer and arranging the collected data in time order to obtain a sequence sample data set X = {x_1, x_2, ..., x_{L_x}}, where x_i denotes the values of the observed variables at time i, L_x denotes the length of the observed time series, and d_x denotes the number of observed variables;
normalizing the sequence of sample data sets to enable the sample data values to be in the range of [0,1], and obtaining a data set serving as a sample for supervised learning;
S2, the normalized data set is divided into a training set, a test set and a verification set in the ratio 7:2:1, ensuring that the sampling period of each subset can represent feature-change samples within the same time period;
S3: defining and establishing an interactive multi-head-attention Transformer model and initializing the network's internal parameters and learning rate; the original data pass through an embedding layer and a position-coding layer and are converted into feature vectors carrying position information. The time-sequence coding comprises a global time-sequence coding and a local time-sequence coding; the global coding consists of the year, month and week information in the data timestamp, and the local coding formula is as follows:

PE(pos, 2j) = sin(pos / 10000^(2j/d_model)), PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model)); in the formula, PE represents the position encoding, pos represents the position, j represents the dimension index, and d_model is the feature dimension.
S4, the Transformer model consists of an encoder and a decoder. In the encoder, a multi-head attention layer and a multi-head attention interaction layer are adopted for feature extraction, as follows: the above vector with timing information is input into the multi-head attention layer to obtain an intermediate value:

O_i = Attention(QW_i^Q, KW_i^K, VW_i^V), where Attention(Q, K, V) = Softmax(QK^T / sqrt(d_k)) V,

in which W^Q, W^K, W^V are weight matrices, Q, K and V are the input vectors, and d_k is the dimension of the key vectors;
information interaction between the different subspaces is realized using depthwise separable convolution;
where Conv1 and Conv2 represent depthwise convolution and pointwise convolution respectively, and ELU represents the activation function;
then, a linear transformation layer is used for feature dimension conversion, and finally, downsampling is carried out through a pooling layer to obtain output:
in step S4:
The output vectors generated by the multi-head attention layer undergo information interaction through the multi-head attention interaction layer, which consists of a depthwise separable convolution layer, a linear transformation layer and a maximum pooling layer. For the output tensor formed by the multi-head self-attention mechanism, information aggregation is first performed on the channel dimension using a 1x1 pointwise convolution; after an ELU activation function, a depthwise convolution performs information interaction on the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a maximum pooling layer with stride 2 realizes the distillation operation on the time series.
In step S4, the output vector O_i = Attention(QW_i^Q, KW_i^K, VW_i^V) generated by the multi-head attention layer undergoes information interaction through the multi-head attention interaction layer.
The input part of the decoder is represented as X_de = Concat(X_token, X_0), where X_token consists of the values of k time steps taken from the encoder input, and X_0 is a placeholder (filled with 0) for the target sequence to be predicted; finally, a fully-connected layer outputs the prediction value, whose dimensionality depends on the number of variables to be predicted.
In step S4, during network convergence, a mean square error (MSE) loss function and the Adam stochastic-gradient-descent algorithm are used.
S5: constructing a three-layer decoder from a multi-head attention layer and a multi-head attention interaction layer; first, the feature f_1 from the multi-head attention interaction layer and the feature f_2 from the residual connection are used to calculate a weight ratio g = Sigmoid(W_g[f_1; f_2] + b_g), where W_g represents a weight matrix, b_g the bias, and Sigmoid the activation function. Then, based on this ratio, the two features are weighted and summed: Fusion(f_1, f_2) = g ⊙ f_1 + (1 - g) ⊙ f_2.
S6: a decoder is constructed using a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer performs an inner-product operation between the Query and Key matrices to obtain contribution-degree scores, which are then multiplied with the Value matrix to obtain feature vectors. The multi-head attention interaction layer performs subspace information interaction on the resulting feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
S7: evaluating model overfitting, using early stopping to prevent overfitting during training. After each training round the model is verified on the verification set obtained in step S2; if the validation error rises as the training rounds increase, training is stopped, and the weights at the stopping point are taken as the final parameters of the network.
After model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values. FIG. 3 shows partial prediction results of the method on the ETT data set, and Tables 1 and 2 show comparison results against other prediction methods under univariate and multivariate conditions respectively; the effectiveness and advancement of the model can be seen from these.
TABLE 1 univariate time series prediction results
In Table 1, IMAHN is the method proposed by the invention; Informer, LSTMa, DeepAR, ARIMA and Prophet are comparative methods.
MAE (mean absolute error) and MSE (mean square error) are the evaluation indices.
Example 2:
A Transformer model was obtained by the same power transformer load prediction method as in Example 1. In this embodiment, multiple variables, comprising load, oil temperature, location, climate and demand, are input for prediction. The data in the raw data set are obtained by means of temperature-measuring elements and measurements of current and user-side power. This embodiment predicts the load variable from multiple variables; the input dimensions differ from those of Example 1.
The results obtained by the Transformer model are shown in table 2:
TABLE 2 multivariate time series prediction results
In Table 2, IMAHN is the method proposed herein; Informer, LSTMa and LSTNet are comparative prediction methods.
Example 3: Application
After model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values, thereby guiding model selection and configuration of transformers in the power grid.
For transformers used in grid connection of wind turbines, the location and climate variables change with different wind-farm settings, and the prediction method is particularly suitable for load prediction of wind-farm transformers.
Although the present invention has been described in the foregoing by way of examples, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
Claims (10)
1. A method for predicting the load of a power Transformer based on a Transformer model is characterized by comprising the following steps:
s1, collecting load data of a power transformer, and arranging the collected load data of the power transformer according to time to obtain a sequence sample data setx i Values representing observed variables at time i, L x Representing the length of the observed time series, d x Represents the number of observed variables;
normalizing the sequence sample data set so that the sample values lie in the range [0, 1], yielding a data set that serves as samples for supervised learning;
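The [0, 1] normalization in S1 is per-variable min-max scaling; a minimal NumPy sketch (function and variable names are illustrative, not the patent's implementation):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each variable (column) of X into [0, 1] independently.

    X: array of shape (L_x, d_x) -- series length x number of variables.
    Returns the scaled array plus per-column min/max, which are needed
    to invert the transform on predictions later.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid /0
    return (X - col_min) / span, col_min, col_max

X = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]])
X_norm, lo, hi = min_max_normalize(X)
print(X_norm[:, 0].tolist())  # [0.0, 0.5, 1.0]
```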
s2, dividing the data set subjected to normalization processing into a training set, a testing set and a verification set, and ensuring that each data set sampling cycle can represent feature change samples in the same time period;
s3: defining and establishing an interactive multi-head attention Transformer-based model, and initializing network internal parameters and a learning rate; the original data is converted into a feature vector with position information after passing through an embedding layer and a position coding layer, wherein the time sequence coding comprises global time sequence coding and local time sequence coding, the global time sequence coding consists of year, month and week information in a data timestamp, and a local time sequence coding formula is as follows:
in the formula, PE represents position encoding, pos represents position, j represents dimension,
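Assuming the local coding is the standard sinusoidal Transformer position encoding (consistent with the PE, pos and j definitions above), a NumPy sketch:

```python
import numpy as np

def positional_encoding(L, d_model):
    """Sinusoidal local position encoding: sin on even dimensions,
    cos on odd dimensions, with wavelengths forming a geometric
    progression over the dimension index j."""
    pos = np.arange(L)[:, None]          # (L, 1)
    j = np.arange(d_model)[None, :]      # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (j // 2)) / d_model)
    pe = np.zeros((L, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

pe = positional_encoding(96, 512)
print(pe.shape)             # (96, 512)
print(pe[0, 0], pe[0, 1])   # 0.0 1.0  (sin(0), cos(0))
```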
s4, the Transformer model consists of an encoder and a decoder, and in the encoder, a multi-head attention layer and a multi-head attention interaction layer are adopted for feature extraction, and the method comprises the following steps: the vector with timing information is input into the multi-head attention layer to obtain an intermediate value:
wherein W Q ,W K ,W V Is a weight matrix, and Q, K and V are input vectors;
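Per head, this is the standard scaled dot-product attention with learned projections; a NumPy sketch under that assumption (all names illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_Q, W_K, W_V):
    """One attention head: project the input with the weight matrices,
    then apply softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_k = Q.shape[-1]
    scores = softmax(Q @ K.T / np.sqrt(d_k))
    return scores @ V

rng = np.random.default_rng(0)
L, d, d_k = 8, 16, 4
X = rng.normal(size=(L, d))
out = attention(X, rng.normal(size=(d, d_k)),
                rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k)))
print(out.shape)  # (8, 4)
```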
a depthwise separable convolution is then used to realize information interaction on the different subspaces, where Conv1 and Conv2 respectively denote the depth-wise convolution and the point-wise convolution, and ELU denotes the activation function;
then a linear transformation layer is used for feature dimension conversion, and finally down-sampling is carried out through a pooling layer to obtain the output;
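The interaction-layer pipeline (pointwise 1x1 convolution, ELU, depthwise convolution, then stride-2 max pooling, as detailed in claim 3) can be sketched in simplified 1-D NumPy form; this is an illustrative toy under that assumed ordering, not the patent's implementation:

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def pointwise_conv(X, W):
    """1x1 convolution over channels: each time step mixes channels only."""
    return X @ W                       # (L, C_in) @ (C_in, C_out)

def depthwise_conv(X, kernels):
    """Per-channel 1-D convolution ('same' padding): each channel is
    filtered independently, interacting along the time axis."""
    L, C = X.shape
    k = kernels.shape[1]
    pad = k // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)))
    out = np.zeros_like(X)
    for c in range(C):
        for t in range(L):
            out[t, c] = Xp[t:t + k, c] @ kernels[c]
    return out

def max_pool_stride2(X):
    """Halve the time axis -- the 'distillation' step."""
    L = X.shape[0] - X.shape[0] % 2
    return X[:L].reshape(-1, 2, X.shape[1]).max(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(96, 8))           # (time steps, channels)
Y = pointwise_conv(X, rng.normal(size=(8, 8)))
Y = depthwise_conv(elu(Y), rng.normal(size=(8, 3)))
Y = max_pool_stride2(Y)
print(Y.shape)  # (48, 8)
```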
s5: by using multiple attention levels and multiple notesConstructing a three-layer decoder by the idea interaction layer; first using the feature f from the multi-head attention interaction layer 1 And features f from residual concatenation 2 Calculating a weight ratioWhereinRepresenting a weight matrix, b g Indicating the bias and Sigmoid the activation function. Then, based on the ratio, for the above feature f 1 And f 2 Perform weighted summation Fusion (f 1, f 2) = g & 1 +(1-g)f 2
S6: a decoder is constructed using a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer is responsible for carrying out inner product operation on the Query matrix and the Key matrix to obtain a contribution degree score, and then multiplying the obtained contribution degree score and the Value matrix to obtain a feature vector; the multi-head attention interaction layer is responsible for performing subspace information interaction on the formed feature vectors, and finally the linear change layer outputs a final prediction sequence.
2. The method for predicting the load of a power Transformer based on a Transformer model according to claim 1, wherein temperature-measuring elements, ammeters, voltmeters and sensors are used to collect load-related data of the power transformer, the data comprising one or more of load, oil temperature, position, climate and demand.
3. The method for predicting the load of the power Transformer based on the Transformer model according to claim 1, wherein in the step S4:
Information interaction is performed through the multi-head attention interaction layer, which consists of a depthwise separable convolution layer, a linear transformation layer and a maximum pooling layer; for the output tensor formed by the multi-head self-attention mechanism, information aggregation is first carried out on the channel dimension using a 1x1 pointwise convolution; after an ELU activation function, a depthwise convolution performs information interaction on the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, the distillation operation over the time series is realized with a maximum pooling layer of stride 2.
4. The method for predicting the load of a power Transformer based on a Transformer model according to claim 1, wherein in S2 the data set is divided into a training set, a test set and a verification set in a ratio of 7:2:1, and the sampling period of each subset represents feature changes over the same time span.
5. The method for predicting the load of the power Transformer based on the Transformer model according to claim 1, wherein in the step S4:
the input part of the decoder is represented as X_de = Concat(X_token, X_0), where X_token denotes the values of k time steps taken from the encoder input, and X_0 is a placeholder for the target sequence to be predicted (filled with 0); finally, a fully-connected layer is used to output the prediction value, whose dimensionality depends on the number of variables to be predicted.
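The decoder input construction can be sketched as below; the choice of the last k encoder steps as the start token is an assumption (as in Informer-style models), and all names are illustrative:

```python
import numpy as np

def build_decoder_input(X_enc, k, pred_len):
    """Decoder input: k known time steps from the encoder input (the
    'start token') concatenated with a zero placeholder for the
    pred_len target steps to be predicted."""
    X_token = X_enc[-k:]                               # (k, d)
    X_zero = np.zeros((pred_len, X_enc.shape[1]))      # (pred_len, d)
    return np.concatenate([X_token, X_zero], axis=0)   # (k + pred_len, d)

X_enc = np.arange(96 * 7, dtype=float).reshape(96, 7)
X_de = build_decoder_input(X_enc, k=48, pred_len=24)
print(X_de.shape)      # (72, 7)
print(X_de[-1].sum())  # 0.0  (placeholder is zero-filled)
```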
6. The method for predicting the load of a power Transformer based on a Transformer model according to claim 1, wherein a mean squared error (MSE) loss function and the Adam stochastic gradient descent algorithm are used in the network convergence process of step S4.
7. The method for predicting the load of the power Transformer based on the Transformer model according to any one of claims 1 to 6, further comprising the step of S7:
evaluating model overfitting, and using EarlyStopping during training to prevent it: after each training round, the model is verified with the verification set obtained in step S2; if the error on the verification set rises as training rounds increase, training is stopped, and the weights at the stopping point are taken as the final parameters of the network.
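The EarlyStopping logic of step S7 can be sketched as a small framework-free helper (illustrative; the patience value is an assumption, not from the patent):

```python
class EarlyStopping:
    """Stop training once the verification error has not improved for
    `patience` consecutive rounds, keeping the best weights seen."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_rounds = 0
        self.best_state = None

    def step(self, val_error, state):
        """Record one round; return True when training should stop."""
        if val_error < self.best:
            self.best, self.bad_rounds, self.best_state = val_error, 0, state
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience

stopper = EarlyStopping(patience=2)
errors = [0.9, 0.7, 0.8, 0.85, 0.86]   # verification error rises after round 1
stop_round = next(i for i, e in enumerate(errors) if stopper.step(e, state=i))
print(stop_round, stopper.best_state)  # 3 1
```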
8. Use of the Transformer-model-based power transformer load prediction method according to any one of claims 1 to 7, characterized in that the model is used to predict: after model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values.
9. Use according to claim 8, characterized by transformer load prediction for a wind farm.
10. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 1 to 8 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211379043.6A CN115622047B (en) | 2022-11-04 | 2022-11-04 | Power Transformer load prediction method based on Transformer model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115622047A true CN115622047A (en) | 2023-01-17 |
CN115622047B CN115622047B (en) | 2023-07-18 |
Family
ID=84877989
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116070799A (en) * | 2023-03-30 | 2023-05-05 | 南京邮电大学 | Photovoltaic power generation amount prediction system and method based on attention and deep learning |
CN117034175A (en) * | 2023-10-07 | 2023-11-10 | 北京麟卓信息科技有限公司 | Time sequence data anomaly detection method based on channel fusion self-attention mechanism |
CN117292243A (en) * | 2023-11-24 | 2023-12-26 | 合肥工业大学 | Method, equipment and medium for predicting magnetocardiogram signal space-time image based on deep learning |
CN117435918A (en) * | 2023-12-20 | 2024-01-23 | 杭州市特种设备检测研究院(杭州市特种设备应急处置中心) | Elevator risk early warning method based on spatial attention network and feature division |
CN117851897A (en) * | 2024-03-08 | 2024-04-09 | 国网山西省电力公司晋城供电公司 | Multi-dimensional feature fusion oil immersed transformer online fault diagnosis method |
CN118332268A (en) * | 2024-06-14 | 2024-07-12 | 国网山东省电力公司滨州市沾化区供电公司 | Distributed power data processing method, system, electronic equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110297885A (en) * | 2019-05-27 | 2019-10-01 | 中国科学院深圳先进技术研究院 | Generation method, device, equipment and the storage medium of real-time event abstract |
CN111080032A (en) * | 2019-12-30 | 2020-04-28 | 成都数之联科技有限公司 | Load prediction method based on Transformer structure |
CN112288595A (en) * | 2020-10-30 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Power grid load prediction method, related device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
FAN Qiannan et al.: "Robust Power Load Forecasting Based on Transformer", Electric Power Big Data * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||