CN115622047A - Power Transformer load prediction method based on Transformer model - Google Patents

Power Transformer load prediction method based on Transformer model

Info

Publication number
CN115622047A
CN115622047A (application CN202211379043.6A)
Authority
CN
China
Prior art keywords
layer
load
model
transformer
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211379043.6A
Other languages
Chinese (zh)
Other versions
CN115622047B (en)
Inventor
何霆
王屾
朱文龙
陈世茂
曾建华
杨子骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhonghai Energy Storage Technology Beijing Co Ltd
Original Assignee
Zhonghai Energy Storage Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhonghai Energy Storage Technology Beijing Co Ltd filed Critical Zhonghai Energy Storage Technology Beijing Co Ltd
Priority to CN202211379043.6A priority Critical patent/CN115622047B/en
Publication of CN115622047A publication Critical patent/CN115622047A/en
Application granted granted Critical
Publication of CN115622047B publication Critical patent/CN115622047B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Power Engineering (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a power transformer load prediction method based on a Transformer model, comprising the following steps: collecting load data of a power transformer and arranging it in chronological order to obtain a sequence sample data set; dividing the data set into a training set, a test set and a validation set while ensuring that the sampling period of each subset represents feature-variation samples from the same time period; defining and building an interactive multi-head attention Transformer model and initializing the network parameters and learning rate; and constructing a three-layer decoder from a multi-head attention layer and a multi-head attention interaction layer. The proposed method better captures the dependencies within long sequence data, thereby achieving accurate prediction of the power transformer load, and has practical value in the construction of smart grids.

Description

Power Transformer load prediction method based on Transformer model
Technical Field
The invention belongs to the technical field of power metering data processing, and particularly relates to a method for predicting the load of a power transformer.
Background
The smart grid achieves reliable, safe, economical, efficient and environmentally friendly operation of the power grid through advanced sensing and measurement technologies and advanced control systems. The power transformer is a key device in power grid construction, and accurate long-term prediction of its load, based on historical operating data, is an important precondition for building a smart grid. Power transformer load prediction takes historical time-series data as its data source, builds a mathematical load prediction model using techniques such as data mining and deep learning, and predicts the transformer load with the established model, so that power can be distributed reasonably and waste reduced.
With the continuous growth of installed wind power capacity, the technical and economic impact of wind power integration on the main grid keeps increasing, posing greater challenges for transformer data processing. Because grid-connected operation of a wind farm negatively affects power quality, voltage stability and grid security, these can only be improved effectively if the power transformer load is predicted accurately. A reasonable estimate of the transformer load therefore reduces unnecessary power waste and fully supports decision-making in the smart grid.
A power transformer has a complex structure and material parameters that vary nonlinearly, so during power distribution it can often only be adjusted rather conservatively. In practice the load of a power transformer is difficult to predict, because it is influenced by factors such as weather, temperature, season and environment and thus exhibits complicated variation characteristics. Existing load prediction methods for power transformers fall roughly into two categories: statistical models represented by ARIMA and Prophet, and autoregressive models represented by RNNs. These methods usually make short-term predictions from single or few variables; their prediction horizon is short, their accuracy is low, and they struggle with the large volumes of high-dimensional data and complex temporal relationships found in real applications, making them unsuitable for practical use.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a power transformer load prediction method based on an interactive multi-head attention Transformer model. Built on the encoder-decoder framework of the Transformer, it uses depthwise separable convolution to realize information interaction between the different subspaces of conventional multi-head attention, improving the data-fitting ability of the model; at the same time it distills the time-series data with a max-pooling layer, reducing the memory overhead during model training and achieving accurate prediction of the power transformer load.
A second object of the invention is to propose an application using the above prediction method.
A third object of the invention is to propose a device using the above prediction method.
The technical scheme for realizing the above purpose of the invention is as follows:
a method for predicting the load of a power Transformer based on a Transformer model comprises the following steps:
S1, collecting load data of the power transformer and arranging the collected data in chronological order to obtain a sequence sample data set

X = {x_1, x_2, ..., x_Lx | x_i ∈ R^(d_x)}

where x_i denotes the values of the observed variables at time i, L_x the length of the observed time series, and d_x the number of observed variables;

normalizing the sequence sample data set so that the sample values lie in the range [0,1], giving a data set that serves as samples for supervised learning;
S2, dividing the normalized data set into a training set, a test set and a validation set, ensuring that the sampling period (the sampling interval) of each subset represents feature-variation samples from the same time period;
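A minimal sketch of steps S1-S2, assuming a simple NumPy pipeline: the function name, the small epsilon added to the denominator, and the 7:2:1 ratio stated further below in the description are illustrative choices rather than requirements of the method.

```python
import numpy as np

def normalize_and_split(X, ratios=(0.7, 0.2, 0.1)):
    """Min-max normalize each observed variable to [0, 1] and split the
    chronologically ordered samples into train / test / validation sets.

    X: array of shape (L_x, d_x) -- L_x time steps, d_x observed variables.
    """
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    X_norm = (X - x_min) / (x_max - x_min + 1e-8)   # values now in [0, 1]

    n = len(X_norm)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = X_norm[:n_train]                         # chronological split
    test = X_norm[n_train:n_train + n_test]
    val = X_norm[n_train + n_test:]
    return train, test, val, (x_min, x_max)

# Example: 70,080 samples with an illustrative number of observed variables
X = np.random.rand(70_080, 7)
train, test, val, stats = normalize_and_split(X)
```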
S3: defining and establishing the interactive multi-head attention Transformer model, and initializing the network parameters and the learning rate; the raw data is converted into feature vectors carrying position information by an embedding layer and a position-encoding layer, where the temporal encoding comprises a global temporal encoding and a local temporal encoding; the global temporal encoding is built from the year, month and week information in the data timestamps, and the local temporal encoding is given by

PE(pos, 2j) = sin(pos / 10000^(2j/d_model))
PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model))

where PE denotes the position encoding, pos the position, and j the dimension index;
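A small NumPy sketch of the local (sinusoidal) temporal encoding written out above; seq_len and d_model are illustrative hyperparameters, and the global year/month/week encoding from the timestamps would be added separately.

```python
import numpy as np

def local_position_encoding(seq_len, d_model):
    """PE(pos, 2j)   = sin(pos / 10000^(2j/d_model))
       PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model))"""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    j = np.arange(0, d_model, 2)[None, :]        # even dimension indices 2j
    angle = pos / np.power(10000.0, j / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions
    return pe

pe = local_position_encoding(seq_len=96, d_model=512)   # (96, 512)
```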
S4, the Transformer model consists of an encoder and a decoder. In the encoder, a multi-head attention layer and a multi-head attention interaction layer are used for feature extraction, as follows: the vectors carrying timing information are fed into the multi-head attention layer to obtain the intermediate values

O_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

where W_i^Q, W_i^K, W_i^V are weight matrices and Q, K, V are the input vectors; the multi-head output

O = Concat(O_1, O_2, ..., O_h)

is composed of several parts, each part representing one subspace; depthwise separable convolution is used to realize information interaction across the different subspaces:

O' = Elu(Conv2(Conv1(O)))

where Conv1 and Conv2 denote the depth-wise convolution and the point-wise convolution respectively, and Elu denotes the activation function; then a linear transformation layer performs feature-dimension conversion, and finally downsampling through a pooling layer gives the output:

Y = MaxPool(Linear(O'))
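A sketch of one encoder block in PyTorch, assuming the multi-head attention interaction layer is realized as a 1x1 pointwise Conv1d, an ELU, a depthwise (grouped) Conv1d, a linear layer and a stride-2 max pool, in the order described later in the text; the layer sizes, head count and kernel size are illustrative.

```python
import torch
import torch.nn as nn

class MultiHeadAttentionInteraction(nn.Module):
    """Multi-head attention followed by subspace interaction via depthwise
    separable convolution, a linear layer, and stride-2 max-pool distillation."""
    def __init__(self, d_model=512, n_heads=8, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # 1x1 pointwise convolution: aggregates information across channels
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.elu = nn.ELU()
        # depthwise convolution: interaction along the time (spatial) dimension
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.linear = nn.Linear(d_model, d_model)
        # stride-2 max pooling: "distills" the sequence to half its length
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        o, _ = self.attn(x, x, x)                # O = Concat(O_1, ..., O_h)
        o = o.transpose(1, 2)                    # (batch, d_model, seq_len)
        o = self.depthwise(self.elu(self.pointwise(o)))   # subspace interaction
        o = self.linear(o.transpose(1, 2))       # feature-dimension conversion
        y = self.pool(o.transpose(1, 2)).transpose(1, 2)  # halve sequence length
        return y                                 # (batch, seq_len // 2, d_model)

x = torch.randn(4, 96, 512)
layer = MultiHeadAttentionInteraction()
print(layer(x).shape)                            # torch.Size([4, 48, 512])
```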
s5: constructing a three-layer decoder by adopting a multi-head attention layer and a multi-head attention interaction layer; first using the features f from a multi-head attention interaction layer 1 And features f from residual concatenation 2 Calculating a weight ratio
Figure BDA0003927541100000038
Wherein
Figure BDA0003927541100000039
Representing a weight matrix, b g Indicating the bias and Sigmoid the activation function. Then based on the ratio, for the two features f 1 And f 2 Perform weighted summation
Fusion(f1,f2)=g⊙f 1 +(1-g)f 2
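A short sketch of the gated fusion of the interaction-layer feature f_1 and the residual-connection feature f_2; the module name and tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """g = Sigmoid(W_g [f1; f2] + b_g);  Fusion(f1, f2) = g*f1 + (1-g)*f2"""
    def __init__(self, d_model=512):
        super().__init__()
        self.w_g = nn.Linear(2 * d_model, d_model)   # weight matrix W_g and bias b_g

    def forward(self, f1, f2):
        g = torch.sigmoid(self.w_g(torch.cat([f1, f2], dim=-1)))
        return g * f1 + (1.0 - g) * f2               # element-wise weighted sum

f1 = torch.randn(4, 48, 512)
f2 = torch.randn(4, 48, 512)
fused = GatedFusion()(f1, f2)                        # (4, 48, 512)
```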
S6: a decoder is constructed using a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer performs an inner-product operation between the Query matrix and the Key matrix to obtain contribution scores, which are then multiplied with the Value matrix to obtain feature vectors. The multi-head attention interaction layer performs subspace information interaction on these feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
The data points in S1 are arranged in chronological order; sampling may be performed at intervals of 1 hour, 15 minutes or 1 minute, and the shorter the interval, the finer the data. Regarding S4: in the conventional multi-head attention mechanism the features are divided into several blocks and information interaction between the different subspaces is not considered, which limits the model's ability to extract features from time-series data. The invention improves the attention mechanism of the model: through the convolution processing the blocks become interrelated, so that data over longer horizons can be predicted. On top of the multi-head attention mechanism, a multi-head attention interaction layer is introduced, and information interaction across the different subspaces is realized with depthwise separable convolution. The method reduces the memory overhead of model training, selects features adaptively and filters out redundant information.
Data related to the load of the power transformer are collected with temperature measuring elements, ammeters, voltmeters and sensors, and comprise one or more of load, oil temperature, location, climate and demand.
Further, in step S4:

the output vectors O_i generated by the multi-head attention layer undergo information interaction through the multi-head attention interaction layer, which consists of a depthwise separable convolution layer, a linear transformation layer and a max-pooling layer. For the output tensor O formed by the multi-head self-attention mechanism, information aggregation is first performed on the channel dimension with a 1x1 pointwise convolution; after an ELU activation function, a depthwise convolution performs information interaction on the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a max-pooling layer with stride 2 performs the distillation operation on the time series. This operation halves the sequence length in the time dimension after each encoder layer and filters out redundant information, thereby reducing memory consumption during training.
In S2, the preprocessed data set is divided into a training set, a test set and a validation set in the ratio 7:2:1, and the sampling period of each subset represents feature-variation samples from the same time interval (the time interval being the acquisition interval).
Further, in step S4:

the input part of the decoder is represented as

X_de = Concat(X_token, X_0)

where X_token denotes the values of the last k time steps taken from the encoder input, and X_0 is a placeholder (filled with 0) standing in for the target sequence to be predicted; finally, a fully connected layer outputs the prediction value, whose dimensionality depends on the number of variables to be predicted.
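A sketch of how the decoder input X_de could be assembled from the last k encoder time steps and a zero-filled placeholder covering the prediction horizon; k and pred_len are illustrative values.

```python
import torch

def build_decoder_input(x_enc, k=48, pred_len=24):
    """x_enc: (batch, L_x, d_x) encoder input.
    Returns X_de = Concat(X_token, X_0) along the time axis, where X_token is
    the last k steps of the encoder input and X_0 is a zero placeholder
    standing in for the pred_len target steps to be predicted."""
    x_token = x_enc[:, -k:, :]
    x_zero = torch.zeros(x_enc.size(0), pred_len, x_enc.size(2),
                         device=x_enc.device, dtype=x_enc.dtype)
    return torch.cat([x_token, x_zero], dim=1)   # (batch, k + pred_len, d_x)

x_enc = torch.randn(4, 96, 8)
x_dec = build_decoder_input(x_enc)               # (4, 72, 8)
```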
In step S4, a mean square error (MSE) loss function and the Adam stochastic gradient-descent algorithm are used during network convergence.
On the one hand this dynamically adapts the learning rate of each parameter, and on the other hand it introduces momentum, so that parameter updates have more opportunities to escape local optima, which accelerates and improves network convergence.
The training process consists of feeding samples into the model and iterating gradient descent to reduce the error.
The Transformer-model-based power transformer prediction method further comprises step S7: evaluating model overfitting, where EarlyStopping is used during training to prevent overfitting; after each training epoch the model is validated with the validation set obtained in step S2, and if the validation error is found to rise as the number of training epochs increases, training is stopped; the weights at the stopping point are taken as the final parameters of the network.
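A schematic training loop combining the MSE loss and Adam optimizer of step S4 with the EarlyStopping of step S7; the model signature model(x_enc, x_dec), the data loaders and the patience value are assumptions made for illustration.

```python
import copy
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=100, lr=1e-4, patience=5):
    """Train with MSE loss + Adam; stop early when the validation error starts
    rising, and keep the weights from the best epoch (step S7)."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, bad_epochs = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        for x_enc, x_dec, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x_enc, x_dec), y)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x_enc, x_dec), y).item()
                           for x_enc, x_dec, y in val_loader) / len(val_loader)

        if val_loss < best_val:                  # validation error still falling
            best_val = val_loss
            best_state = copy.deepcopy(model.state_dict())
            bad_epochs = 0
        else:                                    # validation error rising
            bad_epochs += 1
            if bad_epochs >= patience:           # EarlyStopping
                break

    model.load_state_dict(best_state)            # weights at the stopping point
    return model
```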
Application of the Transformer-model-based power transformer prediction method, using the model to predict: after model evaluation and verification, the test set data obtained in step S2 are fed into the model verified in step S7 to predict values at future times.
The method can be used for transformer load prediction in wind farms or other facilities with similar characteristics, preferably in wind farms.
The power Transformer load prediction model based on the interactive multi-head attention Transformer receives a historical load sequence as input, and predicts load values of a plurality of time steps in the future; by realizing information interaction among multi-head attention, the feature extraction capability of the model on long sequence data is improved, and therefore high-precision long-term prediction on the load of the power transformer is realized.
An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method steps when executing the program.
The invention has the beneficial effects that:
compared with the existing prediction method, the power Transformer load prediction method based on the interactive multi-head attention Transformer model has the advantages that: the traditional time sequence prediction method cannot accurately predict long sequence data, and the prediction method introduces interactive multi-head attention on the basis of a transform to enhance the characteristic extraction capability of a model on the sequence data, and simultaneously realizes the distillation operation on the sequence data by utilizing a maximum pooling layer in order to reduce the memory overhead in the model training process.
The power transformer load prediction method provided by the invention can better capture the dependency relationship between the long sequence data, thereby realizing accurate prediction of the power transformer load and having certain practicability in the construction of an intelligent power grid.
The prediction method utilizes the maximum pooling layer to distill the time sequence data, reduces the memory overhead in the model training process, and realizes accurate prediction of the load of the power transformer.
Drawings
FIG. 1 is a flow chart of the load prediction of a power Transformer based on an interactive multi-head attention Transformer model according to the present invention;
FIG. 2 is a model diagram of a power Transformer load prediction based on an interactive multi-head attention Transformer model according to the present invention;
FIG. 3 shows the prediction effect of the proposed method IMAHN compared with the real data.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Unless otherwise specified, all technical means used in the specification are technical means known in the art.
The invention is further described in detail below with reference to the accompanying drawings and embodiments, in which the invention provides a power Transformer load prediction method based on an interactive multi-head attention Transformer model.
The training data set used in the examples records the load conditions of power transformers in two different areas of the same province in China from 2016 to 2018. Data points were recorded every 15 minutes (marked with m), and the set is designated ETT-small-m1; it contains 2 years × 365 days × 24 hours × 4 = 70,080 data points. In addition, variants of the data set at one-hour granularity (marked with h) are provided, namely ETT-small-h1 and ETT-small-h2. Each data point contains 8-dimensional features, including the recording date, the predicted value "oil temperature", and 6 different types of external load values: high useful load, high useless load, middle useful load, middle useless load, low useful load and low useless load.
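A sketch of loading and windowing such a data set with pandas; the file name and the column names (date, HUFL, HULL, MUFL, MULL, LUFL, LULL, OT) follow the publicly released ETT-small data and are assumptions here, as are the window lengths.

```python
import pandas as pd
import numpy as np

# Column names follow the public ETT-small release; treat them as assumptions.
df = pd.read_csv("ETT-small-m1.csv", parse_dates=["date"])
value_cols = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]

# Inputs for the global temporal encoding: month / weekday from the timestamp
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.weekday

def make_windows(values, seq_len=96, pred_len=24):
    """Slice the series into (history, future) pairs for supervised training."""
    xs, ys = [], []
    for i in range(len(values) - seq_len - pred_len + 1):
        xs.append(values[i:i + seq_len])
        ys.append(values[i + seq_len:i + seq_len + pred_len])
    return np.array(xs), np.array(ys)

X, Y = make_windows(df[value_cols].to_numpy(dtype=np.float32))
```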
Example 1:
FIG. 1 is a flowchart illustrating the power Transformer load prediction method based on the interactive multi-head attention Transformer model according to the present invention. The method specifically comprises the following steps:
S1, collecting load data of the power transformer and arranging the collected data in chronological order to obtain a sequence sample data set

X = {x_1, x_2, ..., x_Lx | x_i ∈ R^(d_x)}

where x_i denotes the values of the observed variables at time i, L_x the length of the observed time series, and d_x the number of observed variables;

normalizing the sequence sample data set so that the sample values lie in the range [0,1], giving a data set that serves as samples for supervised learning;
S2, the normalized data set is divided into a training set, a test set and a validation set in the ratio 7:2:1, with the sampling period of each subset representing feature-variation samples from the same time period;
S3: defining and establishing the interactive multi-head attention Transformer model, and initializing the network parameters and the learning rate; the raw data is converted into feature vectors carrying position information by an embedding layer and a position-encoding layer, where the temporal encoding comprises a global temporal encoding and a local temporal encoding; the global temporal encoding is built from the year, month and week information in the data timestamps, and the local temporal encoding is given by

PE(pos, 2j) = sin(pos / 10000^(2j/d_model))
PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model))

where PE denotes the position encoding, pos the position, and j the dimension index;
S4, the Transformer model consists of an encoder and a decoder. In the encoder, a multi-head attention layer and a multi-head attention interaction layer are used for feature extraction, as follows: the vectors carrying timing information are fed into the multi-head attention layer to obtain the intermediate values

O_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

where W_i^Q, W_i^K, W_i^V are weight matrices and Q, K, V are the input vectors; the multi-head output

O = Concat(O_1, O_2, ..., O_h)

is composed of several parts, each part representing one subspace; depthwise separable convolution is used to realize information interaction across the different subspaces:

O' = Elu(Conv2(Conv1(O)))

where Conv1 and Conv2 denote the depth-wise convolution and the point-wise convolution respectively, and Elu denotes the activation function; then a linear transformation layer performs feature-dimension conversion, and finally downsampling through a pooling layer gives the output:

Y = MaxPool(Linear(O'))
In step S4:

the output vectors O_i generated by the multi-head attention layer undergo information interaction through the multi-head attention interaction layer; the interaction module consists of a depthwise separable convolution layer, a linear transformation layer and a max-pooling layer. For the output tensor O formed by the multi-head self-attention mechanism, information aggregation is first performed on the channel dimension with a 1x1 pointwise convolution; after an ELU activation function, a depthwise convolution performs information interaction on the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a max-pooling layer with stride 2 performs the distillation operation on the time series.
In step S4: the output vectors O_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V) generated by the multi-head attention layer undergo information interaction through the multi-head attention interaction layer.
The input part of the decoder is represented as

X_de = Concat(X_token, X_0)

where X_token denotes the values of the last k time steps taken from the encoder input, and X_0 is a placeholder (filled with 0) standing in for the target sequence to be predicted; finally, a fully connected layer outputs the prediction value, whose dimensionality depends on the number of variables to be predicted.
In step S4, a mean square error (MSE) loss function and the Adam stochastic gradient-descent algorithm are used during network convergence.
S5: constructing a three-layer decoder from multi-head attention layers and multi-head attention interaction layers; first, the feature f_1 from the multi-head attention interaction layer and the feature f_2 from the residual connection are used to compute a weight ratio

g = Sigmoid(W_g·[f_1; f_2] + b_g)

where W_g denotes a weight matrix, b_g the bias, and Sigmoid the activation function; based on this ratio, the two features are combined by the weighted sum Fusion(f_1, f_2) = g ⊙ f_1 + (1 - g) ⊙ f_2.
S6: a decoder is constructed using a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer performs an inner-product operation between the Query matrix and the Key matrix to obtain contribution scores, which are then multiplied with the Value matrix to obtain feature vectors. The multi-head attention interaction layer performs subspace information interaction on these feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
S7: evaluating model overfitting, using EarlyStopping during training to prevent overfitting; after each training epoch the model is validated with the validation set obtained in step S2, and if the validation error is found to rise as the number of training epochs increases, training is stopped; the weights at the stopping point are taken as the final parameters of the network.
After model evaluation and verification, the test set data obtained in step S2 are fed into the model verified in step S7 to predict values at future times. FIG. 3 shows partial prediction results of the method on the ETT data set, and Tables 1 and 2 compare the method with other prediction methods under univariate and multivariate conditions respectively; the effectiveness and advancement of the model can be seen from them.
TABLE 1 univariate time series prediction results
In Table 1, IMAHN is the method proposed by the present invention, and Informer, LSTMa, DeepAR, ARIMA and Prophet are the comparison methods.
MAE (mean absolute error) and MSE (mean square error) are the evaluation metrics.
Example 2:
a Transformer model was obtained by the same power Transformer load prediction method as in example 1. In this embodiment, a plurality of variables are input for prediction, and the variables include load, oil temperature, location, climate, and demand. The data in the raw data set is obtained by means of temperature measuring elements, current and user side power measurement. The present embodiment predicts the load variable by multiple variables; the dimensions of the formula input are different from those of embodiment 1.
The results obtained by the Transformer model are shown in table 2:
TABLE 2 multivariate time series prediction results
In Table 2, IMAHN is the method proposed herein, and Informer, LSTMa and LSTNet are the comparison prediction methods.
Example 3: Application
After model evaluation and verification, the test set data obtained in step S2 are fed into the model verified in step S7 to predict values at future times, which are then used to guide the selection and configuration of transformers in the power grid.
For transformers connecting wind turbines to the grid, the location and climate variables among the multiple inputs change with the wind farm configuration; the prediction method is therefore particularly suitable for load prediction of wind farm transformers.
Although the present invention has been described in the foregoing by way of examples, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A method for predicting the load of a power Transformer based on a Transformer model is characterized by comprising the following steps:
S1, collecting load data of the power transformer and arranging the collected data in chronological order to obtain a sequence sample data set

X = {x_1, x_2, ..., x_Lx | x_i ∈ R^(d_x)}

where x_i denotes the values of the observed variables at time i, L_x the length of the observed time series, and d_x the number of observed variables;

normalizing the sequence sample data set so that the sample values lie in the range [0,1], giving a data set that serves as samples for supervised learning;
S2, dividing the normalized data set into a training set, a test set and a validation set, and ensuring that the sampling period of each subset represents feature-variation samples from the same time period;
S3: defining and establishing the interactive multi-head attention Transformer model, and initializing the network parameters and the learning rate; the raw data is converted into feature vectors carrying position information by an embedding layer and a position-encoding layer, where the temporal encoding comprises a global temporal encoding and a local temporal encoding; the global temporal encoding is built from the year, month and week information in the data timestamps, and the local temporal encoding is given by

PE(pos, 2j) = sin(pos / 10000^(2j/d_model))
PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model))

where PE denotes the position encoding, pos the position, and j the dimension index;
S4, the Transformer model consists of an encoder and a decoder; in the encoder, a multi-head attention layer and a multi-head attention interaction layer are used for feature extraction, as follows: the vectors carrying timing information are fed into the multi-head attention layer to obtain the intermediate values

O_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

where W_i^Q, W_i^K, W_i^V are weight matrices and Q, K, V are the input vectors; the multi-head output

O = Concat(O_1, O_2, ..., O_h)

is composed of several parts, each part representing one subspace; depthwise separable convolution is used to realize information interaction across the different subspaces:

O' = Elu(Conv2(Conv1(O)))

where Conv1 and Conv2 denote the depth-wise convolution and the point-wise convolution respectively, and Elu denotes the activation function; then a linear transformation layer performs feature-dimension conversion, and finally downsampling through a pooling layer gives the output:

Y = MaxPool(Linear(O'))
s5: by using multiple attention levels and multiple notesConstructing a three-layer decoder by the idea interaction layer; first using the feature f from the multi-head attention interaction layer 1 And features f from residual concatenation 2 Calculating a weight ratio
Figure FDA0003927541090000023
Wherein
Figure FDA0003927541090000024
Representing a weight matrix, b g Indicating the bias and Sigmoid the activation function. Then, based on the ratio, for the above feature f 1 And f 2 Perform weighted summation Fusion (f 1, f 2) = g & 1 +(1-g)f 2
S6: a decoder is constructed using a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer is responsible for carrying out inner product operation on the Query matrix and the Key matrix to obtain a contribution degree score, and then multiplying the obtained contribution degree score and the Value matrix to obtain a feature vector; the multi-head attention interaction layer is responsible for performing subspace information interaction on the formed feature vectors, and finally the linear change layer outputs a final prediction sequence.
2. The method for predicting load of power Transformer based on Transformer model according to claim 1, wherein temperature measuring elements, ammeters, voltmeters and sensors are used for collecting data of the power Transformer related to load, and the data comprise one or more of load, oil temperature, position, climate and demand.
3. The method for predicting the load of the power Transformer based on the Transformer model according to claim 1, wherein in the step S4:

the output vectors O_i generated by the multi-head attention layer undergo information interaction through the multi-head attention interaction layer, which consists of a depthwise separable convolution layer, a linear transformation layer and a max-pooling layer; for the output tensor O formed by the multi-head self-attention mechanism, information aggregation is first performed on the channel dimension with a 1x1 pointwise convolution; after an ELU activation function, a depthwise convolution performs information interaction on the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a max-pooling layer with stride 2 performs the distillation operation on the time series.
4. The method for predicting the load of the power Transformer based on the Transformer model according to claim 1, wherein in S2 the data set is divided into a training set, a test set and a validation set in the ratio 7:2:1, and the sampling period of each subset represents feature-variation samples from the same time period.
5. The method for predicting the load of the power Transformer based on the Transformer model according to claim 1, wherein in the step S4:

the input part of the decoder is represented as

X_de = Concat(X_token, X_0)

where X_token denotes the values of the last k time steps taken from the encoder input, and X_0 is a placeholder (filled with 0) standing in for the target sequence to be predicted; finally, a fully connected layer outputs the prediction value, whose dimensionality depends on the number of variables to be predicted.
6. The method for predicting the load of the power Transformer based on the Transformer model according to claim 1, wherein a mean square error (MSE) loss function and the Adam stochastic gradient-descent algorithm are used in the network convergence process of step S4.
7. The method for predicting the load of the power Transformer based on the Transformer model according to any one of claims 1 to 6, further comprising the step of S7:
evaluating model overfitting, using EarlyStopping during training to prevent overfitting; after each training epoch the model is validated with the validation set obtained in step S2, and if the validation error is found to rise as the number of training epochs increases, training is stopped; the weights at the stopping point are taken as the final parameters of the network.
8. Use of the Transformer-model-based power Transformer load prediction method according to any of claims 1 to 7, characterized in that the model is used to predict: after model evaluation and verification, the test set data obtained in step S2 are fed into the model verified in step S7 to predict values at future times.
9. Use according to claim 8, characterized by transformer load prediction for a wind farm.
10. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 1 to 8 when executing the program.
CN202211379043.6A 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model Active CN115622047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211379043.6A CN115622047B (en) 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211379043.6A CN115622047B (en) 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model

Publications (2)

Publication Number Publication Date
CN115622047A true CN115622047A (en) 2023-01-17
CN115622047B CN115622047B (en) 2023-07-18

Family

ID=84877989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211379043.6A Active CN115622047B (en) 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model

Country Status (1)

Country Link
CN (1) CN115622047B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070799A (en) * 2023-03-30 2023-05-05 南京邮电大学 Photovoltaic power generation amount prediction system and method based on attention and deep learning
CN117034175A (en) * 2023-10-07 2023-11-10 北京麟卓信息科技有限公司 Time sequence data anomaly detection method based on channel fusion self-attention mechanism
CN117292243A (en) * 2023-11-24 2023-12-26 合肥工业大学 Method, equipment and medium for predicting magnetocardiogram signal space-time image based on deep learning
CN117435918A (en) * 2023-12-20 2024-01-23 杭州市特种设备检测研究院(杭州市特种设备应急处置中心) Elevator risk early warning method based on spatial attention network and feature division
CN117851897A (en) * 2024-03-08 2024-04-09 国网山西省电力公司晋城供电公司 Multi-dimensional feature fusion oil immersed transformer online fault diagnosis method
CN118332268A (en) * 2024-06-14 2024-07-12 国网山东省电力公司滨州市沾化区供电公司 Distributed power data processing method, system, electronic equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297885A (en) * 2019-05-27 2019-10-01 中国科学院深圳先进技术研究院 Generation method, device, equipment and the storage medium of real-time event abstract
CN111080032A (en) * 2019-12-30 2020-04-28 成都数之联科技有限公司 Load prediction method based on Transformer structure
CN112288595A (en) * 2020-10-30 2021-01-29 腾讯科技(深圳)有限公司 Power grid load prediction method, related device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297885A (en) * 2019-05-27 2019-10-01 中国科学院深圳先进技术研究院 Generation method, device, equipment and the storage medium of real-time event abstract
CN111080032A (en) * 2019-12-30 2020-04-28 成都数之联科技有限公司 Load prediction method based on Transformer structure
CN112288595A (en) * 2020-10-30 2021-01-29 腾讯科技(深圳)有限公司 Power grid load prediction method, related device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN, Qiannan et al.: "Robust Power Load Forecasting Based on Transformer" (基于Transformer的稳健电力负荷预测), Power Big Data (电力大数据) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070799A (en) * 2023-03-30 2023-05-05 南京邮电大学 Photovoltaic power generation amount prediction system and method based on attention and deep learning
CN116070799B (en) * 2023-03-30 2023-05-30 南京邮电大学 Photovoltaic power generation amount prediction system and method based on attention and deep learning
CN117034175A (en) * 2023-10-07 2023-11-10 北京麟卓信息科技有限公司 Time sequence data anomaly detection method based on channel fusion self-attention mechanism
CN117034175B (en) * 2023-10-07 2023-12-05 北京麟卓信息科技有限公司 Time sequence data anomaly detection method based on channel fusion self-attention mechanism
CN117292243A (en) * 2023-11-24 2023-12-26 合肥工业大学 Method, equipment and medium for predicting magnetocardiogram signal space-time image based on deep learning
CN117292243B (en) * 2023-11-24 2024-02-20 合肥工业大学 Method, equipment and medium for predicting magnetocardiogram signal space-time image based on deep learning
CN117435918A (en) * 2023-12-20 2024-01-23 杭州市特种设备检测研究院(杭州市特种设备应急处置中心) Elevator risk early warning method based on spatial attention network and feature division
CN117435918B (en) * 2023-12-20 2024-03-15 杭州市特种设备检测研究院(杭州市特种设备应急处置中心) Elevator risk early warning method based on spatial attention network and feature division
CN117851897A (en) * 2024-03-08 2024-04-09 国网山西省电力公司晋城供电公司 Multi-dimensional feature fusion oil immersed transformer online fault diagnosis method
CN118332268A (en) * 2024-06-14 2024-07-12 国网山东省电力公司滨州市沾化区供电公司 Distributed power data processing method, system, electronic equipment and medium

Also Published As

Publication number Publication date
CN115622047B (en) 2023-07-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant