CN114169416B - Short-term load prediction method based on transfer learning under a small sample set - Google Patents

Short-term load prediction method based on transfer learning under a small sample set

Info

Publication number
CN114169416B
CN114169416B CN202111442332.1A CN114169416A
Authority
CN
China
Prior art keywords
data
load
residual block
training
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111442332.1A
Other languages
Chinese (zh)
Other versions
CN114169416A (en)
Inventor
张真源
赵鹏飞
黄琦
胡维昊
易建波
李坚
井实
唐啸天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202111442332.1A
Publication of CN114169416A
Application granted
Publication of CN114169416B
Legal status: Active

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Water Supply & Treatment (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a short-term load prediction method under a small sample set based on transfer learning. The method first collects the historical load data and corresponding temperatures of a plurality of source-domain users so as to construct a plurality of input features; it then trains a plurality of deep residual network models with these input features, adaptively transfers and combines the models using a Bayesian weighted probability averaging method, and performs real-time load prediction for the target user once the transfer combination is complete.

Description

Short-term load prediction method based on transfer learning under a small sample set
Technical Field
The invention belongs to the technical field of power load prediction, and particularly relates to a short-term load prediction method under a small sample set based on transfer learning.
Background
Load prediction plays an extremely important role in the operation and control of modern power systems. However, with the large-scale grid connection of renewable-energy generation, the popularization of electric vehicles and the increasing diversification of power consumption patterns, the complexity and uncertainty of modern power systems keep growing. This poses additional challenges for power system management. In response, accurate residential load prediction can reduce operating costs and promote intelligent operation of the power grid. For example, given accurate and reliable load predictions for individual users, the detrimental effects of peak-valley consumption can be curtailed through measures such as energy storage management and demand response.
In order to achieve accurate and reliable load prediction, many machine learning and deep learning methods have emerged. Machine learning methods include support vector regression (SVR) and decision trees; deep learning methods include deep residual networks (ResNet) and the long short-term memory neural network (LSTM). However, machine learning and deep learning models have two distinct drawbacks: first, a large amount of historical data is needed to train the model parameters; second, as parametric models, they cannot quantify the uncertainty of the load prediction.
However, residential-side electricity usage is more irregular and more sensitive to consumer behavior than the high-voltage side. Meanwhile, the lack of labeled historical data is a very common problem in power systems. As a result, the machine learning and deep learning methods described above struggle to accomplish deterministic and uncertainty prediction tasks in small-sample historical load data scenarios. Therefore, a method is needed to accomplish load prediction tasks when the historical load data of residential users is limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a short-term load prediction method under a small sample set based on transfer learning, which achieves accurate short-term load prediction for a target user by exploiting the temporal and spatial correlation between source-domain users and the target user under conditions of limited historical load data.
In order to achieve the above object, the invention provides a short-term load prediction method under a small sample set based on transfer learning, characterized by comprising the following steps:
(1) Data acquisition and preprocessing;
(1.1) Set the load sampling period T;
(1.2) Collect the historical load data $x_{load}$ and corresponding temperature $x_{temp}$ of the M source-domain users at the load sampling period T, so as to construct a data set and a temperature set; denote the data set constructed for the i-th source-domain user as $X_{load}^{i}=\{x_{load}^{i,t}\}$ and its temperature set as $X_{temp}^{i}=\{x_{temp}^{i,t}\}$, where $x_{load}^{i,t}$ and $x_{temp}^{i,t}$ respectively denote the historical load data and the corresponding temperature collected for the i-th source-domain user at the t-th sampling instant, i = 1, 2, …, M;
(1.3) Remove the outliers from $X_{load}^{i}$ and $X_{temp}^{i}$, then fill the resulting gaps by linear interpolation to obtain the data sample $X^{i}=\{X_{load}^{i},X_{temp}^{i}\}$; finally, normalize the data sample $X^{i}$ to obtain the normalized data sample $\bar{X}^{i}$;
(1.4) Append a time characterization variable $x_{time}$ to each data sample $\bar{X}^{i}$; $x_{time}$ comprises a time-of-day variable, a day-of-week variable and a holiday variable, all in one-hot form, and $\{\bar{X}^{i},x_{time}\}$ is taken as the input features, finally constructing the M input features $\tilde{X}^{i}$, i = 1, 2, …, M;
(2) Build the deep residual network model;
The deep residual network is formed by connecting L residual blocks with skip connections, where each residual block consists of convolution layers, normalization layers and a ReLU activation function layer;
(3) Train the deep residual network model on the source-domain user data;
(3.1) Randomly select the data of a batch of sampling instants from the i-th input feature $\tilde{X}^{i}$ as one round of training data, then input the data of each instant into the deep residual network model in sequence; the input layer converts the framed data into tensor form and feeds it to the serial residual blocks;
(3.2) In the deep residual network model, let the input tensor of the l-th residual block be $Z^{(l-1)}$. In the left branch of the l-th residual block, the tensor $Z^{(l-1)}$ undergoes feature extraction through convolution kernels composed of several dilated causal convolutions, passing sequentially through a convolution layer, a normalization layer, a ReLU activation function layer, a convolution layer and a normalization layer to obtain the left-branch output tensor $F(Z^{(l-1)})$; in the right branch of the l-th residual block, the tensor $Z^{(l-1)}$ undergoes a 1×1 convolution so that its output tensor $G(Z^{(l-1)})$ matches the dimension of the left-branch output tensor; the outputs of the two branches are then added to obtain the output of the l-th residual block $Z^{(l)}=F(Z^{(l-1)})+G(Z^{(l-1)})$. The output $Z^{(l)}$ of the l-th residual block and the output of the (l-2)-th residual block are added to obtain the input $(Z^{(l)}+Z^{(l-2)})$ of the (l+1)-th residual block;
(3.3) Repeat step (3.2) until the last residual block outputs $Z^{(L)}$; finally feed $Z^{(L)}$ to two fully connected layers connected in parallel, whose output is recorded as $\hat{y}_t$ and used as the predicted value at time t;
(3.4) After this round of training data is finished, calculate the loss function value MAPE of this training round:

$\mathrm{MAPE}=\frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_{t}-\hat{y}_{t}}{y_{t}}\right|\times 100\%$

where $y_t$ is the observation at time t and N is the number of samples in the round;
(3.5) Set a loss threshold δ; calculate the difference ΔMAPE between the loss function value after the current training round and that after the previous round, and compare ΔMAPE with δ; if ΔMAPE ≤ δ, training is finished and the i-th deep residual network model is obtained; otherwise, update the weights of the deep residual network with a batch gradient descent algorithm and return to step (3.1) for the next round of training;
(3.6) Train the deep residual network with the M input features according to steps (3.1)–(3.5), finally obtaining M deep residual network models, denoted $\{F_1,F_2,\ldots,F_i,\ldots,F_M\}$;
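A sketch of the training loop of steps (3.1)–(3.6) follows; the SGD optimizer (standing in for the batch gradient descent of step (3.5)), the batch size and the learning rate are assumptions:

```python
import torch

def mape(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    """Step (3.4): mean absolute percentage error of one training round."""
    return torch.abs((y_true - y_pred) / y_true).mean() * 100.0

def train_source_model(model, feats, targets, delta=1e-3, lr=1e-2, batch=64):
    """Steps (3.1)-(3.5): train until the round-to-round change in MAPE
    drops to the loss threshold delta or below."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    prev = float("inf")
    while True:
        idx = torch.randperm(feats.shape[0])[:batch]  # random instants, step (3.1)
        loss = mape(targets[idx], model(feats[idx]))
        if abs(prev - loss.item()) <= delta:          # stopping rule, step (3.5)
            return model
        opt.zero_grad()
        loss.backward()
        opt.step()
        prev = loss.item()

# Step (3.6): one model per source-domain user (M models in total).
# models = [train_source_model(DeepResNet(c_feat, 32, 6), f, y)
#           for f, y in source_datasets]   # source_datasets is hypothetical
```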
(4) Adaptively combine the deep residual network models using the Bayesian weighted probability averaging method;
(4.1) Set the acquisition period $T_1$ of the target user;
(4.2) Sample the target user's small-sample historical load data at the sampling period $T_1$ to obtain the load data set $X_{load}^{*}=\{x_{load}^{*,t}\}$ and the temperature data set $X_{temp}^{*}=\{x_{temp}^{*,t}\}$, where $x_{load}^{*,t}$ and $x_{temp}^{*,t}$ respectively denote the historical load data and the corresponding temperature collected for the target user at the t-th sampling instant; then construct the input features according to steps (1.3)–(1.4), denoted $\tilde{X}^{*}$;
(4.3) Construct the input feature pairs $D^{*}=\{(\tilde{x}^{*,t},y_t^{*})\}$, where $\tilde{x}^{*,t}$ denotes the input features at time t and $y_t^{*}$ denotes the load observation at time t;
(4.4) Input the features $\tilde{x}^{*,t}$ into the M deep residual network models respectively to obtain the prediction outputs $Y=\{\hat{y}_1^{t},\hat{y}_2^{t},\ldots,\hat{y}_M^{t}\}$, where $\hat{y}_i^{t}$ denotes the predicted value of the i-th deep residual network model at time t;
(4.5) Calculate the likelihood of the observation $y_t^{*}$:

$p(y_t^{*}\mid Y,\omega)=N(\omega^{T}Y,\sigma_n^{2})$

where N denotes the Gaussian distribution, $\sigma_n^{2}$ denotes the Gaussian noise variance, and $\omega=\{\omega_1,\omega_2,\ldots,\omega_M\}$ denotes the weights assigned to $\{\hat{y}_1^{t},\hat{y}_2^{t},\ldots,\hat{y}_M^{t}\}$;
(4.6) Calculate the probability of the full observation vector $y^{*}$:

$p(y^{*}\mid \hat{Y},\omega)=N(\hat{Y}^{T}\omega,\sigma_n^{2}I)$

where $\hat{Y}$ stacks the M models' predictions at all sampling instants and I denotes the identity matrix;
(4.7) Assume the prior of the weights ω obeys a Gaussian distribution with mean 0 and variance $\Sigma_p$:

$\omega \sim N(0,\Sigma_p)$

Calculate the posterior probability distribution of the weights ω according to Bayesian inference theory:

$p(\omega\mid \hat{Y},y^{*})=N(\bar{\omega},A^{-1})$

where $\bar{\omega}=\sigma_n^{-2}A^{-1}\hat{Y}y^{*}$ and $A=\sigma_n^{-2}\hat{Y}\hat{Y}^{T}+\Sigma_p^{-1}$;
(5) Predict the real-time load;
(5.1) Collect the target user's historical load data and temperature data in real time, and construct the input features $\tilde{x}^{*}$ according to steps (1.1)–(1.4);
(5.2) From the input features $\tilde{x}^{*}$, obtain the M models' predictions $\hat{y}_{*}$ and calculate the probability distribution of the target user's real-time load prediction:

$f_{*}\mid \hat{y}_{*},\hat{Y},y^{*}\sim N(\sigma_n^{-2}\hat{y}_{*}^{T}A^{-1}\hat{Y}y^{*},\ \hat{y}_{*}^{T}A^{-1}\hat{y}_{*})$

where $f_{*}$ is the predictive probability distribution;
(5.3) Take the mean $\bar{f}_{*}$ of the probability distribution as the target user's load prediction at time $T_1+1$.
The objectives of the invention are achieved as follows:
according to the short-term load prediction method under the small sample set based on transfer learning, historical load data and corresponding temperatures of a plurality of source domain users are collected, so that a plurality of input features are constructed; and training a plurality of depth residual error network models by using input features, carrying out adaptive migration combination on the depth residual error network models by using a Bayesian weighted probability averaging method, and carrying out real-time load prediction of a target user after the migration combination is completed.
Meanwhile, the short-term load prediction method under a small sample set based on transfer learning of the invention has the following beneficial effects:
(1) The deep residual network adopted by the invention has a skip-connection structure, which reduces information loss and gradient vanishing during training, reduces information loss during transfer, and improves the robustness of the transfer;
(2) Adaptive transfer combination is performed under the condition of limited target-user load data, and the model combination process can eliminate the influence of negative-transfer information (i.e., of source-domain users whose load characteristics differ greatly from the target user's), thereby constructing an optimal small-sample load prediction model suited to the target user;
(3) The invention adopts the Bayesian weighted probability method to adaptively combine the transferred models; the process uses maximum likelihood estimation and solves the weights of the embedded models by maximizing the posterior probability; the method uses 100% of the samples and provides probability density forecasts, thereby quantifying the uncertainty of the prediction.
Drawings
FIG. 1 is a flow chart of a short-term load prediction method under a small sample set based on transfer learning;
FIG. 2 is a flow chart of data acquisition and preprocessing;
FIG. 3 is a schematic diagram of the structure of a depth residual network;
FIG. 4 shows deterministic load prediction curves of the method of the present invention and several other methods;
fig. 5 is a graph showing probability density predictions for different prediction intervals for the method of the present invention.
Detailed Description
The following description of embodiments of the invention is presented in conjunction with the accompanying drawings so that those skilled in the art may better understand the invention. It should be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Examples
FIG. 1 is a flow chart of a short-term load prediction method under a small sample set based on transfer learning.
In this embodiment, the selected source-domain users have a certain correlation with the target user's load sequence, belonging to the same residential area and city, which ensures the validity of the transferred knowledge. As shown in fig. 1, the short-term load prediction method under a small sample set based on transfer learning of the invention comprises the following steps:
S1, data acquisition and preprocessing, the specific flow of which is shown in FIG. 2;
S1.1, set the load sampling period T; in this embodiment, the load sampling period is set to 1 hour, i.e., 24 points per day;
S1.2, collect the historical load data $x_{load}$ and corresponding temperature $x_{temp}$ of M = 19 source-domain users at the load sampling period T, so as to construct a data set and a temperature set; denote the data set constructed for the i-th source-domain user as $X_{load}^{i}=\{x_{load}^{i,t}\}$ and its temperature set as $X_{temp}^{i}=\{x_{temp}^{i,t}\}$, where $x_{load}^{i,t}$ and $x_{temp}^{i,t}$ respectively denote the historical load data and the corresponding temperature collected for the i-th source-domain user at the t-th sampling instant, i = 1, 2, …, M;
S1.3, remove the outliers from $X_{load}^{i}$ and $X_{temp}^{i}$ respectively, then linearly interpolate the gap values to obtain the data sample $X^{i}=\{X_{load}^{i},X_{temp}^{i}\}$; finally, normalize the data sample $X^{i}$ to obtain the normalized data sample $\bar{X}^{i}$;
S1.4, append a time characterization variable $x_{time}$ to each data sample $\bar{X}^{i}$; $x_{time}$ comprises a time-of-day variable (i.e., the hour of the day), a day-level variable (i.e., the day of the week) and a holiday variable (0 for working days, 1 for weekends), all in one-hot form, and $\{\bar{X}^{i},x_{time}\}$ is taken as the input features, finally constructing the M input features $\tilde{X}^{i}$, i = 1, 2, …, M;
S2, build the deep residual network model;
As shown in fig. 3, the deep residual network is formed by L residual blocks with skip connections. The specific connection scheme is as follows: the output of the l-th residual block is added to the output of the (l+2)-th residual block as the input of the (l+3)-th residual block, the output of the (l+1)-th residual block is added to the output of the (l+3)-th residual block as the input of the (l+4)-th residual block, and so on, where l = 1, 2, …. This skip-connection structure reduces information loss and gradient vanishing during training.
Each residual block is composed of convolution layers, normalization layers and a ReLU activation function layer; the convolution kernel size of the convolution layers is set to 3×3.
In this embodiment, each residual block is expressed by the equation:

$x_{l+1}=x_{l}+F(x_{l},W_{l})$

where $W_l$ denotes the parameters of the l-th residual block; $x_l$ denotes the input of the l-th residual block; $x_{l+1}$ denotes the output of the l-th residual block and also serves as the input of the (l+1)-th residual block; l = 1, 2, …, L, with L denoting the number of residual blocks;
S3, train the deep residual network model on the source-domain user data;
S3.1, randomly select the data of a batch of sampling instants from the i-th input feature $\tilde{X}^{i}$ as one round of training data, then input the data of each instant into the deep residual network model in sequence; the input layer converts the framed data into tensor form and feeds it to the serial residual blocks;
S3.2, in the deep residual network model, let the input tensor of the l-th residual block be $Z^{(l-1)}$. In the left branch of the l-th residual block, the tensor $Z^{(l-1)}$ undergoes feature extraction through convolution kernels composed of several dilated causal convolutions, passing sequentially through a convolution layer, a normalization layer, a ReLU activation function layer, a convolution layer and a normalization layer to obtain the left-branch output tensor $F(Z^{(l-1)})$; in the right branch of the l-th residual block, the tensor $Z^{(l-1)}$ undergoes a 1×1 convolution so that its output tensor $G(Z^{(l-1)})$ matches the dimension of the left-branch output tensor; the outputs of the two branches are then added to obtain the output of the l-th residual block $Z^{(l)}=F(Z^{(l-1)})+G(Z^{(l-1)})$. The output $Z^{(l)}$ of the l-th residual block and the output of the (l-2)-th residual block are added to obtain the input $(Z^{(l)}+Z^{(l-2)})$ of the (l+1)-th residual block;
S3.3, repeat step S3.2 until the last residual block outputs $Z^{(L)}$; finally feed $Z^{(L)}$ to two fully connected layers connected in parallel, whose output is recorded as $\hat{y}_t$ and used as the predicted value at time t;
S3.4, after this round of training data is finished, calculate the loss function value MAPE of this training round:

$\mathrm{MAPE}=\frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_{t}-\hat{y}_{t}}{y_{t}}\right|\times 100\%$

where $y_t$ is the observation at time t and N is the number of samples in the round;
S3.5, set a loss threshold δ; calculate the difference ΔMAPE between the loss function value after the current training round and that after the previous round, and compare ΔMAPE with δ; if ΔMAPE ≤ δ, training is finished and the i-th deep residual network model is obtained; otherwise, update the weights of the deep residual network with a batch gradient descent algorithm and return to step S3.1 for the next round of training;
S3.6, train the deep residual network with the M input features according to steps S3.1–S3.5, finally obtaining M deep residual network models, denoted $\{F_1,F_2,\ldots,F_i,\ldots,F_M\}$;
S4, adaptively combine the deep residual network models using the Bayesian weighted probability averaging method;
S4.1, set the acquisition period $T_1$ of the target user;
S4.2, sample at the period $T_1$ the target user's load values, which contain only limited historical load data (the lack of historical load data may be because the user is a new user, has recently moved in, or has a damaged smart meter, etc.), e.g. 1–2 days of load data $X_{load}^{*}=\{x_{load}^{*,t}\}$ and temperature data $X_{temp}^{*}=\{x_{temp}^{*,t}\}$, where $x_{load}^{*,t}$ and $x_{temp}^{*,t}$ respectively denote the historical load data and the corresponding temperature collected for the target user at the t-th sampling instant; then construct the input features according to steps S1.3–S1.4, denoted $\tilde{X}^{*}$;
S4.3, construct the input feature pairs $D^{*}=\{(\tilde{x}^{*,t},y_t^{*})\}$, where $\tilde{x}^{*,t}$ denotes the input features at time t and $y_t^{*}$ denotes the load observation at time t;
S4.4, input the features $\tilde{x}^{*,t}$ into the M deep residual network models respectively to obtain the prediction outputs $Y=\{\hat{y}_1^{t},\hat{y}_2^{t},\ldots,\hat{y}_M^{t}\}$, where $\hat{y}_i^{t}$ denotes the predicted value of the i-th deep residual network model at time t;
S4.5, calculate the likelihood of the observation $y_t^{*}$:

$p(y_t^{*}\mid Y,\omega)=N(\omega^{T}Y,\sigma_n^{2})$

where N denotes obeying a Gaussian distribution, $\omega=\{\omega_1,\omega_2,\ldots,\omega_M\}$ denotes the weights assigned to $\{\hat{y}_1^{t},\hat{y}_2^{t},\ldots,\hat{y}_M^{t}\}$, and $\sigma_n^{2}$ denotes the noise variance;
S4.6, calculate the probability of the full observation vector $y^{*}$:

$p(y^{*}\mid \hat{Y},\omega)=N(\hat{Y}^{T}\omega,\sigma_n^{2}I)$

where $\hat{Y}$ stacks the M models' predictions at all sampling instants and I denotes the identity matrix;
S4.7, convert the model combination problem into solving the posterior distribution of the weights ω. The Bayesian approach requires assigning a prior to all the weights, which represents the belief held about the weights before any observations are made. Assume the prior of the weights ω obeys a Gaussian distribution with mean 0 and variance $\Sigma_p$:

$\omega \sim N(0,\Sigma_p)$

Calculate the posterior probability distribution of the weights ω according to Bayesian inference theory:

$p(\omega\mid \hat{Y},y^{*})=N(\bar{\omega},A^{-1})$

where $\bar{\omega}=\sigma_n^{-2}A^{-1}\hat{Y}y^{*}$ and $A=\sigma_n^{-2}\hat{Y}\hat{Y}^{T}+\Sigma_p^{-1}$. The posterior mean $\bar{\omega}$ (a 1×M vector) is multiplied with the model predictions Y, i.e. $\bar{\omega}^{T}Y$, thereby realizing the transfer combination of the M deep residual network models. In this embodiment, the transfer combination is performed under the scenario of limited target-user load data, and the model combination process can eliminate the influence of negative-transfer information, so as to construct an optimal small-sample load prediction model suited to the target user.
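Continuing the NumPy sketch given after step (4.7) of the summary above, a hypothetical usage for this embodiment (M = 19 source models, two days of hourly target data; the random arrays stand in for real model outputs) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 19, 48                        # 19 source models, 48 hourly target samples
Y_hat = rng.random((M, n))           # predictions of the transferred models
y_target = rng.random(n)             # target user's observed loads
w_bar, A_inv = bayesian_combination(Y_hat, y_target,
                                    sigma_n2=0.01, Sigma_p=np.eye(M))
y_hat_next = rng.random(M)           # the 19 models' forecasts for time T1 + 1
mean, var = predictive_distribution(y_hat_next, w_bar, A_inv)
print(f"forecast {mean:.3f} +/- {np.sqrt(var):.3f}")
```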
S5, predicting the real-time load;
s5.1, collecting historical load data and temperature data of a target user in real time, and constructing input characteristics according to steps S1.1-S1.4
Figure GDA0004094976870000096
S5.2, from the input features $\tilde{x}^{*}$, obtain the M models' predictions $\hat{y}_{*}$ and calculate the probability distribution of the target user's real-time load prediction:

$f_{*}\mid \hat{y}_{*},\hat{Y},y^{*}\sim N(\sigma_n^{-2}\hat{y}_{*}^{T}A^{-1}\hat{Y}y^{*},\ \hat{y}_{*}^{T}A^{-1}\hat{y}_{*})$

where $f_{*}$ is the predictive probability distribution;
S5.3, take the mean $\bar{f}_{*}$ of the probability distribution as the target user's load prediction at time $T_1+1$, and use the variance $\sigma_{*}^{2}$ to evaluate the uncertainty of the load prediction at time $T_1+1$.
Verification:
In order to evaluate deterministic prediction and uncertainty prediction accurately and effectively, the invention uses MAPE to evaluate the deterministic prediction effect and CRPS to evaluate the uncertainty prediction effect, the CRPS expression being:

$\mathrm{CRPS}(F,y)=\int_{-\infty}^{+\infty}\left(F(x)-\mathbf{1}\{x\ge y\}\right)^{2}dx$

where F denotes the cumulative distribution function (CDF) of the forecast and y the observation. CRPS comprehensively evaluates both the reliability of the probability density estimate and the sharpness of the prediction interval.
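Since the method's predictive distribution in step S5.2 is Gaussian, the CRPS can be evaluated in closed form; a small sketch using the standard Gaussian CRPS formula (SciPy is assumed) is:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) against the
    observation y (standard formula for Gaussian predictive distributions)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))
```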
Table 1 compares the deterministic prediction performance, i.e. the MAPE values, of the method of the present invention and several other methods for different users. The linear regression model LR, Gaussian process GP, long short-term memory neural network LSTM and Bayesian linear regression BLR are trained only on the target user's limited historical data; the multi-kernel transfer model M-BTMKR, the traditional weighted average method WA and the best-performing single transfer model BI serve as transfer prediction methods. The proposed model is clearly superior to the other models in every scenario. The average prediction error of the method is only 3.2%, which means the method can be further applied in actual engineering and production. FIG. 4 shows deterministic load prediction curves of the method of the present invention and several other methods. As can be seen from fig. 4, the predicted values of the method of the present invention are significantly closer to the actual values than those of the other methods, which means the method has higher prediction accuracy than the others.
Table 2 compares the uncertainty prediction performance of the method of the present invention with that of several other methods, i.e. the CRPS of their probability density forecasts, where L-GP denotes the GP forecast under limited historical data only, L-BLR the BLR forecast under limited historical data only, S-GP the GP forecast under sufficient historical data, and S-BLR the BLR forecast under sufficient historical data. The comparison methods are less effective than the method of the present invention in both the limited and the sufficient historical data scenarios; even though the S-GP and S-BLR models are trained on sufficient samples, their CRPS is inferior to that of the method of the present invention for every user. FIG. 5 shows the prediction interval of the method at 90% confidence: essentially 90% of the samples fall within the prediction interval, which illustrates the high reliability of the method. Based on the prediction interval obtained from the uncertainty estimate, risk-aware decisions can be made in actual production, and the uncertainty of the prediction can be quantified.
Table 1 shows the MAPE [%] comparison for the different models.
Table 2 shows the CRPS comparison for the different models.
while the foregoing describes illustrative embodiments of the present invention to facilitate an understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but is to be construed as protected by the accompanying claims insofar as various changes are within the spirit and scope of the present invention as defined and defined by the appended claims.

Claims (2)

1. A short-term load prediction method under a small sample set based on transfer learning, characterized by comprising the following steps:
(1) Data acquisition and preprocessing;
(1.1) Set the load sampling period T;
(1.2) Collect the historical load data $x_{load}$ and corresponding temperature $x_{temp}$ of the M source-domain users at the load sampling period T, so as to construct a data set and a temperature set; denote the data set constructed for the i-th source-domain user as $X_{load}^{i}=\{x_{load}^{i,t}\}$ and its temperature set as $X_{temp}^{i}=\{x_{temp}^{i,t}\}$, where $x_{load}^{i,t}$ and $x_{temp}^{i,t}$ respectively denote the historical load data and the corresponding temperature collected for the i-th source-domain user at the t-th sampling instant, i = 1, 2, …, M;
(1.3) Remove the outliers from $X_{load}^{i}$ and $X_{temp}^{i}$, then fill the resulting gaps by linear interpolation to obtain the data sample $X^{i}=\{X_{load}^{i},X_{temp}^{i}\}$; finally, normalize the data sample $X^{i}$ to obtain the normalized data sample $\bar{X}^{i}$;
(1.4) Append a time characterization variable $x_{time}$ to each data sample $\bar{X}^{i}$; $x_{time}$ comprises a time-of-day variable, a day-of-week variable and a holiday variable, all in one-hot form, and $\{\bar{X}^{i},x_{time}\}$ is taken as the input features, finally constructing the M input features $\tilde{X}^{i}$, i = 1, 2, …, M;
(2) Build the deep residual network model;
The deep residual network is formed by connecting L residual blocks with skip connections, where each residual block consists of convolution layers, normalization layers and a ReLU activation function layer;
(3) Train the deep residual network model on the source-domain user data;
(3.1) Randomly select the data of a batch of sampling instants from the i-th input feature $\tilde{X}^{i}$ as one round of training data, then input the data of each instant into the deep residual network model in sequence; the input layer converts the framed data into tensor form and feeds it to the serial residual blocks;
(3.2) In the deep residual network model, let the input tensor of the l-th residual block be $Z^{(l-1)}$. In the left branch of the l-th residual block, the tensor $Z^{(l-1)}$ undergoes feature extraction through convolution kernels composed of several dilated causal convolutions, passing sequentially through a convolution layer, a normalization layer, a ReLU activation function layer, a convolution layer and a normalization layer to obtain the left-branch output tensor $F(Z^{(l-1)})$; in the right branch of the l-th residual block, the tensor $Z^{(l-1)}$ undergoes a 1×1 convolution so that its output tensor $G(Z^{(l-1)})$ matches the dimension of the left-branch output tensor; the outputs of the two branches are then added to obtain the output of the l-th residual block $Z^{(l)}=F(Z^{(l-1)})+G(Z^{(l-1)})$. The output $Z^{(l)}$ of the l-th residual block and the output of the (l-2)-th residual block are added to obtain the input $(Z^{(l)}+Z^{(l-2)})$ of the (l+1)-th residual block;
(3.3) Repeat step (3.2) until the last residual block outputs $Z^{(L)}$; finally feed $Z^{(L)}$ to two fully connected layers connected in parallel, whose output is recorded as $\hat{y}_t$ and used as the predicted value at time t;
(3.4) After this round of training data is finished, calculate the loss function value MAPE of this training round:

$\mathrm{MAPE}=\frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_{t}-\hat{y}_{t}}{y_{t}}\right|\times 100\%$

where $y_t$ is the observation at time t and N is the number of samples in the round;
(3.5) Set a loss threshold δ; calculate the difference ΔMAPE between the loss function value after the current training round and that after the previous round, and compare ΔMAPE with δ; if ΔMAPE ≤ δ, training is finished and the i-th deep residual network model is obtained; otherwise, update the weights of the deep residual network with a batch gradient descent algorithm and return to step (3.1) for the next round of training;
(3.6) Train the deep residual network with the M input features according to steps (3.1)–(3.5), finally obtaining M deep residual network models, denoted $\{F_1,F_2,\ldots,F_i,\ldots,F_M\}$;
(4) Adaptively combine the deep residual network models using the Bayesian weighted probability averaging method;
(4.1) Set the acquisition period $T_1$ of the target user;
(4.2) Sample the target user's small-sample historical load data at the sampling period $T_1$ to obtain the load data set $X_{load}^{*}=\{x_{load}^{*,t}\}$ and the temperature data set $X_{temp}^{*}=\{x_{temp}^{*,t}\}$, where $x_{load}^{*,t}$ and $x_{temp}^{*,t}$ respectively denote the historical load data and the corresponding temperature collected for the target user at the t-th sampling instant; then construct the input features according to steps (1.3)–(1.4), denoted $\tilde{X}^{*}$;
(4.3) Construct the input feature pairs $D^{*}=\{(\tilde{x}^{*,t},y_t^{*})\}$, where $\tilde{x}^{*,t}$ denotes the input features at time t and $y_t^{*}$ denotes the load observation at time t;
(4.4) Input the features $\tilde{x}^{*,t}$ into the M deep residual network models respectively to obtain the prediction outputs $Y=\{\hat{y}_1^{t},\hat{y}_2^{t},\ldots,\hat{y}_M^{t}\}$, where $\hat{y}_i^{t}$ denotes the predicted value of the i-th deep residual network model at time t;
(4.5) Calculate the likelihood of the observation $y_t^{*}$:

$p(y_t^{*}\mid Y,\omega)=N(\omega^{T}Y,\sigma_n^{2})$

where N denotes the Gaussian distribution, $\sigma_n^{2}$ denotes the Gaussian noise variance, and $\omega=\{\omega_1,\omega_2,\ldots,\omega_M\}$ denotes the weights assigned to $\{\hat{y}_1^{t},\hat{y}_2^{t},\ldots,\hat{y}_M^{t}\}$;
(4.6) Calculate the probability of the full observation vector $y^{*}$:

$p(y^{*}\mid \hat{Y},\omega)=N(\hat{Y}^{T}\omega,\sigma_n^{2}I)$

where $\hat{Y}$ stacks the M models' predictions at all sampling instants and I denotes the identity matrix;
(4.7) Assume the prior of the weights ω obeys a Gaussian distribution with mean 0 and variance $\Sigma_p$:

$\omega \sim N(0,\Sigma_p)$

Calculate the posterior probability distribution of the weights ω according to Bayesian inference theory:

$p(\omega\mid \hat{Y},y^{*})=N(\bar{\omega},A^{-1})$

where $\bar{\omega}=\sigma_n^{-2}A^{-1}\hat{Y}y^{*}$ and $A=\sigma_n^{-2}\hat{Y}\hat{Y}^{T}+\Sigma_p^{-1}$;
(5) Predict the real-time load;
(5.1) Collect the target user's historical load data and temperature data in real time, and construct the input features $\tilde{x}^{*}$ according to steps (1.1)–(1.4);
(5.2) From the input features $\tilde{x}^{*}$, obtain the M models' predictions $\hat{y}_{*}$ and calculate the probability distribution of the target user's real-time load prediction:

$f_{*}\mid \hat{y}_{*},\hat{Y},y^{*}\sim N(\sigma_n^{-2}\hat{y}_{*}^{T}A^{-1}\hat{Y}y^{*},\ \hat{y}_{*}^{T}A^{-1}\hat{y}_{*})$

where $f_{*}$ is the predictive probability distribution;
(5.3) Take the mean $\bar{f}_{*}$ of the probability distribution as the target user's load prediction at time $T_1+1$.
2. The short-term load prediction method under a small sample set based on transfer learning according to claim 1, wherein each residual block is expressed as:

$x_{l+1}=x_{l}+F(x_{l},W_{l})$

where $W_l$ denotes the parameters of the l-th residual block; $x_l$ denotes the input of the l-th residual block; $x_{l+1}$ denotes the output of the l-th residual block and also serves as the input of the (l+1)-th residual block; l = 1, 2, …, L, with L denoting the number of residual blocks.
CN202111442332.1A 2021-11-30 2021-11-30 Short-term load prediction method based on transfer learning under small sample set Active CN114169416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111442332.1A CN114169416B (en) Short-term load prediction method based on transfer learning under small sample set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111442332.1A CN114169416B (en) Short-term load prediction method based on transfer learning under small sample set

Publications (2)

Publication Number Publication Date
CN114169416A CN114169416A (en) 2022-03-11
CN114169416B (en) 2023-04-21

Family

ID=80481698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111442332.1A Active CN114169416B (en) Short-term load prediction method based on transfer learning under small sample set

Country Status (1)

Country Link
CN (1) CN114169416B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102023000954A1 (en) 2023-03-13 2024-09-19 Mercedes-Benz Group AG Method for predicting the power requirements of electrical components of a vehicle arranged in an on-board network
CN117081082B (en) * 2023-10-17 2024-01-23 国网上海市电力公司 Active power distribution network operation situation sensing method and system based on Gaussian process regression

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903067A (en) * 2014-04-09 2014-07-02 上海电机学院 Short-term combination forecasting method for wind power
CN109711620A (en) * 2018-12-26 2019-05-03 浙江大学 A kind of Short-Term Load Forecasting Method based on GRU neural network and transfer learning
WO2019141040A1 (en) * 2018-01-22 2019-07-25 佛山科学技术学院 Short term electrical load predication method
CN110969293A (en) * 2019-11-22 2020-04-07 上海交通大学 Short-term generalized load prediction method based on transfer learning
WO2021042935A1 (en) * 2019-09-05 2021-03-11 苏州大学 Bearing service life prediction method based on hidden markov model and transfer learning
CN113032916A (en) * 2021-03-03 2021-06-25 安徽大学 Electromechanical device bearing fault prediction method based on Bayesian network of transfer learning
CN113111578A (en) * 2021-04-01 2021-07-13 上海晨翘智能科技有限公司 Power load prediction method, power load prediction device, computer equipment and storage medium
CN113627659A (en) * 2021-07-29 2021-11-09 南京亚派软件技术有限公司 Garden demand side short-term load prediction system and method based on depth residual error network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903067A (en) * 2014-04-09 2014-07-02 上海电机学院 Short-term combination forecasting method for wind power
WO2019141040A1 (en) * 2018-01-22 2019-07-25 佛山科学技术学院 Short term electrical load predication method
CN109711620A (en) * 2018-12-26 2019-05-03 浙江大学 A kind of Short-Term Load Forecasting Method based on GRU neural network and transfer learning
WO2021042935A1 (en) * 2019-09-05 2021-03-11 苏州大学 Bearing service life prediction method based on hidden markov model and transfer learning
CN110969293A (en) * 2019-11-22 2020-04-07 上海交通大学 Short-term generalized load prediction method based on transfer learning
CN113032916A (en) * 2021-03-03 2021-06-25 安徽大学 Electromechanical device bearing fault prediction method based on Bayesian network of transfer learning
CN113111578A (en) * 2021-04-01 2021-07-13 上海晨翘智能科技有限公司 Power load prediction method, power load prediction device, computer equipment and storage medium
CN113627659A (en) * 2021-07-29 2021-11-09 南京亚派软件技术有限公司 Garden demand side short-term load prediction system and method based on depth residual error network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tehreem Ashfaq et al. Short-term Electricity Load and Price Forecasting using Enhanced KNN. 2019 International Conference on Frontiers of Information Technology (FIT). 2020, 266-271. *
Su Juan et al. Short-term load forecasting method based on modal combination. Transactions of the Chinese Society of Agricultural Engineering. 2021, Vol. 37, No. 14, 186-196. *
Zhao Pengfei. Research on short-term power load forecasting based on deep residual networks. China Master's Theses Full-text Database, Engineering Science and Technology II. 2023, No. 01, C042-1916. *

Also Published As

Publication number Publication date
CN114169416A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN110969290B (en) Runoff probability prediction method and system based on deep learning
CN111079989B (en) DWT-PCA-LSTM-based water supply amount prediction device for water supply company
CN111861013B (en) Power load prediction method and device
CN114169416B (en) Short-term load prediction method based on transfer learning under small sample set
CN113128113B (en) Lean information building load prediction method based on deep learning and transfer learning
CN112100911B (en) Solar radiation prediction method based on depth BILSTM
CN114297036B (en) Data processing method, device, electronic equipment and readable storage medium
CN110910004A (en) Reservoir dispatching rule extraction method and system with multiple uncertainties
CN111461463A (en) Short-term load prediction method, system and equipment based on TCN-BP
CN113554466A (en) Short-term power consumption prediction model construction method, prediction method and device
CN109919356A (en) One kind being based on BP neural network section water demand prediction method
CN115907131B (en) Method and system for constructing electric heating load prediction model in northern area
CN112232604A (en) Prediction method for extracting network traffic based on Prophet model
CN115619028A (en) Clustering algorithm fusion-based power load accurate prediction method
CN115456287A (en) Long-and-short-term memory network-based multi-element load prediction method for comprehensive energy system
CN113240181B (en) Rolling simulation method and device for reservoir dispatching operation
CN113762591B (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning
CN118137582A (en) Multi-target dynamic scheduling method and system based on regional power system source network charge storage
CN111612648B (en) Training method and device for photovoltaic power generation prediction model and computer equipment
CN117252367A (en) Method and system for evaluating demand response potential of field countermeasure-based transfer learning model
CN113159395A (en) Deep learning-based sewage treatment plant water inflow prediction method and system
Guo et al. Short-Term Water Demand Forecast Based on Deep Neural Network:(029)
CN116632841A (en) Power distribution area short-term electricity load prediction method and system integrating multiple time sequence characteristics
Viana et al. Load forecasting benchmark for smart meter data
CN113836814B (en) Solar energy prediction method based on multi-flow neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant