CN116316573A - Short-term power load prediction method based on nonstandard Bayesian algorithm optimization - Google Patents
- Publication number
- CN116316573A (application CN202310184153.5A)
- Authority
- CN
- China
- Prior art keywords
- super
- power load
- load prediction
- algorithm
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, comprising the following steps: constructing a multivariate feature set; preprocessing the multivariate feature set; taking the multivariate feature set as the input of a temporal convolutional network; taking the output of the temporal convolutional network as the input of a gated recurrent unit; taking the output of the gated recurrent unit as the input of an attention mechanism to obtain the output power load prediction result. The hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through a tree-structured, nonstandard Bayesian optimization algorithm. The embodiment of the invention provides a power load prediction model based on TPE (Tree-structured Parzen Estimator) optimization of TCN-GRU-Attention, which not only overcomes the shortcomings of a single model but also optimizes the hyperparameters of the model with the TPE algorithm, further improving the effectiveness and adaptability of the model.
Description
Technical Field
The invention relates to the technical field of power systems, in particular to a short-term power load prediction method based on nonstandard Bayesian algorithm optimization.
Background
At present, the power industry in China is developing rapidly, grid supply keeps expanding, and the structures and operation modes of power systems are increasingly diversified; against this new background, the safety, reliability and stable operation of the grid face greater challenges. Short-term power load prediction is an important task of power management departments: accurate load prediction allows production plans to be arranged reasonably and ensures timely, economical power dispatch, thereby reducing power-generation costs.
Improving the accuracy of short-term power load prediction not only raises the overall economic benefit of the power system but also enhances the stability and safety of its operation. As power systems keep developing, short-term load prediction has become considerably harder, and providing a reasonable, effective prediction model has become a focus of current research. Convolutional neural networks (CNN) suit load-data prediction with multidimensional variables, but they are not fully suited to learning time series and require various auxiliary processing. Recurrent neural networks (RNN) handle time-series data well, but suffer serious gradient problems on long input sequences. Compared with the RNN, the long short-term memory (LSTM) network handles long-sequence prediction more effectively, but its three gating units carry many parameters, giving long training times and low training efficiency.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, aiming to solve the long training times and low training efficiency of the prediction models used for short-term power load prediction in the prior art.
The embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, which comprises the following steps:
constructing a multivariable feature set;
preprocessing the multivariable feature set;
taking the multivariate feature set as the input of the temporal convolutional network;
taking the output of the temporal convolutional network as the input of the gated recurrent unit;
taking the output of the gated recurrent unit as the input of an attention mechanism to obtain the output power load prediction result;
wherein the hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through a tree-structured, nonstandard Bayesian optimization algorithm.
Optionally, the multivariate feature set comprises: at least one of historical load data, rainfall, wind speed, maximum temperature, minimum temperature, and humidity.
Optionally, preprocessing the multivariate feature set comprises:
performing normalization and missing-value handling on the multivariate feature set to obtain a preprocessed feature set;
and, on the basis of the primary screening by Pearson correlation coefficient, performing backward selection on the preprocessed feature set using recursive feature elimination, discarding the least important feature in each iteration until all features have been traversed, completing the feature screening.
Optionally, the temporal convolutional network comprises a set of residual units; each residual unit comprises two convolution units and a nonlinear mapping, the nonlinear mapping being used to reduce the dimensionality of high-dimensional input data.
Optionally, the temporal convolutional network further comprises:
adjusting the sampling interval by the dilation coefficient;
convolving only the inputs before time t to obtain the output at time t;
normalizing the weights by constructing a new activation function;
and smoothing the activation function using the MaxDropout algorithm.
Optionally, the tree-structured, nonstandard Bayesian optimization algorithm comprises:
selecting an importance-sampling algorithm as the sampling function;
constructing the importance-weight function by setting an importance weight for a given observation;
drawing sample points and estimating by the law of large numbers;
and obtaining the expected value under the importance weights.
Optionally, the method further comprises:
in the current observations, the hyperparameter corresponding to the observation at which the ratio between the two outputs of the sampling function is maximized is used as the next set of search values.
Optionally, the flow of the tree-structured, nonstandard Bayesian optimization algorithm comprises the following steps:
defining the hyperparameter search space;
selecting hyperparameter inputs from the hyperparameter search space;
setting the prediction error as the objective function;
randomly selecting a group of initialization hyperparameters and obtaining the corresponding observations;
performing the probability-density estimation of the nonstandard Bayesian optimization algorithm;
drawing sample hyperparameters from the sampling-function output and evaluating them to obtain the set of minima corresponding to the maximum expected improvement;
determining the hyperparameter combination that maximizes the expected improvement (EI) as the optimal hyperparameter combination;
feeding the optimal hyperparameter combination into the load prediction model for training;
evaluating the prediction result of the load prediction model;
terminating training if the prediction error meets the set prediction-accuracy condition;
otherwise, correcting the sampling function and searching for a new hyperparameter combination until the accuracy condition is met.
The embodiments of the invention have the following beneficial effects:
1. The embodiment of the invention provides a power load prediction model based on TPE (Tree-structured Parzen Estimator) optimization of TCN-GRU-Attention, which not only overcomes the shortcomings of a single model but also optimizes the hyperparameters of the model with the TPE algorithm, further improving the effectiveness and adaptability of the model.
2. The proposed prediction model fully combines the TCN's ability to extract time-series features, the GRU's nonlinear fitting ability and the Attention mechanism's feature-weighting ability, achieving an optimal combination that can effectively improve the accuracy of short-term power load prediction. A new, smoother activation function is constructed, and its smoothness plays an important role in optimization and generalization. On this basis, the TPE is used to optimize the hyperparameters of the model, giving higher prediction accuracy and better fitness.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 illustrates a flow chart of a short-term power load prediction method based on nonstandard Bayesian algorithm optimization in an embodiment of the present invention;
FIG. 2 shows a flow chart of a super-parameter optimization algorithm in an embodiment of the invention;
FIG. 3 illustrates a short-term power load prediction model structure based on nonstandard Bayesian algorithm optimization in accordance with an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, which is shown in fig. 1 and comprises the following steps:
and step S10, constructing a multi-variable feature set.
In this embodiment, since the power load is affected by various factors, historical load data of a region required for short-term power load prediction and meteorological factors such as rainfall, wind speed, maximum temperature, minimum temperature and humidity are collected, and a multivariate feature set is constructed.
Step S20: preprocess the multivariate feature set.
In this embodiment, the magnitudes of the historical power load data and the feature subsets differ greatly, and missing values are present, so the model cannot effectively identify and extract information from the raw data, degrading model quality; the data are therefore preprocessed by normalization, missing-value removal and similar operations.
And by combining with feature engineering, a feature set with high relevance to load data is screened and constructed, so that the difficulty of learning tasks can be reduced, and the generalization capability of the model can be enhanced.
In a specific embodiment, the data are preprocessed by min-max normalization as in formula (1): x* = (x - x_min) / (x_max - x_min), where x denotes a variable of the original data and x* the new normalized value.
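As a minimal sketch (the patent provides no code; the function name is ours), formula (1) can be applied to each variable column in turn:

```python
def min_max_normalize(xs):
    """Min-max normalization per formula (1): x* = (x - min) / (max - min),
    mapping each value of a variable into [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    if span == 0:                       # constant column: map to zeros
        return [0.0 for _ in xs]
    return [(x - lo) / span for x in xs]
```

Applied to a load column such as `[10, 20, 30]`, this yields `[0.0, 0.5, 1.0]`.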
The effectiveness and generalization ability of the prediction model fundamentally depend on the quality of the load data and the related features. Selecting features highly correlated with the load data therefore enhances the model's generalization ability and efficiency, while screening out weakly correlated features reduces the feature count and with it the difficulty of the learning task. The Pearson correlation coefficient is widely used to measure the correlation between two variables; assuming another variable y, normalized analogously by formula (1), the Pearson correlation coefficient is computed as in formula (2): r_xy = cov(x, y) / (σ_x σ_y).
A Pearson correlation coefficient between 0.0 and 0.2 indicates extremely weak or no correlation. First, the Pearson coefficient is used to eliminate the weakly correlated or irrelevant factors from the candidate features, completing the primary screening. On the primary screening result, features are then selected backward using recursive feature elimination (RFE). RFE removes redundancy among features, reduces the feature dimension and selects the optimal feature combination: combined with a regression model, it searches feature subsets over all features of the training set, iteratively discarding the least important feature and continuing on the remaining features until all features have been traversed, completing the feature screening.
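The primary screening step can be sketched as follows (a hedged illustration with plain-Python series; `primary_screen` and its 0.2 default follow the threshold band described above):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (formula (2)): covariance of the two
    series divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def primary_screen(features, load, threshold=0.2):
    """Drop candidate features whose |r| against the load series falls in
    the 'extremely weak or none' band below the threshold."""
    return {name: xs for name, xs in features.items()
            if abs(pearson_r(xs, load)) >= threshold}
```

The surviving features would then go to the backward-selection stage, e.g. `sklearn.feature_selection.RFE` with a regression estimator.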
TABLE 1 original feature set
In a specific embodiment, the initial multivariate feature set is shown in table 1.
Step S30: take the multivariate feature set as the input of the temporal convolutional network.
A temporal convolutional network (Temporal Convolutional Network, TCN) is an improvement on the CNN: it handles sequence-modeling tasks effectively, retains a longer effective memory, and its overall performance exceeds that of networks such as the RNN and LSTM. Compared with plain one-dimensional convolution, the TCN extracts sequence features by adding causal convolution and dilated convolution, while residual connections between the network layers avoid gradient problems. The TCN is a simple, general convolutional neural-network architecture for time-series problems: it consists of a group of residual units, each a small neural network with a residual connection; the residual connections speed up feedback and convergence in deep networks, countering the degradation caused by adding layers.
In this embodiment, for a stacked-layer structure (several stacked layers), the feature learned when the input is x is denoted h(x); the structure can then learn the residual unit shown in formula (3):

f_TCN(x) = h(x) - x    (3)

where h(x) is the originally learned feature. Learning the residual is easier than learning the original feature directly.
The residual unit contains two convolution units and a nonlinear mapping; the nonlinear mapping reduces the dimensionality when the input and output of the residual unit have different dimensions. The convolution unit first applies a one-dimensional dilated causal convolution, whose operation is given by formula (4):

F(s) = Σ_{i=0}^{k-1} f(i) · x_{s-d·i}    (4)

where f is the filter; d is the dilation coefficient; k is the convolution-kernel size; s indexes the input time-series information; and the index s - d·i points only to the past, ensuring that only past inputs are convolved.
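A minimal sketch of the dilated causal convolution of formula (4), under the assumption of zero padding before the sequence start (the function name is ours):

```python
def dilated_causal_conv(x, f, d):
    """1-D dilated causal convolution: each output y[t] only sees inputs
    at or before t, sampled at stride d, as in formula (4)."""
    k = len(f)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = t - d * i           # step d positions into the past
            if j >= 0:              # zero-pad before the sequence start
                acc += f[i] * x[j]
        y.append(acc)
    return y
```

With kernel `[1, 1]` and dilation 2, each output is the current input plus the input two steps earlier, so no future value leaks into any output.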
In a specific embodiment, the sampling interval is adjusted through the dilation coefficient, realizing a larger receptive field (RF), i.e. the region of the input visible to a feature on a convolution layer, so that the network can memorize a sufficiently long history; only the inputs before time t are convolved to obtain the output at time t, so no future information is leaked. The weights are then normalized to construct a new activation function, computed as in formula (5).
The constructed activation function is smoother, and smoothness plays an important role in optimization and generalization. Finally, a MaxDropout operation is applied; it is, to some extent, a mixture of standard dropout and Gaussian dropout executed on the max-pooling layer, using a Bayesian method to prevent overfitting and accelerate model training. The computation is given in formula (6).
Step S40: take the output of the temporal convolutional network as the input of the gated recurrent unit.
In this embodiment, the gated recurrent unit (Gate Recurrent Unit, GRU), as a variant of the LSTM, largely resolves the LSTM's long training time and low training efficiency while retaining high prediction accuracy. The update gate z_t determines how much of the previous time step's state is kept in the current cell, and the reset gate r_t determines how much past information is forgotten. The output of the TCN is fed into the GRU network to obtain the memory output h_t, computed as in formulas (7)-(10):

z_t = σ(ω_{x,z} x_t + ω_{h,z} h_{t-1})    (7)
r_t = σ(ω_{x,r} x_t + ω_{h,r} h_{t-1})    (8)
g_t = tanh(ω_{x,g} x_t + ω_{h,g} (r_t ⊙ h_{t-1}))    (9)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ g_t    (10)

In formulas (7)-(10), ω_{x,z}, ω_{h,z}, ω_{x,r}, ω_{h,r}, ω_{x,g} and ω_{h,g} are the parameter training matrices; h_t and h_{t-1} are the GRU outputs at the current and previous time steps; σ and tanh denote the Sigmoid and tanh activation functions; g_t is the candidate state, from which formula (10) computes the output h_t at the current time.
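A scalar sketch of one GRU step following formulas (7)-(10); scalar weights stand in for the ω matrices, and the names are ours:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, w):
    """One scalar GRU step; w maps weight names to scalar omegas."""
    z = sigmoid(w["xz"] * x_t + w["hz"] * h_prev)        # update gate, (7)
    r = sigmoid(w["xr"] * x_t + w["hr"] * h_prev)        # reset gate, (8)
    g = math.tanh(w["xg"] * x_t + w["hg"] * r * h_prev)  # candidate, (9)
    return (1 - z) * h_prev + z * g                      # output, (10)
```

With all weights zero, z = 0.5 and g = 0, so the output is simply half the previous state, which shows how the update gate blends old state with the candidate.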
Step S50: take the output of the gated recurrent unit as the input of the attention mechanism to obtain the output power load prediction result.
In deep learning, a model generally receives and processes large amounts of data, yet at any given moment only a small part is important. The attention mechanism (Attention) derives a set of weight coefficients through the network's own learning, concentrating attention on important information while ignoring or suppressing the unimportant, thereby strengthening the features most useful to the prediction model. Introducing attention lets the model focus on the information that most influences it, further improving prediction accuracy.
In this embodiment, the computation that feeds the GRU network output into the Attention mechanism is given in formulas (11)-(14).
In formulas (11)-(14), S_ti are the attention scores computed by the additive model, and V, W and U are parameters the additive model learns during training; exp is the exponential function. The scores are normalized into weight coefficients, and the GRU outputs are weighted and summed by these coefficients to obtain F. In this embodiment, the TCN-GRU-Attention network model overcomes the shortcomings of a single network, reduces the prediction error, and produces better predictions than any single model alone.
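Assuming the standard additive (Bahdanau-style) form of the scores, the weighting described for formulas (11)-(14) can be sketched as follows; scalars stand in for the learnable V, W, U, and the exact formulas of the patent are not reproduced here:

```python
import math

def additive_attention(hs, v, w, u, h_t):
    """Score each GRU output h_i by s_i = v * tanh(w*h_t + u*h_i),
    softmax the scores into weight coefficients, then return the
    weighted sum F of the outputs."""
    scores = [v * math.tanh(w * h_t + u * h_i) for h_i in hs]
    m = max(scores)                                    # numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]                 # weight coefficients
    return sum(a * h_i for a, h_i in zip(alphas, hs))  # context F
```

When all scores are equal the weights are uniform and F reduces to the mean of the GRU outputs; training shifts the weights toward the informative time steps.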
The hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through optimization with the tree-structured, nonstandard Bayesian optimization algorithm.
The choice of hyperparameters is crucial to the prediction performance and generalization ability of the model. Neural-network training is affected by the model structure, the data representation, the optimization method and so on; each link involves many parameters and hyperparameters, and tuning them determines the model's final performance. The common approach is manual tuning, but it is time- and labor-consuming and rarely finds the optimum, reducing the model's prediction accuracy. Bayesian optimization (BO) is one of the most commonly used algorithms for hyperparameter optimization; the traditional BO algorithm is based on a Gaussian process and is widely applied, performing excellently in low-dimensional spaces but weakly in high-dimensional ones.
In this embodiment, the Tree-structured Parzen Estimator (TPE) hyperparameter-optimization algorithm is used to optimize the hyperparameters of the model, enhancing its generalization ability and improving its prediction accuracy.
An importance-sampling algorithm is selected as the sampling function; its computation is given in formula (15).
Given an observation x, let p(x) be its probability density function and q(x) a probability density function with the same domain as p(x); define the importance weight w(x) = p(x)/q(x), which yields formula (16):

E_{p(x)}[f(x)] = E_{q(x)}[w(x) f(x)]    (16)

Sample points are drawn from q(x) and the expectation is estimated by the law of large numbers; substituting formula (16) into the definition of the expectation gives formula (17). Samples for both the numerator and the denominator can thus be drawn from the proposal distribution q(x), with the expectations taken under the importance weights. As in rejection sampling, q(x) should be close to p(x); under the new proposal distribution the drawn samples are simply scaled versions of those from the original one, i.e. each sample value of the normalized target distribution is multiplied by w(x).
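A hedged sketch of the importance-sampling estimate behind formula (16): draw from the proposal q and weight each sample by w(x) = p(x)/q(x) (function names and the uniform example are ours):

```python
import random

def importance_estimate(f, p_pdf, q_pdf, q_sample, n=100_000, seed=0):
    """Estimate E_p[f(x)] by drawing from the proposal q and weighting
    each sample by the importance weight w(x) = p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        total += (p_pdf(x) / q_pdf(x)) * f(x)
    return total / n
```

For example, taking p uniform on [0, 1] and q uniform on [0, 2], the estimate of E_p[x] converges toward 0.5 even though no sample was drawn from p itself.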
Among the current observations, the x that maximizes the ratio l(x)/g(x) is the required observation. In each iteration, the hyperparameter that maximizes l(x)/g(x) is selected as the next set of search values, and the algorithm returns the point with the largest EI (expected improvement) value. Compared with other Bayesian optimization algorithms, the TPE algorithm, based on a Gaussian mixture model, performs better in high-dimensional spaces, markedly improves the optimization speed, and obtains better results with higher efficiency. As shown in fig. 2, the TPE optimization flow is as follows:
step 1: a domain of a hyper-parametric search space is defined from which hyper-parameters of an algorithm are selected.
Step 2: the prediction error is set as an objective function that receives the super-parameters and outputs the minimum prediction error required.
Step 3: a set of initialization hyper-parameters is randomly selected and several observations are obtained.
Step 4: TPE probability density estimation is performed, sample superparameters are extracted from l (x), an evaluation is made based on l (x)/g (x), and a set of minima is returned that yield at l (x)/g (x) that correspond to the maximum expected improvement.
Step 5: and determining the super-parameter combination with the EI maximum value as an optimal super-parameter combination, inputting the optimal super-parameter combination into a load prediction model for training, outputting a prediction result of the current model, and evaluating.
Step 6: and evaluating the prediction error, if the prediction accuracy is met, terminating the algorithm, otherwise, correcting the sampling function, and searching for the super-parameter combination again until the accuracy requirement is met.
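The split-and-ratio logic of Steps 3-6 can be sketched in miniature (hedged: a toy one-dimensional version; a production TPE such as hyperopt's is considerably more elaborate):

```python
import math

def tpe_suggest(trials, candidates, gamma=0.25, bandwidth=0.5):
    """Minimal TPE-style suggestion: split past (x, loss) trials into a
    good set l and a bad set g at the gamma-quantile of the losses, model
    each with a Parzen (Gaussian-kernel) density, and return the candidate
    maximizing l(x)/g(x), a proxy for maximum expected improvement."""
    trials = sorted(trials, key=lambda t: t[1])       # best loss first
    n_good = max(1, int(gamma * len(trials)))
    good = [x for x, _ in trials[:n_good]]
    bad = [x for x, _ in trials[n_good:]] or good

    def parzen(x, centers):
        return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2)
                   for c in centers) / len(centers)

    return max(candidates,
               key=lambda x: parzen(x, good) / (parzen(x, bad) + 1e-12))
```

Given trials whose loss is smallest near x = 2, the suggested next search value concentrates near 2, mirroring Step 4's l(x)/g(x) criterion.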
In this embodiment, the TPE algorithm is used to optimize the hyperparameters of the model; the search ranges are shown in table 2, where Filters, Kernel_size and Dropout_rate denote the number of filters, the convolution-kernel size and the dropout rate, and Units, Learning_rate and Batch_size denote the number of hidden neurons, the learning rate and the batch size, respectively. Hyperparameter choices are often made from personal experience or repeated manual tuning, so the chosen values may not be optimal; this embodiment uses the TPE optimization algorithm to tune them, further improving prediction accuracy.
Table 2 super parameter optimizing range
As shown in fig. 3, this embodiment proposes a power load prediction model based on TPE-optimized TCN-GRU-Attention. TCN sequence modeling has several advantages:
(1) Adaptive sequence modeling (causal convolution): the causal convolution structure is unidirectional and never uses future information; the further back the historical information reaches, the more hidden layers are required. Plain causal convolution still suffers from the limitation of a conventional CNN: the temporal modeling length is constrained by the convolution kernel size;
(2) Parallelism: a traditional RNN can predict a time step only after all of its preceding time steps have been processed, i.e., processing must be sequential, whereas a TCN can process a long input sequence in parallel as a whole;
(3) Stable gradients: the backpropagation path of the TCN differs from the temporal direction of the sequence, so the TCN avoids the gradient explosion and vanishing problems of RNNs.
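A minimal NumPy sketch of the causal dilated convolution described above; the kernel weights and input sequence are illustrative, and a real TCN would stack several such layers with growing dilation so the receptive field expands exponentially:

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1-D convolution: the output at time t depends only on
    x[t], x[t-d], x[t-2d], ... -- never on future samples."""
    k = len(kernel)
    pad = dilation * (k - 1)            # left-pad to keep length and causality
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        # current sample plus past samples spaced `dilation` steps apart
        taps = xp[t + pad - dilation * np.arange(k)]
        y[t] = np.dot(kernel, taps)
    return y

x = np.arange(8, dtype=float)
y = causal_dilated_conv1d(x, kernel=np.array([1.0, 1.0]), dilation=2)
print(y)  # y[t] = x[t] + x[t-2], with zeros before the sequence start
```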
Accurate power load prediction is a necessary prerequisite for building a new, intelligent power system. To fully mine the temporal features in power load data and further improve prediction accuracy, this embodiment proposes a power load prediction model based on a TPE (Tree-structured Parzen Estimator)-optimized TCN-GRU-Attention architecture, combining the widely used TCN, GRU and Attention mechanisms so that the shortcomings of any single model are overcome and their advantages complement one another. In addition, optimizing the model hyperparameters with the TPE algorithm can further improve the fit and effectiveness of the model, which has practical significance.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.
Claims (8)
1. A short-term power load prediction method based on non-standard Bayesian algorithm optimization, comprising:
constructing a multivariate feature set;
preprocessing the multivariate feature set;
taking the multivariate feature set as the input of a temporal convolutional network;
taking the output of the temporal convolutional network as the input of a gated recurrent unit;
taking the output of the gated recurrent unit as the input of an attention mechanism to obtain an output power load prediction result;
wherein the hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through a tree-structure-based non-standard Bayesian optimization algorithm.
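The staged pipeline of claim 1 (temporal-convolution features, recurrent encoding, then attention-weighted pooling) can be sketched with plain NumPy. The layer stand-ins below are untrained random projections that only illustrate the tensor shapes flowing between stages, not the patented model itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: batch of 4 sequences, 24 time steps, 6 input features
x = rng.normal(size=(4, 24, 6))

# Stand-ins for the TCN and GRU stages: random projections that preserve
# the (batch, time, hidden) layout the attention stage expects
W_tcn = rng.normal(size=(6, 16))
h_tcn = np.maximum(x @ W_tcn, 0.0)      # "TCN" output, shape (4, 24, 16)
W_gru = rng.normal(size=(16, 16))
h_gru = np.tanh(h_tcn @ W_gru)          # "GRU" output, shape (4, 24, 16)

# Attention pooling over the time axis: score each step, softmax, weighted sum
v = rng.normal(size=(16,))
scores = h_gru @ v                       # (4, 24) unnormalised scores
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
context = np.einsum('bt,bth->bh', weights, h_gru)   # (4, 16) context vectors

W_out = rng.normal(size=(16, 1))
y_hat = context @ W_out                  # (4, 1): one load prediction per sequence
print(y_hat.shape)
```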
2. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein the multivariate feature set comprises at least one of: historical load data, rainfall, wind speed, maximum temperature, minimum temperature, and humidity.
3. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein preprocessing the multivariate feature set comprises:
performing normalization and missing-value handling on the multivariate feature set to obtain a preprocessed feature set;
and, on the basis of a Pearson-correlation-coefficient pre-screening result, performing backward selection on the preprocessed feature set using recursive feature elimination, discarding the least important feature in each iteration until all features have been traversed, thereby completing feature screening.
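The two-stage screening of claim 3 can be sketched as follows. The feature names, the synthetic data, the correlation threshold, and the least-squares fit standing in for the recursive-feature-elimination estimator are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 5 candidate features, target = load
n = 200
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 0.1, size=n)
names = ["hist_load", "rainfall", "max_temp", "wind_speed", "humidity"]

# Min-max normalisation
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Stage 1: Pearson pre-screening -- drop features with |r| below a threshold
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.where(np.abs(r) > 0.1)[0]

# Stage 2: backward elimination -- repeatedly drop the feature whose removal
# hurts a least-squares fit the least (a stand-in for RFE with a real model)
def fit_error(cols):
    A = np.column_stack([X[:, cols], np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((A @ coef - y) ** 2)

cols = list(keep)
while len(cols) > 2:
    drop_errs = [fit_error([c for c in cols if c != j]) for j in cols]
    cols.pop(int(np.argmin(drop_errs)))  # least important feature goes first
print([names[int(c)] for c in cols])
```

On this synthetic data the two informative features (the first and third columns) survive both stages.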
4. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein the temporal convolutional network comprises a set of residual units; each residual unit comprises 2 convolution units and a nonlinear mapping; and the nonlinear mapping reduces the dimensionality of high-dimensional input data before output.
5. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 4, wherein the temporal convolutional network further:
adjusts the sampling interval through a dilation coefficient;
convolves only the inputs before time t to obtain the output at time t;
normalizes the weights by constructing a new activation function;
and smooths the activation function using a Max Dropout algorithm.
6. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein the tree-structure-based non-standard Bayesian optimization algorithm comprises:
selecting an importance sampling algorithm as the sampling function;
constructing an importance weight function by giving observed values and setting importance weights;
extracting sample points based on the law of large numbers;
and obtaining the expected value of the importance weights.
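The importance-sampling construction of claim 6 can be illustrated with a small numerical sketch; the target density p, the proposal q, and the integrand f below are hypothetical stand-ins chosen so the true expectation is known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E_p[f(x)] using samples drawn from a proposal q, weighting each
# sample by the importance weight w(x) = p(x) / q(x)
def p_pdf(x):   # target density: standard normal
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q_pdf(x):   # proposal density: wider normal, easy to sample from
    return np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))

f = lambda x: x**2            # E_p[x^2] = 1 for a standard normal

xs = rng.normal(0, 2, size=200_000)   # draw sample points from q
w = p_pdf(xs) / q_pdf(xs)             # importance weights
estimate = np.mean(w * f(xs))         # law of large numbers -> E_p[f]
print(estimate)  # close to 1.0
```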
7. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 6, further comprising:
among the current observations, taking the hyperparameters corresponding to the observed value that maximizes the ratio between the two outputs of the sampling function as the next group of search values.
8. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 7, wherein the flow of the tree-structure-based non-standard Bayesian optimization algorithm comprises:
defining a hyperparameter search space;
selecting hyperparameter inputs from the hyperparameter search space;
setting the prediction error as the objective function;
randomly selecting a group of initial hyperparameters and obtaining the corresponding observed values;
performing probability density estimation of the non-standard Bayesian optimization algorithm;
extracting sample hyperparameters from the sampling-function output and evaluating them to obtain the set of minima corresponding to the maximum expected improvement;
determining the hyperparameter combination that maximizes the expected improvement (EI) as the optimal hyperparameter combination;
inputting the optimal hyperparameter combination into the load prediction model for training;
evaluating the prediction result of the load prediction model;
terminating training if the prediction error of the result meets the set prediction accuracy condition;
otherwise, correcting the sampling function and searching for a new hyperparameter combination until the prediction accuracy condition is met.
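The claimed flow can be sketched end-to-end on a toy one-dimensional search space. The objective function, the log-scaled domain, and the fixed-bandwidth densities below are illustrative stand-ins for the prediction-error objective and the real hyperparameter ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "prediction error as objective": minimised at lr = 0.01
def objective(lr):
    return (np.log10(lr) + 2.0) ** 2

def kde(x, samples, bw=0.3):
    # fixed-bandwidth Gaussian density estimate in log10 space
    d = (np.asarray(x)[:, None] - samples[None, :]) / bw
    return np.mean(np.exp(-0.5 * d**2), axis=1) / (bw * np.sqrt(2 * np.pi))

lo, hi = -4.0, -1.0                       # search space: log10(lr) in [-4, -1]
obs = list(rng.uniform(lo, hi, 5))        # random initialisation
errs = [objective(10**z) for z in obs]

for _ in range(30):
    z, e = np.array(obs), np.array(errs)
    thr = np.quantile(e, 0.25)            # split observations: good vs. bad
    good, bad = z[e <= thr], z[e > thr]
    cand = rng.uniform(lo, hi, 200)
    ratio = kde(cand, good) / (kde(cand, bad) + 1e-12)
    nxt = cand[np.argmax(ratio)]          # candidate maximising l(x)/g(x)
    obs.append(nxt)
    errs.append(objective(10**nxt))
    if errs[-1] < 1e-3:                   # accuracy requirement met -> stop
        break

best = 10 ** obs[int(np.argmin(errs))]
print(f"best learning rate found: {best:.4f}")
```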
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310184153.5A CN116316573A (en) | 2023-03-01 | 2023-03-01 | Short-term power load prediction method based on nonstandard Bayesian algorithm optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116316573A true CN116316573A (en) | 2023-06-23 |
Family
ID=86831802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310184153.5A Pending CN116316573A (en) | 2023-03-01 | 2023-03-01 | Short-term power load prediction method based on nonstandard Bayesian algorithm optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116316573A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116882589A (en) * | 2023-09-04 | 2023-10-13 | 国网天津市电力公司营销服务中心 | Online line loss rate prediction method based on Bayesian optimization deep neural network |
CN117526316A (en) * | 2024-01-04 | 2024-02-06 | 国网湖北省电力有限公司 | Load prediction method based on GCN-CBAM-BiGRU combined model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||