CN116316573A - Short-term power load prediction method based on nonstandard Bayesian algorithm optimization


Info

Publication number
CN116316573A
Authority
CN
China
Prior art keywords
super
power load
load prediction
algorithm
optimization
Prior art date
Legal status
Pending
Application number
CN202310184153.5A
Other languages
Chinese (zh)
Inventor
王文慧
李垣江
Current Assignee
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202310184153.5A priority Critical patent/CN116316573A/en
Publication of CN116316573A publication Critical patent/CN116316573A/en

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, comprising the following steps: constructing a multivariate feature set; preprocessing the multivariate feature set; feeding the multivariate feature set into a temporal convolutional network (TCN); feeding the output of the TCN into a gated recurrent unit (GRU); and feeding the output of the GRU into an attention mechanism to obtain the output power load prediction result. The hyperparameters of the TCN, the GRU, and the attention mechanism are obtained through a non-standard, tree-structured Bayesian optimization algorithm (Tree-structured Parzen Estimator, TPE). The embodiment of the invention thus provides a TPE-optimized TCN-GRU-Attention power load prediction model, which not only overcomes the shortcomings of any single model but also tunes the model's hyperparameters with the TPE optimization algorithm, further improving the model's effectiveness and adaptability.

Description

Short-term power load prediction method based on nonstandard Bayesian algorithm optimization
Technical Field
The invention relates to the technical field of power systems, in particular to a short-term power load prediction method based on nonstandard Bayesian algorithm optimization.
Background
At present, China's power industry is developing rapidly, grid supply keeps expanding, and power system structures and operation modes are increasingly diversified; against this new background, the safety, reliability, and stable operation of the grid face greater challenges. Short-term power load prediction is an important task for power management departments: accurate load prediction allows production plans to be arranged reasonably and guarantees the timeliness and economy of power dispatching, thereby reducing generation costs.
Improving the accuracy of short-term power load prediction not only raises the overall economic benefit of the power system but also enhances the stability and safety of its operation. As power systems continue to develop, short-term load prediction becomes substantially harder, and devising a reasonable and effective prediction model has become a focus of current research. Although convolutional neural networks (Convolutional Neural Network, CNN) can be applied to load prediction with multidimensional variables, they are not well suited to learning time series and require various auxiliary processing steps. Recurrent neural networks (Recurrent Neural Network, RNN) handle time-series data well but suffer from severe vanishing or exploding gradients on long input sequences. Compared with the RNN, the Long Short-Term Memory (LSTM) network handles long-horizon prediction more effectively, but its three gating units introduce many parameters, so the network suffers from long training times and low training efficiency.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, aiming to solve the problems of long training time and low training efficiency of the prediction models adopted for short-term power load prediction in the prior art.
The embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, which comprises the following steps:
constructing a multivariate feature set;
preprocessing the multivariate feature set;
taking the multivariate feature set as the input of a temporal convolutional network;
taking the output of the temporal convolutional network as the input of a gated recurrent unit;
taking the output of the gated recurrent unit as the input of an attention mechanism to obtain an output power load prediction result;
wherein the hyperparameters of the temporal convolutional network, the gated recurrent unit, and the attention mechanism are obtained through a non-standard, tree-structured Bayesian optimization algorithm.
Optionally, the multivariate feature set comprises: at least one of historical load data, rainfall, wind speed, maximum temperature, minimum temperature, and humidity.
Optionally, preprocessing the multivariate feature set comprises:
performing normalization and missing-value handling on the multivariate feature set to obtain a preprocessed feature set;
and, on the basis of the primary screening by Pearson correlation coefficient, backward-selecting the preprocessed feature set using recursive feature elimination, discarding the least important feature in each iteration until all features have been traversed, thereby completing feature screening.
Optionally, the temporal convolutional network comprises a set of residual units; each residual unit comprises two convolution units and a nonlinear mapping, the nonlinear mapping serving to reduce the dimension of high-dimensional input data.
Optionally, the temporal convolutional network further comprises:
adjusting the sampling interval through the dilation coefficient;
convolving only the inputs before time t to obtain the output at time t;
normalizing the weights by constructing a new activation function;
and smoothing the activation function by adopting a MaxDropout operation.
Optionally, the tree-structured non-standard Bayesian optimization algorithm comprises:
selecting an importance sampling algorithm as the sampling function;
constructing an importance weight function given the observations and the importance weights;
drawing sample points based on the law of large numbers;
and obtaining the expected value under the importance weights.
Optionally, the method further comprises:
among the current observations, taking the hyperparameters whose observed values maximize the ratio between the two outputs of the sampling function as the next set of search values.
Optionally, the flow of the tree-structured non-standard Bayesian optimization algorithm comprises the following steps:
defining a hyperparameter search space;
selecting hyperparameter inputs from the hyperparameter search space;
setting the prediction error as the objective function;
randomly selecting a group of initial hyperparameters and obtaining the corresponding observations;
carrying out the probability density estimation of the non-standard Bayesian optimization algorithm;
drawing sample hyperparameters from the sampling-function output and evaluating them to obtain the set of minima corresponding to the maximum expected improvement;
determining the hyperparameter combination that maximizes the expected improvement (EI) as the optimal hyperparameter combination;
inputting the optimal hyperparameter combination into the load prediction model for training;
evaluating the prediction result of the load prediction model;
terminating training if the prediction error of the result meets the preset prediction accuracy condition;
otherwise, correcting the sampling function and searching for hyperparameter combinations again until the prediction accuracy condition is met.
The embodiment of the invention has the beneficial effects that:
1. The embodiment of the invention provides a power load prediction model based on a TPE-optimized (Tree-structured Parzen Estimator) TCN-GRU-Attention network, which not only overcomes the shortcomings of a single model but also optimizes the model's hyperparameters with the TPE optimization algorithm, further improving the model's effectiveness and adaptability.
2. The prediction model provided by the embodiment fully combines the TCN's ability to extract time-series features, the GRU's nonlinear fitting ability, and the Attention mechanism's feature-weighting ability, achieving an optimal combination that can effectively improve the accuracy of short-term power load prediction; the newly constructed activation function is smoother, and smoothness plays an important role in optimization and generalization. On this basis, TPE is used to optimize the model's hyperparameters, giving the model higher prediction accuracy and better fitness.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 illustrates a flow chart of a short-term power load prediction method based on nonstandard Bayesian algorithm optimization in an embodiment of the present invention;
FIG. 2 shows a flow chart of a super-parameter optimization algorithm in an embodiment of the invention;
FIG. 3 illustrates a short-term power load prediction model structure based on nonstandard Bayesian algorithm optimization in accordance with an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, which is shown in fig. 1 and comprises the following steps:
and step S10, constructing a multi-variable feature set.
In this embodiment, since the power load is affected by many factors, the historical load data of the target region and meteorological factors such as rainfall, wind speed, maximum temperature, minimum temperature, and humidity are collected to construct the multivariate feature set.
And step S20, preprocessing the multivariable feature set.
In this embodiment, because the historical power load data and the feature subsets differ greatly in magnitude and contain missing values, the model cannot effectively identify and extract information from the raw data, which degrades model quality; the data are therefore preprocessed by normalization and missing-value removal.
Combined with feature engineering, screening and constructing a feature set highly correlated with the load data reduces the difficulty of the learning task and enhances the model's generalization ability.
In a specific embodiment, the data are preprocessed by max-min normalization. The calculation is given by Equation (1), where x denotes a variable of the original data and x* denotes the new normalized value:

x* = (x − x_min) / (x_max − x_min)    (1)
The effectiveness and generalization ability of the prediction model fundamentally depend on the quality of the load data and related features: selecting features highly correlated with the load data enhances generalization and improves model efficiency, while screening out weakly correlated features reduces the feature count and the difficulty of the learning task. The Pearson correlation coefficient is widely used to measure the correlation between two variables. Assuming another variable Y, normalized to y by Equation (1), the Pearson correlation coefficient between x and y is computed as in Equation (2):

r = Σ_i (x_i − x̄)(y_i − ȳ) / sqrt( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² )    (2)
A Pearson correlation coefficient between 0.0 and 0.2 indicates extremely weak or no correlation, so the coefficient is first used to eliminate such weakly correlated or irrelevant factors from the candidate features, completing the primary screening. On the primary screening result, backward feature selection is performed with Recursive Feature Elimination (RFE). RFE eliminates redundancy among features, reduces feature dimension, and selects the optimal feature combination: combined with a regression model, it searches feature subsets over the training dataset iteratively, discarding the least important feature in each iteration and continuing on the remaining features until all features have been traversed, completing the feature screening.
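The normalization and primary-screening formulas above can be sketched in plain Python. The 0.2 threshold follows the text; the helper names and the tiny example series are our illustrative choices, not the patent's data:

```python
import math

def min_max_normalize(values):
    """Equation (1): x* = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Equation (2): sample Pearson correlation coefficient of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def primary_screen(load, features, threshold=0.2):
    """Primary screening: keep only features whose |r| with the load
    series reaches the 'extremely weak correlation' cutoff of 0.2."""
    return {name: col for name, col in features.items()
            if abs(pearson(load, col)) >= threshold}

load = min_max_normalize([310.0, 320.0, 330.0, 340.0, 350.0])
features = {
    "max_temp": [2.0, 4.0, 6.0, 8.0, 10.0],  # strongly correlated with load
    "noise":    [3.0, 4.0, 1.0, 5.0, 2.0],   # |r| = 0.1, screened out
}
kept = primary_screen(load, features)
```

The RFE step would then iterate over `kept`, refitting a regression model and dropping the least important remaining feature each round.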
TABLE 1 original feature set
(Table 1 appears as an image in the original publication and is not reproduced here.)
In a specific embodiment, the initial multivariate feature set is shown in Table 1.
Step S30, taking the multivariable feature set as an input of the time convolution network.
The temporal convolutional network (Temporal Convolutional Network, TCN) is an improvement on the CNN: it handles sequence-modeling tasks effectively, retains a longer memory through dilation, and its overall performance is superior to networks such as the RNN and LSTM. Compared with plain one-dimensional convolution, the TCN extracts sequence features by adding causal convolution and dilated convolution, while residual connections between network layers avoid gradient problems. The TCN is a simple and general convolutional neural network architecture for time-series problems; it consists of a group of residual units, each a small neural network with a residual connection, which accelerates feedback and convergence in deep networks and mitigates the degradation caused by increasing network depth.
In this embodiment, for a stacked layer structure (several layers stacked), the feature learned when the input is x is denoted h(x), so the residual unit learns, as in Equation (3):

f_TCN(x) = h(x) − x    (3)

where the original feature to be learned is h(x); learning the residual is easier than learning the original feature directly.
Each residual unit contains two convolution units and a nonlinear mapping; the nonlinear mapping reduces the dimensionality of the data when the input and output of the residual unit have different dimensions. First, a one-dimensional dilated causal convolution is performed in each convolution unit, with the operation given by Equation (4):

F(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i}    (4)

where f is the filter, d is the dilation coefficient, k is the convolution kernel size, s is the input time step, and the index s − d·i points in the past direction, ensuring that only past inputs are convolved.
In a specific embodiment, the sampling interval is adjusted through the dilation coefficient, yielding a larger receptive field (Receptive Field, RF), i.e. the region visible to a feature on a convolution layer, so that the network can remember sufficiently long history; only the inputs before time t are convolved to obtain the output at time t, so no future information is revealed. The weights are then normalized to construct a new activation function, whose calculation is given by Equation (5) (reproduced only as an image in the original publication).
The constructed activation function is smoother, and smoothness plays an important role in optimization and generalization. Finally, a MaxDropout operation is adopted; to some extent it performs a mixture of dropout and Gaussian dropout on the max-pooling layer, using a Bayesian method to prevent overfitting and accelerate model training. Its calculation is given by Equation (6) (likewise reproduced only as an image in the original publication).
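Equation (4) can be illustrated with a minimal pure-Python sketch; the left zero-padding stands in for the missing past, and the function name is our own:

```python
def dilated_causal_conv(x, f, d):
    """Equation (4): F(s) = sum_{i=0}^{k-1} f(i) * x[s - d*i].
    Indices before the start of the series contribute zero (causal padding),
    so the output at time s depends only on inputs at times <= s."""
    k = len(f)
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = s - d * i
            if j >= 0:
                acc += f[i] * x[j]
        out.append(acc)
    return out

# kernel size k = 2, dilation d = 2: each output mixes x[s] and x[s-2]
y = dilated_causal_conv([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 1.0], 2)
```

Stacking such layers with growing d is what enlarges the receptive field without enlarging the kernel.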
Step S40, taking the output of the temporal convolutional network as the input of the gated recurrent unit.
In this embodiment, the gated recurrent unit (Gate Recurrent Unit, GRU), a variant of the LSTM, maintains high prediction accuracy while largely resolving the LSTM's long training time and low training efficiency. The update gate z_t determines how much of the previous time step's gated state is kept in the current cell, and the reset gate r_t mainly determines how much past information is forgotten. The TCN output is fed into the GRU network to obtain the memory output h_t; the calculation is given by Equations (7)–(10):

z_t = σ(ω_{x,z} x_t + ω_{h,z} h_{t−1})    (7)
r_t = σ(ω_{x,r} x_t + ω_{h,r} h_{t−1})    (8)
g_t = tanh(ω_{x,g} x_t + ω_{h,g} (r_t ⊙ h_{t−1}))    (9)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ g_t    (10)

In Equations (7)–(10), ω_{x,z}, ω_{h,z}, ω_{x,r}, ω_{h,r}, ω_{x,g}, ω_{h,g} denote the parameter training matrices; h_t and h_{t−1} denote the GRU network outputs at the current and previous time steps; σ and tanh denote the Sigmoid and tanh activation functions; and g_t denotes the candidate output, whose weighted combination with h_{t−1} via Equation (10) yields the value h_t at the current time.
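Equations (7)–(10) can be sketched for the scalar case in a few lines; the weight-dictionary keys are our own shorthand for the ω matrices, which in a real layer are matrices rather than floats:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, w):
    """One scalar GRU step following Equations (7)-(10)."""
    z = sigmoid(w["xz"] * x_t + w["hz"] * h_prev)          # update gate z_t, Eq. (7)
    r = sigmoid(w["xr"] * x_t + w["hr"] * h_prev)          # reset gate r_t,  Eq. (8)
    g = math.tanh(w["xg"] * x_t + w["hg"] * (r * h_prev))  # candidate g_t,   Eq. (9)
    return (1.0 - z) * h_prev + z * g                      # new state h_t,   Eq. (10)

w = {"xz": 1.0, "hz": 1.0, "xr": 1.0, "hr": 1.0, "xg": 1.0, "hg": 1.0}
h = 0.0
for x_t in [0.5, -0.2, 0.8]:  # run a short TCN-output sequence through the cell
    h = gru_step(x_t, h, w)
```

Because Equation (10) is a convex combination of h_{t−1} and the tanh-bounded candidate, the state stays in (−1, 1) when initialized there.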
And S50, taking the output of the gating circulating unit as the input of a memory mechanism to obtain an output power load prediction result.
In the deep-learning field, a model generally must receive and process a large amount of data, yet at any specific moment only a small part of that data is important. Attention mechanisms (Attention) derive a set of weight coefficients through the network's autonomous learning ability, concentrating attention on important information and ignoring or suppressing unimportant information, thereby strengthening the important features most useful to the prediction model. Introducing an attention mechanism effectively emphasizes the information with the greatest influence on the model and further improves prediction accuracy.
In this embodiment, the GRU network output is fed into the Attention mechanism; the calculation is given by Equations (11)–(14), which appear only as images in the original publication and which, in the standard additive-attention form, read:

S_{ti} = V · tanh(W h_t + U h_i)
a_{ti} = exp(S_{ti}) / Σ_j exp(S_{tj})
F = Σ_i a_{ti} h_i

In Equations (11)–(14), S_{ti} denote the attention scores computed by the additive model, and V, W, U are all parameters the additive model can learn in training; exp is the exponential function; the outputs h_i are weighted and summed by the weight coefficients a_{ti} to obtain F. The constructed TCN-GRU-Attention network model overcomes the shortcomings of any single network, reduces prediction error, and produces better predictions than any single model.
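A toy additive-attention pass over scalar GRU outputs follows the three steps above; the scalar V, W, U stand in for the learned matrices, and the names are ours:

```python
import math

def additive_attention(H, V=1.0, W=1.0, U=1.0):
    """Score each GRU output h_i against the last output h_t, softmax
    the scores into weights a_i, and return the weighted sum F."""
    h_t = H[-1]
    scores = [V * math.tanh(W * h_t + U * h_i) for h_i in H]  # additive scores S_ti
    m = max(scores)                                           # numeric stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]                   # softmax weights a_ti
    F = sum(a * h for a, h in zip(weights, H))                # weighted sum -> F
    return F, weights

F, weights = additive_attention([0.2, 0.9, 0.4, 0.7])
```

Setting U = 0 makes every score identical, which collapses the weights to uniform and F to the plain mean — a handy sanity check that the weighting is doing the work.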
The hyperparameters of the temporal convolutional network, the gated recurrent unit, and the attention mechanism are obtained through optimization by a non-standard, tree-structured Bayesian optimization algorithm.
The choice of hyperparameters is crucial to the model's prediction performance and generalization ability. Neural network training is influenced by the model structure, the data representation, the optimization method, and so on; each link involves many parameters and hyperparameters, and tuning them determines the model's final performance. The common approach is manual tuning, but it is time- and labor-intensive and rarely finds the optimum, which lowers the model's prediction accuracy. The Bayesian optimization algorithm (Bayesian Optimization, BO) is one of the most widely used algorithms for hyperparameter optimization; the traditional variant is based on a Gaussian process and, though widely applied, performs excellently in low-dimensional spaces but weakly in high-dimensional ones.
In this embodiment, the Tree-structured Parzen Estimator (TPE) hyperparameter optimization algorithm is used to optimize the hyperparameters in the model, enhancing the model's generalization ability and improving its prediction accuracy.
An importance sampling algorithm is selected as the sampling function. Given observations of x, let p(x) be the probability density of x and q(x) a probability density with the same domain as p(x); the importance weight is defined as in Equation (15):

w(x) = p(x) / q(x)    (15)

so that, as in Equation (16),

E_{p(x)}[f(x)] = E_{q(x)}[w(x) f(x)]    (16)

Sample points are then drawn from q(x) and the expectation is estimated by the law of large numbers. Substituting Equation (16) into the definition of the expectation gives Equation (17):

E_{p(x)}[f(x)] ≈ (1/n) Σ_{i=1}^{n} w(x_i) f(x_i),  x_i ~ q(x)    (17)

Using the proposal distribution q(x), samples can be drawn for both numerator and denominator, with the importance weights carrying the expectation. The proposal q(x) should be close to p(x); each sample drawn from the proposal then contributes to the normalized target distribution simply as a multiple w(x_i) of its value under the proposal distribution.
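Equations (15)–(17) in runnable form — estimating E_p[x] for a unit-variance Gaussian target p centered at 1 using a wider Gaussian proposal q; the densities and the sampler are our illustrative choices:

```python
import math
import random

def importance_estimate(f, p, q, sampler, n=100_000, seed=7):
    """Equation (17): E_p[f] ~= (1/n) * sum_i w(x_i) f(x_i),
    with w = p/q (Eq. 15) and x_i drawn from the proposal q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sampler(rng)
        total += (p(x) / q(x)) * f(x)  # importance weight times integrand
    return total / n

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

est = importance_estimate(
    f=lambda x: x,
    p=lambda x: gauss_pdf(x, 1.0, 1.0),       # target density p(x)
    q=lambda x: gauss_pdf(x, 0.0, 2.0),       # proposal density q(x), same domain
    sampler=lambda rng: rng.gauss(0.0, 2.0),  # draw x_i from the proposal
)
```

The true expectation is 1.0; the closer q tracks p, the lower the variance of the weighted estimate.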
Among the current observations, the x that maximizes the ratio l(x)/g(x) is the desired observation value. In each iteration, the hyperparameters corresponding to the maximum of l(x)/g(x) are selected as the next set of search values, and the algorithm returns the point with the largest EI (expected improvement) value. Compared with other Bayesian optimization algorithms, the TPE optimization algorithm, based on Gaussian mixture models, performs better in high-dimensional spaces, with markedly faster optimization and better results obtained at higher efficiency. As shown in Fig. 2, the TPE optimization flow is described as follows:
step 1: a domain of a hyper-parametric search space is defined from which hyper-parameters of an algorithm are selected.
Step 2: the prediction error is set as an objective function that receives the super-parameters and outputs the minimum prediction error required.
Step 3: a set of initialization hyper-parameters is randomly selected and several observations are obtained.
Step 4: TPE probability density estimation is performed, sample superparameters are extracted from l (x), an evaluation is made based on l (x)/g (x), and a set of minima is returned that yield at l (x)/g (x) that correspond to the maximum expected improvement.
Step 5: and determining the super-parameter combination with the EI maximum value as an optimal super-parameter combination, inputting the optimal super-parameter combination into a load prediction model for training, outputting a prediction result of the current model, and evaluating.
Step 6: and evaluating the prediction error, if the prediction accuracy is met, terminating the algorithm, otherwise, correcting the sampling function, and searching for the super-parameter combination again until the accuracy requirement is met.
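Steps 4–5 can be sketched as a one-dimensional toy TPE iteration; the Parzen bandwidth, the γ split, the search range, and all names are our illustrative choices, not the patent's settings:

```python
import math
import random

def parzen(points, bandwidth=0.5):
    """Gaussian-kernel (Parzen-window) density estimate over 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm
    return density

def tpe_suggest(history, low, high, gamma=0.25, n_candidates=64, seed=0):
    """One TPE step: split past (x, loss) trials into the best gamma
    fraction, fit l(x) to it and g(x) to the rest, and return the
    candidate maximizing l(x)/g(x) -- the point of maximum EI."""
    rng = random.Random(seed)
    ordered = sorted(history, key=lambda t: t[1])  # (x, loss), best first
    n_good = max(1, int(gamma * len(ordered)))
    l = parzen([x for x, _ in ordered[:n_good]])   # density of "good" trials
    g = parzen([x for x, _ in ordered[n_good:]])   # density of the rest
    candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

# loss(x) = (x - 3)^2: past trials near 3 are "good", the rest "bad"
history = [(2.9, 0.01), (3.1, 0.01), (7.8, 23.0), (8.0, 25.0),
           (8.2, 27.0), (0.5, 6.2), (9.5, 42.0), (6.0, 9.0)]
x_next = tpe_suggest(history, 0.0, 10.0)
```

In a full loop, `x_next` would be evaluated on the objective, appended to `history`, and the densities refit — correcting the sampling function as in Step 6.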
In this embodiment, the TPE algorithm is used to optimize the hyperparameters in the model; the search ranges are shown in Table 2, where Filters, Kernel_size, and Dropout rate denote the filter count, convolution kernel size, and discard rate, and Units, Learning rate, and Batch_size denote the number of hidden neurons, the learning rate, and the batch size, respectively. Model hyperparameters are often chosen by personal experience or repeated manual tuning and may not be optimal; this embodiment optimizes them with the TPE algorithm, which can further improve prediction accuracy.
Table 2 super parameter optimizing range
(Table 2 appears as an image in the original publication and is not reproduced here.)
As shown in Fig. 3, this embodiment proposes a power load prediction model based on a TPE-optimized TCN-GRU-Attention network. The TCN improves on the CNN: it handles sequence modeling effectively, retains a longer memory, and its overall performance is superior to networks such as the RNN and LSTM; compared with one-dimensional convolution, it extracts sequence features through causal and dilated convolutions, and residual connections between layers avoid gradient problems. TCN sequence modeling has several advantages:
(1) Adaptive sequence modeling (causal convolution): the causal convolution structure is unidirectional and does not take future information into account, and the further back the history to be traced, the more hidden layers are needed. Simple causal convolution still shares the traditional CNN's limitation that the modeled time span is bounded by the convolution kernel size;
(2) Parallelism: a traditional RNN must process time steps sequentially, predicting each step only after its predecessors, whereas a TCN can process a long input sequence in parallel as a whole;
(3) Stable gradients: the TCN's backpropagation path differs from the temporal direction of the sequence, so it avoids the RNN's exploding or vanishing gradients.
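The kernel-size limitation in advantage (1) is precisely what dilation lifts: for the standard TCN layout with two dilated causal convolutions per residual block, the receptive field grows as RF = 1 + 2(k − 1)·Σd. A quick calculation (function name is ours):

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field of a stack of residual blocks, each containing
    `convs_per_block` dilated causal convolutions: every convolution
    with dilation d widens the field by (kernel_size - 1) * d."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

# doubling dilations 1, 2, 4, 8 with kernel size 3:
rf = tcn_receptive_field(3, [1, 2, 4, 8])
```

Doubling the dilation per block makes the receptive field grow exponentially with depth while the parameter count grows only linearly.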
the accurate power load prediction is a necessary premise for constructing a novel and intelligent power system, so as to fully mine time sequence characteristics in power load data and further improve prediction accuracy, the embodiment provides a power load prediction model based on TPE (thermoplastic elastomer) optimization TCN-GRU-Attention, and the current TCN, GRU and Attention mechanisms which are widely applied are combined, so that the defects of a single model are overcome, and the advantages are complementary. In addition, the TPE hyper-parameter optimization algorithm is utilized to optimize the hyper-parameters in the model, so that the fitting property and the effectiveness of the model can be further improved, and the TPE hyper-parameter optimization algorithm has a certain practical significance.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims (8)

1. A short-term power load prediction method based on nonstandard Bayesian algorithm optimization, comprising:
constructing a multivariate feature set;
preprocessing the multivariate feature set;
taking the multivariate feature set as the input of a temporal convolutional network;
taking the output of the temporal convolutional network as the input of a gated recurrent unit;
taking the output of the gated recurrent unit as the input of an attention mechanism to obtain the output power load prediction result;
wherein the hyper-parameters of the temporal convolutional network, the gated recurrent unit, and the attention mechanism are obtained through optimization by a tree-structured nonstandard Bayesian optimization algorithm.
2. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 1, wherein the multivariate feature set comprises at least one of: historical load data, rainfall, wind speed, maximum temperature, minimum temperature, and humidity.
3. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 1, wherein preprocessing the multivariate feature set comprises:
performing a normalization operation and missing-value handling on the multivariate feature set to obtain a preprocessed feature set;
and, based on the preliminary screening result of the Pearson correlation coefficient, performing backward selection on the preprocessed feature set using recursive feature elimination, discarding the least important feature in each iteration until all features have been traversed, thereby completing feature screening.
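The normalization and Pearson screening steps of claim 3 can be sketched as follows (the feature columns and values are hypothetical; the recursive feature elimination step is omitted for brevity):

```python
import numpy as np

def min_max(col):
    """Scale a feature column to [0, 1] (min-max normalization)."""
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo) if hi > lo else np.zeros_like(col)

def pearson(x, y):
    """Pearson correlation coefficient between a feature and the load."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical columns: temperature and load, positively related
t = np.array([20.0, 22.0, 25.0, 27.0, 30.0])
load = np.array([1.0, 1.2, 1.5, 1.6, 2.0])
r = pearson(min_max(t), min_max(load))
```

A feature whose |r| falls below a chosen threshold would be dropped in the preliminary screening; the surviving features then enter the recursive-feature-elimination loop.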
4. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 1, wherein the temporal convolutional network comprises a set of residual units; each residual unit comprises two convolution units and a nonlinear mapping, the nonlinear mapping being used to reduce the dimensionality of high-dimensional input data.
5. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 4, wherein the temporal convolutional network further:
adjusts the sampling interval through the dilation factor;
convolves only the inputs before time t to obtain the output at time t;
normalizes the weights by constructing a new activation function;
and smooths the activation function by adopting a Max Dropout algorithm.
6. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 1, wherein the tree-structured nonstandard Bayesian optimization algorithm comprises:
selecting an importance sampling algorithm as the sampling function;
constructing an importance weight function by giving observed values and setting importance weights;
drawing sample points based on the law of large numbers;
and obtaining the expected value of the importance weights.
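The importance-sampling estimate referred to in claim 6 can be illustrated in one dimension: draw samples from a proposal density q, weight them by p/q, and by the law of large numbers the weighted mean converges to the expectation under the target density p. The densities and the test function below are hypothetical examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target p = N(0, 1); proposal q = N(1, 2) (both hypothetical)
def p(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-0.5 * ((x - 1.0) / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))

xs = rng.normal(1.0, 2.0, size=200_000)  # samples from the proposal q
w = p(xs) / q(xs)                        # importance weights
est = np.mean(w * xs**2)                 # estimates E_p[x^2], which is 1
```

The estimate's variance depends on how well q covers p; a proposal with heavier tails than the target (as here) keeps the weights bounded.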
7. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 6, further comprising:
among the current observations, taking the hyper-parameters corresponding to the observed value that maximizes the ratio between the two output results of the sampling function as the next group of search values.
8. The short-term power load prediction method based on nonstandard Bayesian algorithm optimization according to claim 7, wherein the flow of the tree-structured nonstandard Bayesian optimization algorithm comprises:
defining a hyper-parameter search space;
selecting hyper-parameter inputs from the hyper-parameter search space;
setting the prediction error as the objective function;
randomly selecting a group of initial hyper-parameters and obtaining the corresponding observed values;
performing the probability density estimation of the nonstandard Bayesian optimization algorithm;
drawing sample hyper-parameters from the output of the sampling function and evaluating them to obtain the set of minima corresponding to the maximum expected improvement;
determining the hyper-parameter combination that maximizes the entropy index as the optimal hyper-parameter combination;
inputting the optimal hyper-parameter combination into the load prediction model for training;
evaluating the prediction result of the load prediction model;
terminating training if the prediction error of the prediction result meets the set prediction accuracy condition;
otherwise, correcting the sampling function and searching for a hyper-parameter combination again until the prediction accuracy condition is met.
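The claim-8 flow can be summarized as a skeleton loop. The `suggest` helper below is a toy stand-in for the TPE proposal of claim 7, and the search space, objective, and tolerances are all hypothetical — this is a sketch of the overall control flow, not the patented implementation:

```python
import random

def suggest(trials, space):
    """Toy stand-in for the TPE proposal: perturb the best trial so far,
    clipped to the search space (hypothetical, not the patented sampler)."""
    best = min(trials, key=lambda t: t[1])[0]
    return {k: min(max(best[k] + random.gauss(0, 0.1 * (hi - lo)), lo), hi)
            for k, (lo, hi) in space.items()}

def tpe_search(space, objective, n_init=5, n_iter=20, tol=1e-4):
    """Skeleton of the claim-8 flow: define the space and objective,
    initialise randomly, iterate sampling/evaluation, stop on accuracy."""
    trials = []
    for _ in range(n_init):  # random initial hyper-parameter groups
        hp = {k: random.uniform(lo, hi) for k, (lo, hi) in space.items()}
        trials.append((hp, objective(hp)))
    for _ in range(n_iter):  # density estimation / sampling / evaluation
        hp = suggest(trials, space)
        err = objective(hp)
        trials.append((hp, err))
        if err < tol:        # prediction-accuracy stopping condition
            break
    return min(trials, key=lambda t: t[1])

random.seed(0)
space = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
objective = lambda hp: (hp["lr"] - 0.01) ** 2 + (hp["dropout"] - 0.2) ** 2
best_hp, best_err = tpe_search(space, objective)
```

In the actual method the objective would be the load-prediction error of the trained TCN-GRU-Attention model, and each evaluation would involve a full training run.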
CN202310184153.5A 2023-03-01 2023-03-01 Short-term power load prediction method based on nonstandard Bayesian algorithm optimization Pending CN116316573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310184153.5A CN116316573A (en) 2023-03-01 2023-03-01 Short-term power load prediction method based on nonstandard Bayesian algorithm optimization

Publications (1)

Publication Number Publication Date
CN116316573A true CN116316573A (en) 2023-06-23

Family

ID=86831802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310184153.5A Pending CN116316573A (en) 2023-03-01 2023-03-01 Short-term power load prediction method based on nonstandard Bayesian algorithm optimization

Country Status (1)

Country Link
CN (1) CN116316573A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116882589A (en) * 2023-09-04 2023-10-13 国网天津市电力公司营销服务中心 Online line loss rate prediction method based on Bayesian optimization deep neural network
CN117526316A (en) * 2024-01-04 2024-02-06 国网湖北省电力有限公司 Load prediction method based on GCN-CBAM-BiGRU combined model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination