CN116316573A - Short-term power load prediction method based on nonstandard Bayesian algorithm optimization - Google Patents
- Publication number
- CN116316573A (application CN202310184153.5A)
- Authority
- CN
- China
- Prior art keywords
- super
- power load
- load prediction
- algorithm
- optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, comprising the following steps: constructing a multivariate feature set; preprocessing the multivariate feature set; taking the multivariate feature set as the input of a temporal convolutional network; taking the output of the temporal convolutional network as the input of a gated recurrent unit; taking the output of the gated recurrent unit as the input of an attention mechanism to obtain the output power load prediction result. The hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through a tree-structured, nonstandard Bayesian optimization algorithm. The embodiment of the invention provides a power load prediction model based on TPE (Tree-structured Parzen Estimator) optimization of TCN-GRU-Attention, which not only overcomes the shortcomings of a single model but also optimizes the hyperparameters of the model with the TPE algorithm, further improving the effectiveness and adaptability of the model.
Description
Technical Field
The invention relates to the technical field of power systems, in particular to a short-term power load prediction method based on nonstandard Bayesian algorithm optimization.
Background
At present, the power industry in China is developing rapidly, grid supply keeps expanding, and the structures and operation modes of power systems are increasingly diversified; against this new background, the safety, reliability and stable operation of the grid face greater challenges. Short-term power load prediction is an important task of power management departments: accurate load prediction allows production plans to be arranged reasonably and ensures timely, economical power dispatch, thereby reducing power-generation costs.
Improving the accuracy of short-term power load prediction not only raises the overall economic benefit of the power system but also enhances the stability and safety of its operation. As power systems keep developing, short-term load prediction has become considerably harder, and providing a reasonable, effective prediction model has become a focus of current research. Convolutional neural networks (CNN) suit load-data prediction with multidimensional variables, but they are not fully suited to learning time series and require various auxiliary processing. Recurrent neural networks (RNN) handle time-series data well, but suffer serious gradient problems on long input sequences. Compared with the RNN, the long short-term memory (LSTM) network handles long-sequence prediction more effectively, but its three gating units carry many parameters, giving long training times and low training efficiency.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, aiming to solve the long training times and low training efficiency of the prediction models used for short-term power load prediction in the prior art.
The embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, which comprises the following steps:
constructing a multivariable feature set;
preprocessing the multivariable feature set;
taking the multivariate feature set as the input of the temporal convolutional network;
taking the output of the temporal convolutional network as the input of the gated recurrent unit;
taking the output of the gated recurrent unit as the input of an attention mechanism to obtain the output power load prediction result;
wherein the hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through a tree-structured, nonstandard Bayesian optimization algorithm.
Optionally, the multivariate feature set comprises: at least one of historical load data, rainfall, wind speed, maximum temperature, minimum temperature, and humidity.
Optionally, preprocessing the multivariate feature set comprises:
performing normalization and missing-value handling on the multivariate feature set to obtain a preprocessed feature set;
and, on the basis of the primary screening by Pearson correlation coefficient, performing backward selection on the preprocessed feature set using recursive feature elimination, discarding the least important feature in each iteration until all features have been traversed, completing the feature screening.
Optionally, the temporal convolutional network comprises a set of residual units; each residual unit comprises two convolution units and a nonlinear mapping, the nonlinear mapping being used to reduce the dimensionality of high-dimensional input data.
Optionally, the temporal convolutional network further comprises:
adjusting the sampling interval by the dilation coefficient;
convolving only the inputs before time t to obtain the output at time t;
normalizing the weights by constructing a new activation function;
and smoothing the activation function using the MaxDropout algorithm.
Optionally, the tree-structured, nonstandard Bayesian optimization algorithm comprises:
selecting an importance-sampling algorithm as the sampling function;
constructing the importance-weight function by setting an importance weight for a given observation;
drawing sample points and estimating by the law of large numbers;
and obtaining the expected value under the importance weights.
Optionally, the method further comprises:
in the current observations, the hyperparameter corresponding to the observation at which the ratio between the two outputs of the sampling function is maximized is used as the next set of search values.
Optionally, the flow of the tree-structured, nonstandard Bayesian optimization algorithm comprises the following steps:
defining the hyperparameter search space;
selecting hyperparameter inputs from the hyperparameter search space;
setting the prediction error as the objective function;
randomly selecting a group of initialization hyperparameters and obtaining the corresponding observations;
performing the probability-density estimation of the nonstandard Bayesian optimization algorithm;
drawing sample hyperparameters from the sampling-function output and evaluating them to obtain the set of minima corresponding to the maximum expected improvement;
determining the hyperparameter combination that maximizes the expected improvement (EI) as the optimal hyperparameter combination;
feeding the optimal hyperparameter combination into the load prediction model for training;
evaluating the prediction result of the load prediction model;
terminating training if the prediction error meets the set prediction-accuracy condition;
otherwise, correcting the sampling function and searching for a new hyperparameter combination until the accuracy condition is met.
The embodiments of the invention have the following beneficial effects:
1. The embodiment of the invention provides a power load prediction model based on TPE (Tree-structured Parzen Estimator) optimization of TCN-GRU-Attention, which not only overcomes the shortcomings of a single model but also optimizes the hyperparameters of the model with the TPE algorithm, further improving the effectiveness and adaptability of the model.
2. The proposed prediction model fully combines the TCN's ability to extract time-series features, the GRU's nonlinear fitting ability and the Attention mechanism's feature-weighting ability, achieving an optimal combination that can effectively improve the accuracy of short-term power load prediction. A new, smoother activation function is constructed, and its smoothness plays an important role in optimization and generalization. On this basis, the TPE is used to optimize the hyperparameters of the model, giving higher prediction accuracy and better fitness.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 illustrates a flow chart of a short-term power load prediction method based on nonstandard Bayesian algorithm optimization in an embodiment of the present invention;
FIG. 2 shows a flow chart of a super-parameter optimization algorithm in an embodiment of the invention;
FIG. 3 illustrates a short-term power load prediction model structure based on nonstandard Bayesian algorithm optimization in accordance with an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a short-term power load prediction method based on nonstandard Bayesian algorithm optimization, which is shown in fig. 1 and comprises the following steps:
and step S10, constructing a multi-variable feature set.
In this embodiment, since the power load is affected by various factors, historical load data of a region required for short-term power load prediction and meteorological factors such as rainfall, wind speed, maximum temperature, minimum temperature and humidity are collected, and a multivariate feature set is constructed.
Step S20: preprocess the multivariate feature set.
In this embodiment, the magnitudes of the historical power load data and the feature subsets differ greatly, and missing values are present, so the model cannot effectively identify and extract information from the raw data, degrading model quality; the data are therefore preprocessed by normalization, missing-value removal and similar operations.
And by combining with feature engineering, a feature set with high relevance to load data is screened and constructed, so that the difficulty of learning tasks can be reduced, and the generalization capability of the model can be enhanced.
In a specific embodiment, the data are preprocessed by min-max normalization as in formula (1): x* = (x - x_min) / (x_max - x_min), where x denotes a variable of the original data and x* the new normalized value.
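As a minimal sketch (the patent provides no code; the function name is ours), formula (1) can be applied to each variable column in turn:

```python
def min_max_normalize(xs):
    """Min-max normalization per formula (1): x* = (x - min) / (max - min),
    mapping each value of a variable into [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    if span == 0:                       # constant column: map to zeros
        return [0.0 for _ in xs]
    return [(x - lo) / span for x in xs]
```

Applied to a load column such as `[10, 20, 30]`, this yields `[0.0, 0.5, 1.0]`.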
The effectiveness and generalization ability of the prediction model fundamentally depend on the quality of the load data and the related features. Selecting features highly correlated with the load data therefore enhances the model's generalization ability and efficiency, while screening out weakly correlated features reduces the feature count and with it the difficulty of the learning task. The Pearson correlation coefficient is widely used to measure the correlation between two variables; assuming another variable y, normalized analogously by formula (1), the Pearson correlation coefficient is computed as in formula (2): r_xy = cov(x, y) / (σ_x σ_y).
A Pearson correlation coefficient between 0.0 and 0.2 indicates extremely weak or no correlation. First, the Pearson coefficient is used to eliminate the weakly correlated or irrelevant factors from the candidate features, completing the primary screening. On the primary screening result, features are then selected backward using recursive feature elimination (RFE). RFE removes redundancy among features, reduces the feature dimension and selects the optimal feature combination: combined with a regression model, it searches feature subsets over all features of the training set, iteratively discarding the least important feature and continuing on the remaining features until all features have been traversed, completing the feature screening.
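The primary screening step can be sketched as follows (a hedged illustration with plain-Python series; `primary_screen` and its 0.2 default follow the threshold band described above):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (formula (2)): covariance of the two
    series divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def primary_screen(features, load, threshold=0.2):
    """Drop candidate features whose |r| against the load series falls in
    the 'extremely weak or none' band below the threshold."""
    return {name: xs for name, xs in features.items()
            if abs(pearson_r(xs, load)) >= threshold}
```

The surviving features would then go to the backward-selection stage, e.g. `sklearn.feature_selection.RFE` with a regression estimator.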
TABLE 1 original feature set
In a specific embodiment, the initial multivariate feature set is shown in table 1.
Step S30: take the multivariate feature set as the input of the temporal convolutional network.
A temporal convolutional network (Temporal Convolutional Network, TCN) is an improvement on the CNN: it handles sequence-modeling tasks effectively, retains a longer effective memory, and its overall performance exceeds that of networks such as the RNN and LSTM. Compared with plain one-dimensional convolution, the TCN extracts sequence features by adding causal convolution and dilated convolution, while residual connections between the network layers avoid gradient problems. The TCN is a simple, general convolutional neural-network architecture for time-series problems: it consists of a group of residual units, each a small neural network with a residual connection; the residual connections speed up feedback and convergence in deep networks, countering the degradation caused by adding layers.
In this embodiment, for a stacked-layer structure (several stacked layers), the feature learned when the input is x is denoted h(x); the structure can then learn the residual unit shown in formula (3):

f_TCN(x) = h(x) - x    (3)

where h(x) is the originally learned feature. Learning the residual is easier than learning the original feature directly.
The residual unit contains two convolution units and a nonlinear mapping; the nonlinear mapping reduces the dimensionality when the input and output of the residual unit have different dimensions. The convolution unit first applies a one-dimensional dilated causal convolution, whose operation is given by formula (4):

F(s) = Σ_{i=0}^{k-1} f(i) · x_{s-d·i}    (4)

where f is the filter; d is the dilation coefficient; k is the convolution-kernel size; s indexes the input time-series information; and the index s - d·i points only to the past, ensuring that only past inputs are convolved.
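A minimal sketch of the dilated causal convolution of formula (4), under the assumption of zero padding before the sequence start (the function name is ours):

```python
def dilated_causal_conv(x, f, d):
    """1-D dilated causal convolution: each output y[t] only sees inputs
    at or before t, sampled at stride d, as in formula (4)."""
    k = len(f)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = t - d * i           # step d positions into the past
            if j >= 0:              # zero-pad before the sequence start
                acc += f[i] * x[j]
        y.append(acc)
    return y
```

With kernel `[1, 1]` and dilation 2, each output is the current input plus the input two steps earlier, so no future value leaks into any output.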
In a specific embodiment, the sampling interval is adjusted through the dilation coefficient, realizing a larger receptive field (RF), i.e. the region of the input visible to a feature on a convolution layer, so that the network can memorize a sufficiently long history; only the inputs before time t are convolved to obtain the output at time t, so no future information is leaked. The weights are then normalized to construct a new activation function, computed as in formula (5).
The constructed activation function is smoother, and smoothness plays an important role in optimization and generalization. Finally, a MaxDropout operation is applied; it is, to some extent, a mixture of standard dropout and Gaussian dropout executed on the max-pooling layer, using a Bayesian method to prevent overfitting and accelerate model training. The computation is given in formula (6).
Step S40: take the output of the temporal convolutional network as the input of the gated recurrent unit.
In this embodiment, the gated recurrent unit (Gate Recurrent Unit, GRU), as a variant of the LSTM, largely resolves the LSTM's long training time and low training efficiency while retaining high prediction accuracy. The update gate z_t determines how much of the previous time step's state is kept in the current cell, and the reset gate r_t determines how much past information is forgotten. The output of the TCN is fed into the GRU network to obtain the memory output h_t, computed as in formulas (7)-(10):

z_t = σ(ω_{x,z} x_t + ω_{h,z} h_{t-1})    (7)
r_t = σ(ω_{x,r} x_t + ω_{h,r} h_{t-1})    (8)
g_t = tanh(ω_{x,g} x_t + ω_{h,g} (r_t ⊙ h_{t-1}))    (9)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ g_t    (10)

In formulas (7)-(10), ω_{x,z}, ω_{h,z}, ω_{x,r}, ω_{h,r}, ω_{x,g} and ω_{h,g} are the parameter training matrices; h_t and h_{t-1} are the GRU outputs at the current and previous time steps; σ and tanh denote the Sigmoid and tanh activation functions; g_t is the candidate state, from which formula (10) computes the output h_t at the current time.
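A scalar sketch of one GRU step following formulas (7)-(10); scalar weights stand in for the ω matrices, and the names are ours:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, w):
    """One scalar GRU step; w maps weight names to scalar omegas."""
    z = sigmoid(w["xz"] * x_t + w["hz"] * h_prev)        # update gate, (7)
    r = sigmoid(w["xr"] * x_t + w["hr"] * h_prev)        # reset gate, (8)
    g = math.tanh(w["xg"] * x_t + w["hg"] * r * h_prev)  # candidate, (9)
    return (1 - z) * h_prev + z * g                      # output, (10)
```

With all weights zero, z = 0.5 and g = 0, so the output is simply half the previous state, which shows how the update gate blends old state with the candidate.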
Step S50: take the output of the gated recurrent unit as the input of the attention mechanism to obtain the output power load prediction result.
In deep learning, a model generally receives and processes large amounts of data, yet at any given moment only a small part is important. The attention mechanism (Attention) derives a set of weight coefficients through the network's own learning, concentrating attention on important information while ignoring or suppressing the unimportant, thereby strengthening the features most useful to the prediction model. Introducing attention lets the model focus on the information that most influences it, further improving prediction accuracy.
In this embodiment, the computation that feeds the GRU network output into the Attention mechanism is given in formulas (11)-(14).
In formulas (11)-(14), S_ti are the attention scores computed by the additive model, and V, W and U are parameters the additive model learns during training; exp is the exponential function. The scores are normalized into weight coefficients, and the GRU outputs are weighted and summed by these coefficients to obtain F. In this embodiment, the TCN-GRU-Attention network model overcomes the shortcomings of a single network, reduces the prediction error, and produces better predictions than any single model alone.
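Assuming the standard additive (Bahdanau-style) form of the scores, the weighting described for formulas (11)-(14) can be sketched as follows; scalars stand in for the learnable V, W, U, and the exact formulas of the patent are not reproduced here:

```python
import math

def additive_attention(hs, v, w, u, h_t):
    """Score each GRU output h_i by s_i = v * tanh(w*h_t + u*h_i),
    softmax the scores into weight coefficients, then return the
    weighted sum F of the outputs."""
    scores = [v * math.tanh(w * h_t + u * h_i) for h_i in hs]
    m = max(scores)                                    # numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]                 # weight coefficients
    return sum(a * h_i for a, h_i in zip(alphas, hs))  # context F
```

When all scores are equal the weights are uniform and F reduces to the mean of the GRU outputs; training shifts the weights toward the informative time steps.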
The hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through optimization with the tree-structured, nonstandard Bayesian optimization algorithm.
The choice of hyperparameters is crucial to the prediction performance and generalization ability of the model. Neural-network training is affected by the model structure, the data representation, the optimization method and so on; each link involves many parameters and hyperparameters, and tuning them determines the model's final performance. The common approach is manual tuning, but it is time- and labor-consuming and rarely finds the optimum, reducing the model's prediction accuracy. Bayesian optimization (BO) is one of the most commonly used algorithms for hyperparameter optimization; the traditional BO algorithm is based on a Gaussian process and is widely applied, performing excellently in low-dimensional spaces but weakly in high-dimensional ones.
In this embodiment, the Tree-structured Parzen Estimator (TPE) hyperparameter-optimization algorithm is used to optimize the hyperparameters of the model, enhancing its generalization ability and improving its prediction accuracy.
An importance-sampling algorithm is selected as the sampling function; its computation is given in formula (15).
Given an observation x, let p(x) be its probability density function and q(x) a probability density function with the same domain as p(x); define the importance weight w(x) = p(x)/q(x), which yields formula (16):

E_{p(x)}[f(x)] = E_{q(x)}[w(x) f(x)]    (16)

Sample points are drawn from q(x) and the expectation is estimated by the law of large numbers; substituting formula (16) into the definition of the expectation gives formula (17). Samples for both the numerator and the denominator can thus be drawn from the proposal distribution q(x), with the expectations taken under the importance weights. As in rejection sampling, q(x) should be close to p(x); under the new proposal distribution the drawn samples are simply scaled versions of those from the original one, i.e. each sample value of the normalized target distribution is multiplied by w(x).
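A hedged sketch of the importance-sampling estimate behind formula (16): draw from the proposal q and weight each sample by w(x) = p(x)/q(x) (function names and the uniform example are ours):

```python
import random

def importance_estimate(f, p_pdf, q_pdf, q_sample, n=100_000, seed=0):
    """Estimate E_p[f(x)] by drawing from the proposal q and weighting
    each sample by the importance weight w(x) = p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        total += (p_pdf(x) / q_pdf(x)) * f(x)
    return total / n
```

For example, taking p uniform on [0, 1] and q uniform on [0, 2], the estimate of E_p[x] converges toward 0.5 even though no sample was drawn from p itself.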
Among the current observations, the x that maximizes the ratio l(x)/g(x) is the required observation. In each iteration, the hyperparameter that maximizes l(x)/g(x) is selected as the next set of search values, and the algorithm returns the point with the largest EI (expected improvement) value. Compared with other Bayesian optimization algorithms, the TPE algorithm, based on a Gaussian mixture model, performs better in high-dimensional spaces, markedly improves the optimization speed, and obtains better results with higher efficiency. As shown in fig. 2, the TPE optimization flow is as follows:
step 1: a domain of a hyper-parametric search space is defined from which hyper-parameters of an algorithm are selected.
Step 2: the prediction error is set as an objective function that receives the super-parameters and outputs the minimum prediction error required.
Step 3: a set of initialization hyper-parameters is randomly selected and several observations are obtained.
Step 4: TPE probability density estimation is performed, sample superparameters are extracted from l (x), an evaluation is made based on l (x)/g (x), and a set of minima is returned that yield at l (x)/g (x) that correspond to the maximum expected improvement.
Step 5: and determining the super-parameter combination with the EI maximum value as an optimal super-parameter combination, inputting the optimal super-parameter combination into a load prediction model for training, outputting a prediction result of the current model, and evaluating.
Step 6: and evaluating the prediction error, if the prediction accuracy is met, terminating the algorithm, otherwise, correcting the sampling function, and searching for the super-parameter combination again until the accuracy requirement is met.
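The split-and-ratio logic of Steps 3-6 can be sketched in miniature (hedged: a toy one-dimensional version; a production TPE such as hyperopt's is considerably more elaborate):

```python
import math

def tpe_suggest(trials, candidates, gamma=0.25, bandwidth=0.5):
    """Minimal TPE-style suggestion: split past (x, loss) trials into a
    good set l and a bad set g at the gamma-quantile of the losses, model
    each with a Parzen (Gaussian-kernel) density, and return the candidate
    maximizing l(x)/g(x), a proxy for maximum expected improvement."""
    trials = sorted(trials, key=lambda t: t[1])       # best loss first
    n_good = max(1, int(gamma * len(trials)))
    good = [x for x, _ in trials[:n_good]]
    bad = [x for x, _ in trials[n_good:]] or good

    def parzen(x, centers):
        return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2)
                   for c in centers) / len(centers)

    return max(candidates,
               key=lambda x: parzen(x, good) / (parzen(x, bad) + 1e-12))
```

Given trials whose loss is smallest near x = 2, the suggested next search value concentrates near 2, mirroring Step 4's l(x)/g(x) criterion.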
In this embodiment, the TPE algorithm is used to optimize the hyperparameters of the model; the search ranges are shown in table 2, where Filters, Kernel_size and Dropout_rate denote the number of filters, the convolution-kernel size and the dropout rate, and Units, Learning_rate and Batch_size denote the number of hidden neurons, the learning rate and the batch size, respectively. Hyperparameter choices are often made from personal experience or repeated manual tuning, so the chosen values may not be optimal; this embodiment uses the TPE optimization algorithm to tune them, further improving prediction accuracy.
Table 2 super parameter optimizing range
As shown in fig. 3, this embodiment proposes a power load prediction model based on TPE-optimized TCN-GRU-Attention. TCN sequence modeling has several advantages:
(1) Adaptive sequence modeling (causal convolution): the causal convolution structure is unidirectional and never uses future information; the further back the historical information reaches, the more hidden layers are required. Plain causal convolution still suffers from the limitation of a conventional CNN: the temporal modeling length is constrained by the convolution kernel size;
(2) Parallelism: a traditional RNN can predict a time step only after all of its preceding time steps have been processed, i.e., processing must be sequential, whereas a TCN can process a long input sequence in parallel as a whole;
(3) Stable gradients: the backpropagation path of the TCN differs from the temporal direction of the sequence, so the TCN avoids the gradient explosion and vanishing problems of RNNs.
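A minimal NumPy sketch of the causal dilated convolution described above; the kernel weights and input sequence are illustrative, and a real TCN would stack several such layers with growing dilation so the receptive field expands exponentially:

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated 1-D convolution: the output at time t depends only on
    x[t], x[t-d], x[t-2d], ... -- never on future samples."""
    k = len(kernel)
    pad = dilation * (k - 1)            # left-pad to keep length and causality
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        # current sample plus past samples spaced `dilation` steps apart
        taps = xp[t + pad - dilation * np.arange(k)]
        y[t] = np.dot(kernel, taps)
    return y

x = np.arange(8, dtype=float)
y = causal_dilated_conv1d(x, kernel=np.array([1.0, 1.0]), dilation=2)
print(y)  # y[t] = x[t] + x[t-2], with zeros before the sequence start
```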
Accurate power load prediction is a necessary prerequisite for building a new, intelligent power system. To fully mine the temporal features in power load data and further improve prediction accuracy, this embodiment proposes a power load prediction model based on a TPE (Tree-structured Parzen Estimator)-optimized TCN-GRU-Attention architecture, combining the widely used TCN, GRU and Attention mechanisms so that the shortcomings of any single model are overcome and their advantages complement one another. In addition, optimizing the model hyperparameters with the TPE algorithm can further improve the fit and effectiveness of the model, which has practical significance.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.
Claims (8)
1. A short-term power load prediction method based on non-standard Bayesian algorithm optimization, comprising:
constructing a multivariate feature set;
preprocessing the multivariate feature set;
taking the multivariate feature set as the input of a temporal convolutional network;
taking the output of the temporal convolutional network as the input of a gated recurrent unit;
taking the output of the gated recurrent unit as the input of an attention mechanism to obtain an output power load prediction result;
wherein the hyperparameters of the temporal convolutional network, the gated recurrent unit and the attention mechanism are obtained through a tree-structure-based non-standard Bayesian optimization algorithm.
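The staged pipeline of claim 1 (temporal-convolution features, recurrent encoding, then attention-weighted pooling) can be sketched with plain NumPy. The layer stand-ins below are untrained random projections that only illustrate the tensor shapes flowing between stages, not the patented model itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: batch of 4 sequences, 24 time steps, 6 input features
x = rng.normal(size=(4, 24, 6))

# Stand-ins for the TCN and GRU stages: random projections that preserve
# the (batch, time, hidden) layout the attention stage expects
W_tcn = rng.normal(size=(6, 16))
h_tcn = np.maximum(x @ W_tcn, 0.0)      # "TCN" output, shape (4, 24, 16)
W_gru = rng.normal(size=(16, 16))
h_gru = np.tanh(h_tcn @ W_gru)          # "GRU" output, shape (4, 24, 16)

# Attention pooling over the time axis: score each step, softmax, weighted sum
v = rng.normal(size=(16,))
scores = h_gru @ v                       # (4, 24) unnormalised scores
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
context = np.einsum('bt,bth->bh', weights, h_gru)   # (4, 16) context vectors

W_out = rng.normal(size=(16, 1))
y_hat = context @ W_out                  # (4, 1): one load prediction per sequence
print(y_hat.shape)
```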
2. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein the multivariate feature set comprises at least one of: historical load data, rainfall, wind speed, maximum temperature, minimum temperature, and humidity.
3. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein preprocessing the multivariate feature set comprises:
performing normalization and missing-value handling on the multivariate feature set to obtain a preprocessed feature set;
and, on the basis of a Pearson-correlation-coefficient pre-screening result, performing backward selection on the preprocessed feature set using recursive feature elimination, discarding the least important feature in each iteration until all features have been traversed, thereby completing feature screening.
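The two-stage screening of claim 3 can be sketched as follows. The feature names, the synthetic data, the correlation threshold, and the least-squares fit standing in for the recursive-feature-elimination estimator are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 5 candidate features, target = load
n = 200
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 0.1, size=n)
names = ["hist_load", "rainfall", "max_temp", "wind_speed", "humidity"]

# Min-max normalisation
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Stage 1: Pearson pre-screening -- drop features with |r| below a threshold
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.where(np.abs(r) > 0.1)[0]

# Stage 2: backward elimination -- repeatedly drop the feature whose removal
# hurts a least-squares fit the least (a stand-in for RFE with a real model)
def fit_error(cols):
    A = np.column_stack([X[:, cols], np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((A @ coef - y) ** 2)

cols = list(keep)
while len(cols) > 2:
    drop_errs = [fit_error([c for c in cols if c != j]) for j in cols]
    cols.pop(int(np.argmin(drop_errs)))  # least important feature goes first
print([names[int(c)] for c in cols])
```

On this synthetic data the two informative features (the first and third columns) survive both stages.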
4. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein the temporal convolutional network comprises a set of residual units; each residual unit comprises 2 convolution units and a nonlinear mapping; and the nonlinear mapping reduces the dimensionality of high-dimensional input data before output.
5. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 4, wherein the temporal convolutional network further:
adjusts the sampling interval through a dilation coefficient;
convolves only the inputs before time t to obtain the output at time t;
normalizes the weights by constructing a new activation function;
and smooths the activation function using a Max Dropout algorithm.
6. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 1, wherein the tree-structure-based non-standard Bayesian optimization algorithm comprises:
selecting an importance sampling algorithm as the sampling function;
constructing an importance weight function by giving observed values and setting importance weights;
extracting sample points based on the law of large numbers;
and obtaining the expected value of the importance weights.
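The importance-sampling construction of claim 6 can be illustrated with a small numerical sketch; the target density p, the proposal q, and the integrand f below are hypothetical stand-ins chosen so the true expectation is known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E_p[f(x)] using samples drawn from a proposal q, weighting each
# sample by the importance weight w(x) = p(x) / q(x)
def p_pdf(x):   # target density: standard normal
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q_pdf(x):   # proposal density: wider normal, easy to sample from
    return np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))

f = lambda x: x**2            # E_p[x^2] = 1 for a standard normal

xs = rng.normal(0, 2, size=200_000)   # draw sample points from q
w = p_pdf(xs) / q_pdf(xs)             # importance weights
estimate = np.mean(w * f(xs))         # law of large numbers -> E_p[f]
print(estimate)  # close to 1.0
```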
7. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 6, further comprising:
among the current observations, taking the hyperparameters corresponding to the observed value that maximizes the ratio between the two outputs of the sampling function as the next group of search values.
8. The short-term power load prediction method based on non-standard Bayesian algorithm optimization according to claim 7, wherein the flow of the tree-structure-based non-standard Bayesian optimization algorithm comprises:
defining a hyperparameter search space;
selecting hyperparameter inputs from the hyperparameter search space;
setting the prediction error as the objective function;
randomly selecting a group of initial hyperparameters and obtaining the corresponding observed values;
performing probability density estimation of the non-standard Bayesian optimization algorithm;
extracting sample hyperparameters from the sampling-function output and evaluating them to obtain the set of minima corresponding to the maximum expected improvement;
determining the hyperparameter combination that maximizes the expected improvement (EI) as the optimal hyperparameter combination;
inputting the optimal hyperparameter combination into the load prediction model for training;
evaluating the prediction result of the load prediction model;
terminating training if the prediction error of the result meets the set prediction accuracy condition;
otherwise, correcting the sampling function and searching for a new hyperparameter combination until the prediction accuracy condition is met.
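The claimed flow can be sketched end-to-end on a toy one-dimensional search space. The objective function, the log-scaled domain, and the fixed-bandwidth densities below are illustrative stand-ins for the prediction-error objective and the real hyperparameter ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "prediction error as objective": minimised at lr = 0.01
def objective(lr):
    return (np.log10(lr) + 2.0) ** 2

def kde(x, samples, bw=0.3):
    # fixed-bandwidth Gaussian density estimate in log10 space
    d = (np.asarray(x)[:, None] - samples[None, :]) / bw
    return np.mean(np.exp(-0.5 * d**2), axis=1) / (bw * np.sqrt(2 * np.pi))

lo, hi = -4.0, -1.0                       # search space: log10(lr) in [-4, -1]
obs = list(rng.uniform(lo, hi, 5))        # random initialisation
errs = [objective(10**z) for z in obs]

for _ in range(30):
    z, e = np.array(obs), np.array(errs)
    thr = np.quantile(e, 0.25)            # split observations: good vs. bad
    good, bad = z[e <= thr], z[e > thr]
    cand = rng.uniform(lo, hi, 200)
    ratio = kde(cand, good) / (kde(cand, bad) + 1e-12)
    nxt = cand[np.argmax(ratio)]          # candidate maximising l(x)/g(x)
    obs.append(nxt)
    errs.append(objective(10**nxt))
    if errs[-1] < 1e-3:                   # accuracy requirement met -> stop
        break

best = 10 ** obs[int(np.argmin(errs))]
print(f"best learning rate found: {best:.4f}")
```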
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310184153.5A CN116316573A (en) | 2023-03-01 | 2023-03-01 | Short-term power load prediction method based on nonstandard Bayesian algorithm optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116316573A true CN116316573A (en) | 2023-06-23 |
Family
ID=86831802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310184153.5A Pending CN116316573A (en) | 2023-03-01 | 2023-03-01 | Short-term power load prediction method based on nonstandard Bayesian algorithm optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116316573A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116882589A (en) * | 2023-09-04 | 2023-10-13 | 国网天津市电力公司营销服务中心 | Online line loss rate prediction method based on Bayesian optimization deep neural network |
CN117526316A (en) * | 2024-01-04 | 2024-02-06 | 国网湖北省电力有限公司 | Load prediction method based on GCN-CBAM-BiGRU combined model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||