CN114358389A - Short-term power load prediction method combining VMD decomposition and time convolution network - Google Patents

Short-term power load prediction method combining VMD decomposition and time convolution network

Info

Publication number
CN114358389A
CN114358389A CN202111520535.8A CN202111520535A
Authority
CN
China
Prior art keywords
sequence
time
data
convolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111520535.8A
Other languages
Chinese (zh)
Inventor
唐贤伦
陈洪旭
万辉
谢涛
罗洪平
黄淼
邹密
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111520535.8A priority Critical patent/CN114358389A/en
Publication of CN114358389A publication Critical patent/CN114358389A/en
Pending legal-status Critical Current

Abstract

The invention claims a short-term power load prediction method combining VMD decomposition and a time convolution network. First, the original load sequence is decomposed by variational mode decomposition (VMD) into a set of intrinsic mode function (IMF) components. Then the Sample Entropy (SE) of each IMF component is calculated, and components with similar sample entropy values are merged into new sequences to reduce the number of models that need to be trained. Finally, a Time Convolution Network (TCN) is used to fit the nonlinear relationship between the historical data and the predicted data of each sequence, and the prediction results of the individual models are superimposed to obtain the final predicted value. Compared with other traditional load prediction methods, the method achieves higher prediction accuracy.

Description

Short-term power load prediction method combining VMD decomposition and time convolution network
Technical Field
The invention belongs to the technical field of short-term power load prediction methods, and particularly relates to a short-term power load prediction method combining a variational modal signal decomposition processing technology and a time convolution network prediction model.
Background
With the continuous development of the social economy, people's demand for energy keeps expanding, which diversifies the structure and operation modes of the power system. Since electric energy cannot currently be stored on a large scale and the variation of the electric load is uncertain, the load prediction problem of the power system has gradually attracted attention, and the related theories and technologies are continuously developed and updated. Short-term load prediction of the power system can be used for drawing up generation plans, implementing dispatching decisions and the like, and in a certain sense it greatly helps the safe, reliable, economical and stable operation of the power grid.
Currently, there are two main classes of short-term load prediction methods: classical prediction methods and intelligent prediction methods. Classical prediction methods are traditional methods based on statistical theory, the most common being the time-series method. The classical time-series method requires little data and is easy to implement, but load prediction is often influenced by various external factors: the time-series method can only predict future load from historical load data and cannot add external-factor features to the input data, so it cannot handle multi-feature input, and the achievable improvement in prediction accuracy is limited. In recent years, with the rapid development of deep learning technology, artificial intelligence and machine learning methods have been used increasingly in short-term load prediction, typically by training an artificial neural network on a large amount of power load data to construct a prediction model, e.g. a deep neural network (DNN), a recurrent neural network (RNN) or a long short-term memory neural network (LSTM). Although such intelligent prediction models can effectively fit the nonlinear relationship between the historical data and external factors, they still have many shortcomings: the model may fall into a local optimum, depends on manual parameter tuning, and easily over-fits.
Therefore, a prediction method combining signal processing and a neural network is needed, one that overcomes the shortcomings of traditional methods which process the original load sequence insufficiently and build the prediction model directly on the raw load sequence, and that can decompose and analyse the original load sequence to extract useful features. Meanwhile, during decomposition, sequences of similar modal complexity may appear; if every sequence were used to train its own model, many prediction models would have to be constructed, increasing the time and difficulty of model training. Based on this consideration, sequences of high similarity are merged into a new sequence using the sample entropy, which reduces the number of models to be trained and accelerates training. The decomposed sequences can then effectively exploit the strong nonlinear fitting capability of the neural network and improve the load prediction accuracy.
In application publication No. CN113240193A, the data obtained after variational mode decomposition are used directly to train the prediction models, so multiple prediction models must be built, increasing the time and difficulty of model training; based on this consideration, the present method merges sequences of high similarity into a new sequence using the sample entropy, reducing the number of models to be trained and accelerating training.
In application publication No. CN109543901A, the idea of signal decomposition is not used to analyse the power load data, although the load data are themselves time-series signal data; decomposing and analysing the original load sequence to extract useful features can improve the prediction accuracy of the model.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. A short-term power load prediction method combining VMD decomposition and a time convolution network is provided, which improves the accuracy and generalization capability of load prediction. The technical scheme of the invention is as follows:
a method of short term power load prediction in conjunction with VMD decomposition and a time convolution network, comprising the steps of:
carrying out data preprocessing including data cleaning and normalization on the original power load data;
decomposing the load sequence by adopting a variational mode VMD to obtain a decomposed sequence;
calculating sample entropies of all decomposition sequences, and combining modal components with similar sample entropy values to form a new component;
carrying out normalization processing on the combined new components, mapping the load data between [0,1], and constructing a time sequence input and output label pair by using a sliding window, wherein the input and output label pair is used for training a model;
and constructing a short-term power load prediction model of the time convolution network, adjusting network weight parameters by adopting an Adam optimizer, and searching a network optimal value.
Further, the decomposing sequence is obtained by performing variable mode VMD decomposition on the load sequence, and specifically includes the following steps:
the implementation of the variational modal decomposition can be divided into two steps, constructing the variational problem and solving the variational problem; in constructing the variational problem, the one-sided spectrum of each sub-signal is first obtained by the Hilbert transform, the spectrum of each sub-signal is then modulated to baseband by mixing it with an exponential term tuned to its corresponding estimated centre frequency $\omega_k$, and finally the bandwidth of each demodulated signal is estimated by Gaussian smoothing, converting the task into a constrained variational problem to be solved.
Further, the complete process of the variational modal decomposition is as follows:
(1) Initialization: set the first iterates $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$ and $\hat{\lambda}^1$, where $\hat{u}_k^1$ and $\omega_k^1$ respectively represent the k-th modal component and its centre frequency, and $\hat{\lambda}^1$ is the Lagrange multiplier; the superscript 1 represents the first iteration. Set n = 0.
(2) For each subsequence, continuously update $\hat{u}_k^{n+1}$ and $\omega_k^{n+1}$ according to (3) and (4):
$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}^{n}(\omega)/2}{1+2\alpha(\omega-\omega_k)^2}\tag{3}$$
$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\,|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}{\int_0^{\infty}|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}\tag{4}$$
In the formulas: $\hat{u}_k^{n+1}(\omega)$ is the Wiener filtering of the current residual component, $\omega_k^{n+1}$ is the frequency centre of the corresponding modal component, and ω is the frequency value; $\hat{f}(\omega)$, $\hat{u}_i(\omega)$ and $\hat{\lambda}(\omega)$ respectively represent the Fourier transforms of the original sequence f(t), of $u_i(t)$ and of $\lambda(t)$; α is the quadratic penalty factor, and $\omega_k$ represents the frequency centre of the previous iteration.
(3) For all ω ≥ 0, update $\hat{\lambda}$:
$$\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\Big(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\Big)\tag{5}$$
Where τ represents the noise tolerance, K represents the total number of modes, and k represents the k-th mode.
(4) Judge whether the iteration termination condition is met:
$$\sum_{k=1}^{K}\frac{\big\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\big\|_2^2}{\big\|\hat{u}_k^{n}\big\|_2^2}<\varepsilon\tag{6}$$
If the termination condition is not met, repeat steps (2) and (3); if the condition is met, the iteration terminates, yielding the K decomposed subsequences.
Further, the calculating the sample entropy of each decomposition sequence, and combining the modal components with similar sample entropy values to form a new component specifically includes:
the complexity of each component is evaluated with sample entropy: the lower the sample entropy value, the higher the self-similarity of the sequence and the lower its complexity. For a time series of N points {x(n)} = x(1), x(2), …, x(N), the sample entropy is calculated as follows:
(1) Form the sequence into a set of m-dimensional vectors $X_m(1),\dots,X_m(N-m+1)$, where $X_m(i)=\{x(i),x(i+1),\dots,x(i+m-1)\}$, $1\le i\le N-m+1$; x(i) and x(i+m−1) respectively represent the i-th and (i+m−1)-th points of the original sequence.
(2) Define the distance $d[X_m(i),X_m(j)]$ between vectors $X_m(i)$ and $X_m(j)$ as the absolute value of the maximum difference of their corresponding elements:
$$d[X_m(i),X_m(j)]=\max_{k=0,\dots,m-1}\big(|x(i+k)-x(j+k)|\big)\tag{7}$$
(3) Given a threshold r, count the number of j with $d[X_m(i),X_m(j)]<r$ and denote it $B_i$; for $1\le i\le N-m$, the ratio of $B_i$ to N − m + 1 is recorded as:
$$B_i^m(r)=\frac{B_i}{N-m+1}\tag{8}$$
(4) Average over all $B_i^m(r)$ to obtain:
$$B^m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}B_i^m(r)\tag{9}$$
(5) Increase the dimension to m + 1, count the number of distances between $X_{m+1}(i)$ and $X_{m+1}(j)$ not larger than r, and denote it $A_i$; $A_i^m(r)$ is defined as:
$$A_i^m(r)=\frac{A_i}{N-m+1}\tag{10}$$
(6) Define $A^m(r)$ as:
$$A^m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}A_i^m(r)\tag{11}$$
$B^m(r)$ and $A^m(r)$ are the probabilities that two sequences match for m and m + 1 points respectively, and the sample entropy is defined as:
$$\mathrm{SampEn}(m,r)=\lim_{N\to\infty}\Big[-\ln\frac{A^m(r)}{B^m(r)}\Big]\tag{12}$$
When N takes a finite value, the estimate of the sample entropy is:
$$\mathrm{SampEn}(m,r,N)=-\ln\frac{A^m(r)}{B^m(r)}\tag{13}$$
further, the normalizing of the merged new components to map the load data into [0, 1] specifically includes:
normalization: the original data are scaled into a specific interval and converted into dimensionless pure values; the load is mapped into [0, 1] by the min-max standardization method, calculated as:
$$x'=\frac{x-x_{\min}}{x_{\max}-x_{\min}}\tag{14}$$
where $x_{\min}$ and $x_{\max}$ are respectively the minimum and maximum values of the sample data, and x′ represents the mapped value.
Further, the time series input and output label pairs are constructed by utilizing a sliding window. For example, if the length of the sliding window is 7, the load value of the next Day is predicted by using the load data of the past seven days, and the pair of input and output tags is constructed by using Day1-Day7 as input and Day8 as an output tag; and pushing in by using Day2-Day8 as input and Day9 as an output tag until the whole data set is traversed. Finally, the constructed input and output label pair is used for training the model;
further, the constructing of the short-term power load prediction model of the time convolution network specifically includes:
a causal convolution module, a dilated convolution module and a residual connection module are integrated into the time convolution network TCN. The causal convolution module enforces the time constraint that convolution follows the temporal order, i.e. the value at time t depends only on the values of the previous network layer at time t and earlier. In addition, to learn longer time-series dependencies and avoid losing historical data information, the TCN uses a dilated convolution module, which enlarges the receptive field without pooling so that each convolution outputs information over a larger range. Meanwhile, a residual connection module is added to the TCN: the input data skip the intermediate convolution operations, are brought to the same dimension as the output by a 1 × 1 convolution, and are then added to the output data, the sum serving as the final output of the layer. The TCN is stacked from residual blocks, each consisting of causal convolution, dilated convolution, weight normalization, an activation function and dropout. During network training, an Adam optimizer adjusts the network weight parameters and searches for the network optimum.
The invention has the following advantages and beneficial effects:
the invention combines the variation mode decomposition and the time convolution network, decomposes the original data into a subsequence with limited bandwidth by using the variation mode technology (VMD), and extracts the effective characteristics in the load sequence. In order to avoid modal aliasing of the decomposed signals in the decomposition process, the number of final decompositions is judged according to the center frequency of the decomposed sequence. Meanwhile, the sample entropy is used for measuring the complexity of each modal component, and the modal components with similar sample entropy values are combined into a new component, so that the number of models to be constructed in the training process can be reduced. In addition, the combined new characteristic sequence represents different characteristic information, corresponding prediction model parameters can be set according to the corresponding sequence characteristics, and the training speed and the prediction capability of the model can be effectively improved. And finally, using the time convolution network as a prediction model. Compared with the common convolutional neural network, the time convolutional network is not limited to the size of a convolution kernel any more, and long-term dependence information can be well captured. The causal convolution and the expansion convolution are fused in the structure of the time convolution network, the size of a receptive field can be increased under the condition of not pooling, longer time sequence dependent information can be learned, and the loss of historical data information is avoided. Meanwhile, compared with the common cyclic neural network, the time convolution network can process the time sequence data in parallel without step processing like the cyclic neural network, and the data processing speed can be increased. In addition, the problems of gradient disappearance and gradient explosion can not occur in the training process. 
Therefore, the variational modal decomposition and the time convolution network are combined to be applied to power load prediction, the load original sequence can be decomposed, the effective characteristics of the load sequence can be extracted, and the structural advantages of the time convolution network can be utilized to improve the training speed and the prediction accuracy of the model.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the present invention for load prediction combining variational modal decomposition and a time convolution network.
Fig. 2 is a schematic diagram of a time convolution network structure.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
as shown in fig. 1, the present implementation provides a short-term power load prediction method combining variational modal decomposition and a time convolution network, comprising the steps of:
(1) Load data preprocessing: abnormal bad data may be present in the collected data, and if the load data are not processed, the final prediction accuracy is affected. For missing or abnormal data, a horizontal processing method is adopted, i.e. an average value is calculated from the adjacent data points, and the calculated average value is used instead of the missing value.
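The horizontal processing of step (1) can be sketched as follows (a simplified illustration; the function name is ours, and missing points are assumed to be marked as NaN):

```python
import numpy as np

def horizontal_fill(load):
    """Replace each missing (NaN) point with the mean of its nearest
    valid neighbours, following the horizontal processing method."""
    x = np.array(load, dtype=float)
    for i in np.flatnonzero(np.isnan(x)):
        neighbours = []
        if i > 0 and not np.isnan(x[i - 1]):
            neighbours.append(x[i - 1])
        if i < len(x) - 1 and not np.isnan(x[i + 1]):
            neighbours.append(x[i + 1])
        if neighbours:                # leave the point unchanged if no valid neighbour
            x[i] = np.mean(neighbours)
    return x
```

A real pipeline would also flag abnormal (out-of-range) readings as NaN before filling; that detection step is omitted here.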
(2) Decompose the power load data into modal components using the variational mode; to prevent the modal aliasing phenomenon, the number of decomposed eigenmode functions is determined from the experimental results. The specific flow of the variational mode decomposition is as follows:
1. Initialization: set the first iterates $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$ and $\hat{\lambda}^1$; set n = 0.
2. For each subsequence, continuously update $\hat{u}_k^{n+1}$ and $\omega_k^{n+1}$ according to (1) and (2):
$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}^{n}(\omega)/2}{1+2\alpha(\omega-\omega_k)^2}\tag{1}$$
$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\,|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}{\int_0^{\infty}|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}\tag{2}$$
In the formulas: $\hat{u}_k^{n+1}(\omega)$ is the Wiener filtering of the current residual component, $\omega_k^{n+1}$ is the frequency centre of the modal component corresponding thereto, and ω is the frequency value.
3. For all ω ≥ 0, update $\hat{\lambda}$:
$$\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\Big(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\Big)\tag{3}$$
4. Judge whether the iteration termination condition is met:
$$\sum_{k=1}^{K}\frac{\big\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\big\|_2^2}{\big\|\hat{u}_k^{n}\big\|_2^2}<\varepsilon\tag{4}$$
If the termination condition is not met, repeat steps 2 and 3; if the condition is met, the iteration terminates, yielding the K decomposed subsequences.
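The iteration above can be sketched in NumPy as follows. This is a simplified, illustrative implementation of the frequency-domain updates only: boundary mirroring and the other refinements of a production VMD implementation are omitted, and all names and default parameter values are our own assumptions.

```python
import numpy as np

def vmd_sketch(f, K=3, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD: ADMM-style updates of K mode spectra u_hat[k] and
    centre frequencies omega[k], following the update equations above."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    freqs = np.fft.fftfreq(N)                  # normalised frequency grid
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, N), dtype=complex)    # mode spectra
    omega = np.linspace(0.05, 0.45, K)         # initial centre frequencies
    lam = np.zeros(N, dtype=complex)           # Lagrange multiplier spectrum
    half = N // 2                              # positive-frequency half
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]      # sum over i != k
            # Wiener-filter update of mode k
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # centre of gravity of the mode's positive-frequency power spectrum
            power = np.abs(u_hat[k, :half]) ** 2
            omega[k] = np.sum(freqs[:half] * power) / (np.sum(power) + 1e-12)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))  # multiplier ascent
        num = np.sum(np.abs(u_hat - u_prev) ** 2)
        den = np.sum(np.abs(u_prev) ** 2) + 1e-12
        if num / den < tol:                            # termination condition
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

Applied to a sum of two sinusoids with K = 2, the two returned modes roughly separate the low- and high-frequency components; for serious use, an established implementation should be preferred over this sketch.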
(3) Calculate the sample entropy of each decomposed sequence, and merge modal components with similar sample entropy values into a new component. In general, for a time series of N points {x(n)} = x(1), x(2), …, x(N), the sample entropy is calculated as follows:
1. Form the sequence into a set of m-dimensional vectors $X_m(1),\dots,X_m(N-m+1)$, where $X_m(i)=\{x(i),x(i+1),\dots,x(i+m-1)\}$, $1\le i\le N-m+1$.
2. Define the distance $d[X_m(i),X_m(j)]$ between vectors $X_m(i)$ and $X_m(j)$ as the absolute value of the maximum difference of their corresponding elements:
$$d[X_m(i),X_m(j)]=\max_{k=0,\dots,m-1}\big(|x(i+k)-x(j+k)|\big)\tag{5}$$
3. Given a threshold r, count the number of j with $d[X_m(i),X_m(j)]<r$ and denote it $B_i$; for $1\le i\le N-m$, the ratio of $B_i$ to N − m + 1 is recorded as:
$$B_i^m(r)=\frac{B_i}{N-m+1}\tag{6}$$
4. Average over all $B_i^m(r)$ to obtain:
$$B^m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}B_i^m(r)\tag{7}$$
5. Increase the dimension to m + 1, count the number of distances between $X_{m+1}(i)$ and $X_{m+1}(j)$ not larger than r, and denote it $A_i$; $A_i^m(r)$ is defined as:
$$A_i^m(r)=\frac{A_i}{N-m+1}\tag{8}$$
6. Define $A^m(r)$ as:
$$A^m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}A_i^m(r)\tag{9}$$
$B^m(r)$ and $A^m(r)$ are the probabilities that two sequences match for m and m + 1 points respectively, and the sample entropy is defined as:
$$\mathrm{SampEn}(m,r)=\lim_{N\to\infty}\Big[-\ln\frac{A^m(r)}{B^m(r)}\Big]\tag{10}$$
When N takes a finite value, the estimate of the sample entropy is:
$$\mathrm{SampEn}(m,r,N)=-\ln\frac{A^m(r)}{B^m(r)}\tag{11}$$
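A direct NumPy transcription of the steps above (illustrative only; the strict-versus-non-strict comparison against r varies across references, and we use a strict `< r` for both template lengths here, with `m = 2` and `r = 0.2·std` as commonly chosen defaults rather than values taken from the patent):

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy of a 1-D series: -ln(A / B), where B and A count
    template matches of length m and m + 1 (self-matches excluded)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)                    # a common default threshold
    N = len(x)

    def matches(dim):
        # templates X_dim(i); keep only the first N - m so that the
        # m and m + 1 template sets have the same number of rows
        templ = np.lib.stride_tricks.sliding_window_view(x, dim)[: N - m]
        # Chebyshev (max-abs) distance between every pair of templates
        d = np.max(np.abs(templ[:, None, :] - templ[None, :, :]), axis=2)
        return np.sum(d < r) - len(templ)      # subtract diagonal self-matches

    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

A highly regular sequence scores lower than an irregular one; this is exactly the property the method uses to group IMF components of similar complexity into one merged sequence.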
(4) Normalize the merged new components, mapping the load data into [0, 1].
Normalization: in order to improve the convergence rate of the model and the reliability of the result, the input data of the model needs to be normalized. Namely, the original data is scaled in a specific interval and converted into a dimensionless pure numerical value. The load is mapped between [0,1] using the min-max normalization method. The calculation formula is as follows:
$$x'=\frac{x-x_{\min}}{x_{\max}-x_{\min}}\tag{12}$$
where $x_{\min}$ and $x_{\max}$ are respectively the minimum and maximum values of the sample data, and x′ represents the mapped value.
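The min-max standardization described above, together with its inverse (needed to turn normalized predictions back into load values), can be written as a routine sketch (function names are ours):

```python
import numpy as np

def min_max_scale(x):
    """Map sample data linearly onto [0, 1] using its min and max."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    scaled = (x - x_min) / (x_max - x_min)
    return scaled, x_min, x_max

def min_max_restore(scaled, x_min, x_max):
    """Invert the mapping to recover values on the original scale."""
    return np.asarray(scaled) * (x_max - x_min) + x_min
```

Keeping `x_min` and `x_max` from the training data is what allows the superimposed model outputs to be converted back into an actual load forecast.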
(5) Construct the short-term power load prediction model of the time convolution network, into which causal convolution, dilated convolution and residual connections are integrated. Causal convolution enforces the strict time constraint that convolution follows the temporal order, i.e. the value at time t depends only on the values of the previous network layer at time t and earlier. In addition, to learn longer time-series dependencies and avoid losing historical data information, the TCN uses dilated convolution, which can enlarge the receptive field without pooling so that each convolution outputs information over a larger range. Meanwhile, to increase the stability of the network and prevent degradation as the number of network layers grows, residual connections are added to the TCN: the input data skip the intermediate convolution operations, are brought to the same dimension as the output by a 1 × 1 convolution, and are then added to the output data, the sum serving as the final output of the layer. As shown in fig. 2, the TCN is stacked from residual blocks, each consisting of causal convolution, dilated convolution, weight normalization, an activation function and dropout. During network training, an Adam optimizer adjusts the network weight parameters and searches for the network optimum.
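To make the causal and dilated convolution of step (5) concrete, the sketch below (our own minimal illustration, not the patent's TCN) computes y[t] = Σ_j w[j]·x[t − j·d] with zero left-padding, so each output depends only on current and past inputs, and the dilation d widens the receptive field without pooling:

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution: y[t] = sum_j w[j] * x[t - j*dilation],
    with zeros padded on the left so that len(y) == len(x)."""
    x = np.asarray(x, dtype=float)
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])   # left padding enforces causality
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```

With kernel size k and dilations 1, 2, 4, … across stacked layers, the receptive field grows to 1 + (k − 1)·Σd, which is why a TCN can capture long histories in parallel rather than step by step.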
(6) After the model training is finished, test data are put into the model, and the learned model is used for predicting the power load data.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (7)

1. A method for short term power load prediction in conjunction with VMD decomposition and time convolution networks, comprising the steps of:
carrying out data preprocessing including data cleaning and normalization on the original power load data;
decomposing the load sequence by adopting a variational mode VMD to obtain a decomposed sequence;
calculating sample entropies of all decomposition sequences, and combining modal components with similar sample entropy values to form a new component;
carrying out normalization processing on the combined new components, mapping the load data between [0,1], and constructing a time sequence input and output label pair by using a sliding window, wherein the input and output label pair is used for training a model;
and constructing a short-term power load prediction model of the time convolution network, adjusting network weight parameters by adopting an Adam optimizer, and searching a network optimal value.
2. The short-term power load prediction method combining VMD decomposition and a time convolution network as claimed in claim 1, wherein obtaining the decomposed sequence by applying variational mode VMD decomposition to the load sequence comprises the following steps:
the implementation of the variational modal decomposition is divided into two steps, constructing the variational problem and solving the variational problem; in constructing the variational problem, the one-sided spectrum of each sub-signal is first obtained by the Hilbert transform, the spectrum of each sub-signal is then modulated to baseband by mixing it with an exponential term tuned to its corresponding estimated centre frequency $\omega_k$, and finally the bandwidth of each demodulated signal is estimated by Gaussian smoothing, converting the task into a constrained variational problem to be solved.
3. The short-term power load prediction method combining VMD decomposition and a time convolution network as claimed in claim 2, wherein the complete flow of the variational modal decomposition is as follows:
(1) initialization: set the first iterates $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$ and $\hat{\lambda}^1$, where $\hat{u}_k^1$ and $\omega_k^1$ respectively represent the k-th modal component and its centre frequency, $\hat{\lambda}^1$ is the Lagrange multiplier, and the superscript 1 represents the first iteration; set n = 0;
(2) for each subsequence, continuously update $\hat{u}_k^{n+1}$ and $\omega_k^{n+1}$ according to (3) and (4):
$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}^{n}(\omega)/2}{1+2\alpha(\omega-\omega_k)^2}\tag{3}$$
$$\omega_k^{n+1}=\frac{\int_0^{\infty}\omega\,|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}{\int_0^{\infty}|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}\tag{4}$$
in the formulas: $\hat{u}_k^{n+1}(\omega)$ is the Wiener filtering of the current residual component, $\omega_k^{n+1}$ is the frequency centre of the corresponding modal component, and ω is the frequency value; $\hat{f}(\omega)$, $\hat{u}_i(\omega)$ and $\hat{\lambda}(\omega)$ respectively represent the Fourier transforms of the original sequence f(t), of $u_i(t)$ and of $\lambda(t)$; α is the quadratic penalty factor, and $\omega_k$ represents the frequency centre of the previous iteration;
(3) for all ω ≥ 0, update $\hat{\lambda}$:
$$\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\Big(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\Big)\tag{5}$$
wherein τ represents the noise tolerance, K represents the total number of modes, and k represents the k-th mode;
(4) judge whether the iteration termination condition is met:
$$\sum_{k=1}^{K}\frac{\big\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\big\|_2^2}{\big\|\hat{u}_k^{n}\big\|_2^2}<\varepsilon\tag{6}$$
if the termination condition is not met, repeat steps (2) and (3); if the condition is met, the iteration terminates, yielding the K decomposed subsequences.
4. The short-term power load prediction method combining VMD decomposition and a time convolution network as claimed in claim 3, wherein calculating the sample entropy of each decomposed sequence and merging modal components with similar sample entropy values into a new component specifically comprises:
the complexity of each component is evaluated with sample entropy: the lower the sample entropy value, the higher the self-similarity of the sequence and the lower its complexity; for a time series of N points {x(n)} = x(1), x(2), …, x(N), the sample entropy is calculated as follows:
(1) form the sequence into a set of m-dimensional vectors $X_m(1),\dots,X_m(N-m+1)$, where $X_m(i)=\{x(i),x(i+1),\dots,x(i+m-1)\}$, $1\le i\le N-m+1$; x(i) and x(i+m−1) respectively represent the i-th and (i+m−1)-th points of the original sequence;
(2) define the distance $d[X_m(i),X_m(j)]$ between vectors $X_m(i)$ and $X_m(j)$ as the absolute value of the maximum difference of their corresponding elements:
$$d[X_m(i),X_m(j)]=\max_{k=0,\dots,m-1}\big(|x(i+k)-x(j+k)|\big)\tag{7}$$
(3) given a threshold r, count the number of j with $d[X_m(i),X_m(j)]<r$ and denote it $B_i$; for $1\le i\le N-m$, the ratio of $B_i$ to N − m + 1 is recorded as:
$$B_i^m(r)=\frac{B_i}{N-m+1}\tag{8}$$
(4) average over all $B_i^m(r)$ to obtain:
$$B^m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}B_i^m(r)\tag{9}$$
(5) increase the dimension to m + 1, count the number of distances between $X_{m+1}(i)$ and $X_{m+1}(j)$ not larger than r, and denote it $A_i$; $A_i^m(r)$ is defined as:
$$A_i^m(r)=\frac{A_i}{N-m+1}\tag{10}$$
(6) define $A^m(r)$ as:
$$A^m(r)=\frac{1}{N-m}\sum_{i=1}^{N-m}A_i^m(r)\tag{11}$$
$B^m(r)$ and $A^m(r)$ are the probabilities that two sequences match for m and m + 1 points respectively, and the sample entropy is defined as:
$$\mathrm{SampEn}(m,r)=\lim_{N\to\infty}\Big[-\ln\frac{A^m(r)}{B^m(r)}\Big]\tag{12}$$
when N takes a finite value, the estimate of the sample entropy is:
$$\mathrm{SampEn}(m,r,N)=-\ln\frac{A^m(r)}{B^m(r)}\tag{13}$$
5. The method for predicting short-term power load combining VMD decomposition and a time convolution network as claimed in claim 4, wherein normalizing the merged new components to map the load data into [0, 1] specifically comprises:
normalization: the original data are scaled into a specific interval and converted into dimensionless pure values, and the load is mapped into [0, 1] by the min-max standardization method, calculated as:
$$x'=\frac{x-x_{\min}}{x_{\max}-x_{\min}}\tag{14}$$
where $x_{\min}$ and $x_{\max}$ are respectively the minimum and maximum values of the sample data, and x′ represents the mapped value.
6. The short-term power load prediction method combining VMD decomposition and a time convolution network as claimed in claim 5, wherein the time-series input-output label pairs are constructed with a sliding window: if the length of the sliding window is 7, the load data of the past seven days are used to predict the load value of the next day, with Day1-Day7 as input and Day8 as the output label forming one input-output pair; then Day2-Day8 serve as input and Day9 as the output label, and so on until the whole data set is traversed; finally, the constructed input-output label pairs are used for training the model.
7. The method for predicting short-term power load by combining VMD decomposition and time convolution network as claimed in claim 6, wherein said constructing a time convolution network short-term power load prediction model specifically comprises:
a causal convolution module, a dilated convolution module and a residual connection module are integrated into the time convolution network TCN, wherein the causal convolution module enforces the time constraint that convolution follows the temporal order, i.e. the value at time t depends only on the values of the previous network layer at time t and earlier; in addition, to learn longer time-series dependencies and avoid losing historical data information, the TCN uses a dilated convolution module, which enlarges the receptive field without pooling so that each convolution outputs information over a larger range; meanwhile, a residual connection module is added to the TCN: the input data skip the intermediate convolution operations, are brought to the same dimension as the output by a 1 × 1 convolution, and are then added to the output data, the sum serving as the final output of the layer; the TCN is stacked from residual blocks, each consisting of causal convolution, dilated convolution, weight normalization, an activation function and dropout; during network training, an Adam optimizer adjusts the network weight parameters and searches for the network optimum.
CN202111520535.8A 2021-12-13 2021-12-13 Short-term power load prediction method combining VMD decomposition and time convolution network Pending CN114358389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111520535.8A CN114358389A (en) 2021-12-13 2021-12-13 Short-term power load prediction method combining VMD decomposition and time convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111520535.8A CN114358389A (en) 2021-12-13 2021-12-13 Short-term power load prediction method combining VMD decomposition and time convolution network

Publications (1)

Publication Number Publication Date
CN114358389A true CN114358389A (en) 2022-04-15

Family

ID=81099825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111520535.8A Pending CN114358389A (en) 2021-12-13 2021-12-13 Short-term power load prediction method combining VMD decomposition and time convolution network

Country Status (1)

Country Link
CN (1) CN114358389A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545503A (en) * 2022-10-14 2022-12-30 国网江苏省电力有限公司镇江供电分公司 Power load medium-short term prediction method and system based on parallel time sequence convolutional neural network
CN115545503B (en) * 2022-10-14 2024-02-02 国网江苏省电力有限公司镇江供电分公司 Power load medium-short term prediction method and system based on parallel time sequence convolutional neural network
CN116468324A (en) * 2023-04-25 2023-07-21 北京化工大学 Data-driven traffic hub arrival passenger flow volume decomposition-integration prediction method
CN116468324B (en) * 2023-04-25 2024-01-05 北京化工大学 Data-driven traffic hub arrival passenger flow volume decomposition-integration prediction method
CN116894522A (en) * 2023-07-27 2023-10-17 北京化工大学 Network taxi short-time demand prediction method based on deep learning model

Similar Documents

Publication Publication Date Title
CN114358389A (en) Short-term power load prediction method combining VMD decomposition and time convolution network
CN111091233B (en) Short-term wind power prediction modeling method for wind power plant
CN110502806B (en) Wireless spectrum occupancy rate prediction method based on LSTM network
CN109146162B Probabilistic wind speed forecasting method based on an ensemble recurrent neural network
CN109583565A Flood forecasting method based on an attention long short-term memory network
CN111079989B (en) DWT-PCA-LSTM-based water supply amount prediction device for water supply company
CN112434848B (en) Nonlinear weighted combination wind power prediction method based on deep belief network
CN110969290A (en) Runoff probability prediction method and system based on deep learning
CN116307291B (en) Distributed photovoltaic power generation prediction method and prediction terminal based on wavelet decomposition
Nourani et al. A new hybrid algorithm for rainfall–runoff process modeling based on the wavelet transform and genetic fuzzy system
Kosana et al. Hybrid wind speed prediction framework using data pre-processing strategy based autoencoder network
CN115860177A (en) Photovoltaic power generation power prediction method based on combined machine learning model and application thereof
CN116169670A (en) Short-term non-resident load prediction method and system based on improved neural network
Rizvi Time series deep learning for robust steady-state load parameter estimation using 1D-CNN
CN112241802A (en) Interval prediction method for wind power
CN113128666A (en) Mo-S-LSTMs model-based time series multi-step prediction method
Abiyev Fuzzy wavelet neural network for prediction of electricity consumption
Safari et al. A spatiotemporal wind power prediction based on wavelet decomposition, feature selection, and localized prediction
CN115564155A (en) Distributed wind turbine generator power prediction method and related equipment
CN115907000A (en) Small sample learning method for optimal power flow prediction of power system
CN114254828A (en) Power load prediction method based on hybrid convolution feature extractor and GRU
CN112183814A (en) Short-term wind speed prediction method
CN112163613A (en) Rapid identification method for power quality disturbance
Vagin et al. Modelling human reasoning in intelligent decision support systems
Xuemei et al. Agriculture irrigation water demand forecasting based on rough set theory and weighted LS-SVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination