CN112232593A - Power load prediction method based on phase space reconstruction and data driving - Google Patents

Power load prediction method based on phase space reconstruction and data driving

Info

Publication number
CN112232593A
Authority
CN
China
Prior art keywords
data
model
models
load
prediction
Prior art date
Legal status: Pending
Application number
CN202011217056.4A
Other languages
Chinese (zh)
Inventor
侯慧
王晴
吴细秀
张清勇
王建建
唐金锐
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date: 2020-11-04
Filing date: 2020-11-04
Publication date: 2021-01-15
Application filed by Wuhan University of Technology WUT
Priority to CN202011217056.4A
Publication of CN112232593A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 - Energy or water supply

Abstract

The invention discloses a power load prediction method based on phase space reconstruction and data driving. Firstly, the delay time and embedding dimension of the historical load data are calculated, the data are divided into a training set and a test set, and phase space reconstruction and normalization are carried out on both sets. Secondly, the prediction accuracy of twelve data-driven models is compared on the same data set; XGBoost is obtained as the optimal model among the statistical learning models, and LSTM and the extreme learning machine as the optimal models among the neural network models, and the three models maintain high prediction accuracy for daily, weekly, semi-monthly and monthly load prediction. Finally, XGBoost, LSTM and the extreme learning machine are combined with unequal weights by the grey correlation method to construct a combined data-driven model. The invention can improve the prediction precision.

Description

Power load prediction method based on phase space reconstruction and data driving
Technical Field
The invention relates to the technical field of power system load prediction, in particular to a power load prediction method based on phase space reconstruction and data driving.
Background
Accurate load prediction helps maintain the safe and stable operation of the power grid, reduces unnecessary spinning reserve capacity, allows unit maintenance plans to be arranged reasonably, effectively reduces generation cost, and improves economic and social benefits. The magnitude of the electrical load is affected by many factors, such as weather, temperature, holidays and regional GDP. However, accurate data for all influencing factors are often difficult to obtain. Therefore, if a sufficiently accurate power load prediction method can be implemented without multi-factor data, the cumbersome process of collecting and processing such data can be avoided and the power load prediction workflow simplified.
In the course of implementing the present invention, the inventors of the present application found that the prior-art methods have at least the following technical problems:
At present, research at home and abroad mostly considers multi-factor data for load prediction, but accurate multi-factor data are difficult to obtain in general, and processing them consumes a large amount of time, so research based on historical load data alone is particularly important. Existing load prediction research based on phase space reconstruction generally adopts a BP neural network or one of its variants for prediction, and the prediction accuracy is low.
Disclosure of Invention
The invention provides a power load prediction method based on phase space reconstruction and data driving, to solve, or at least partially solve, the technical problem of low prediction accuracy in prior-art methods.
In order to solve the above technical problem, the present invention provides a power load prediction method based on phase space reconstruction and data driving, including:
s1: calculating delay time and embedding dimension of historical load data, dividing the historical load data into a training set and a testing set, respectively carrying out phase space reconstruction on the training set and the testing set according to the delay time and the embedding dimension, and carrying out maximum and minimum normalization processing;
s2: inputting the training set and the test set subjected to the maximum and minimum normalization processing into at least two types of data driving models, wherein each type of data driving model comprises one or more data driving models, the data driving models are used for autonomously learning the data characteristics of the input data set and outputting a prediction result, the prediction precision of all the data driving models is compared, and the optimal models corresponding to different types of data driving models are screened out by adopting preset evaluation indexes;
s3: and establishing a combined model of optimal models corresponding to different types of data driving models under the condition of phase space reconstruction by adopting a grey correlation method, and predicting the power load by utilizing the established combined model.
In one embodiment, step S1 includes:
s1.1: calculating the delay time of the historical load data by adopting an autocorrelation method, wherein the chaotic time series is x_1, x_2, …, x_{N-1}, x_N with N data points, and the autocorrelation function with a time span of τ is

$$C(\tau)=\frac{\sum_{t=1}^{N-\tau}\left(x_{t}-\bar{x}\right)\left(x_{t+\tau}-\bar{x}\right)}{\sum_{t=1}^{N}\left(x_{t}-\bar{x}\right)^{2}}$$

where x̄ is the mean of the series. A plot of the autocorrelation function against the delay time τ is obtained; when the value of the autocorrelation function falls to 1-1/e times its initial value, the corresponding τ is the delay time of the reconstructed phase space;
s1.2: calculating the embedding dimension of the historical load data by adopting a pseudo-nearest-neighbor method: for the chaotic time series data, starting from embedding dimension m = 2, calculate the ratio S_m of pseudo-nearest neighbors or their number; then increase the embedding dimension and recalculate the ratio S_m or the number of pseudo-nearest neighbors; when the ratio S_m is smaller than a preset proportion or the number of pseudo-nearest neighbors no longer decreases, the embedding dimension at that point is taken as the embedding dimension of the historical load data, where

$$S_{m}(t)=\sqrt{\frac{R_{m+1}^{2}(t)-R_{m}^{2}(t)}{R_{m}^{2}(t)}}$$

If S_m(t) > S, then X(t) and X_j(t) are pseudo-nearest neighbors, where S is the threshold, R_m(t) is the distance between X(t) and X_j(t) in the space of embedding dimension m, and R_{m+1}(t) is their distance in the space of embedding dimension m+1;
s1.3: respectively carrying out phase space reconstruction on the training set and the test set according to the delay time and the embedding dimension, specifically: for the chaotic time series x_1, x_2, …, x_{N-1}, x_N, each x_t (t = 1, 2, …, N-(m-1)τ) is transformed as follows:

$$X_{t}=\left(x_{t},\, x_{t+\tau},\, x_{t+2\tau},\, \ldots,\, x_{t+(m-1)\tau}\right)^{T}$$

where τ is the delay time and m is the embedding dimension. According to the phase space reconstruction method, the chaotic time series x_1, x_2, …, x_{N-1}, x_N is converted into a new data space with delay τ and dimension m, i.e.

$$X=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{N-(m-1)\tau}\\ x_{1+\tau} & x_{2+\tau} & \cdots & x_{N-(m-2)\tau}\\ \vdots & \vdots & & \vdots\\ x_{1+(m-1)\tau} & x_{2+(m-1)\tau} & \cdots & x_{N}\end{bmatrix}$$

where each column represents a vector, i.e. a phase point;
s1.4: carrying out normalization processing on the load data by using the max-min normalization algorithm:

$$x_{i}'=\frac{x_{i}-x_{\min}}{x_{\max}-x_{\min}}$$

where x_min is the minimum value in the load data, x_max is the maximum value in the load data, and x_i is the ith value in the load data.
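As an illustration of step S1.4, the following is a minimal NumPy sketch of the max-min normalization; the array name load_data and the sample values are hypothetical placeholders.

```python
import numpy as np

def max_min_normalize(load_data):
    """Scale load values into [0, 1] with max-min normalization (step S1.4)."""
    load_data = np.asarray(load_data, dtype=float)
    x_min, x_max = load_data.min(), load_data.max()
    return (load_data - x_min) / (x_max - x_min)

# Illustrative values only
load_data = [520.0, 610.5, 587.3, 498.2, 642.1]
print(max_min_normalize(load_data))
```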
In one embodiment, the different types of data-driven models in step S2 include three types, namely a reference model, neural network models and statistical learning models, and the data-driven models learn the data features autonomously as follows: the reference model takes the load value at time t-1 as the predicted load value at time t and outputs it directly without training; the neural network models continuously feed back and adjust the network weights according to the test-set data, seek the optimal network weights using a gradient descent algorithm, and finally output the prediction results for the test set; the statistical learning models obtain residuals according to their respective objective functions, continuously optimize the hyper-parameters using a gradient descent algorithm, and finally output the prediction results for the test set.
In one embodiment, 12 data-driven models are employed, of which 1 is a reference model, 5 are neural network models and 6 are statistical learning models.
The reference model uses the Persistence algorithm, which takes the load value at time t-1 as the predicted load value at time t;
the neural network models comprise a BP neural network, an Elman neural network, a wavelet neural network, an extreme learning machine and a long short-term memory network;
the statistical learning models include linear regression, Lasso regression, Ridge regression, ElasticNet regression, support vector regression, and extreme gradient boosting.
In one embodiment, two evaluation indexes, namely the coefficient of variation of the root mean square error and the symmetric mean absolute percentage error, are used as the preset evaluation indexes; according to these indexes, the extreme learning machine and the long short-term memory network are obtained as the optimal models among the neural network models, and extreme gradient boosting as the optimal model among the statistical learning models.
In one embodiment, the coefficient of variation of the root mean square error CVRMSE and the symmetric mean absolute percentage error SMAPE are formulated as follows:

$$\mathrm{CVRMSE}=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}}{\bar{y}}\times 100\%$$

$$\mathrm{SMAPE}=\frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_{i}-\hat{y}_{i}\right|}{\left(\left|y_{i}\right|+\left|\hat{y}_{i}\right|\right)/2}\times 100\%$$

where y_i and ŷ_i are the actual value and the predicted value of the ith sample respectively, ȳ is the mean of the actual values, and N is the total number of samples.
In one embodiment, step S3 includes:
s3.1: obtaining the predicted values x_mn of a time series of length N under M prediction models, where m = 1, 2, …, M and n = 1, 2, …, N; the combination model is:

$$\hat{x}_{n}=\sum_{m=1}^{M} l_{m}\, x_{mn},\quad n=1,2,\ldots,N$$

where l_m is the weight of the mth model and satisfies

$$\sum_{m=1}^{M} l_{m}=1$$

The weights are solved by the grey correlation method, which is equivalent to solving the optimization problem given as an equation image in the original, where e_mn is the prediction error of the mth prediction model at time n;
s3.2: and performing daily load prediction and weekly load prediction on the power load by using the established combined model.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the method provided by the invention comprises the steps of firstly calculating the delay time and the embedding dimension of historical load data, dividing a data set into a training set and a testing set, respectively carrying out phase space reconstruction on the training set and the testing set according to the delay time and the embedding dimension, and then carrying out normalization processing; and comparing the prediction accuracy of all the data driving models under the same data set to obtain optimal models corresponding to different types of data driving models, and finally, carrying out unequal weight combination on the optimal models corresponding to the different types of data driving models by using a grey correlation method to construct a combined model, so that the power load can be predicted by using the established combined model. Because the phase space reconstruction technology is adopted to process the historical load data, the complicated step of collecting and processing multi-factor data can be saved; load prediction is carried out by applying various data driving models, and compared with the traditional data driving model, the load prediction method has stronger nonlinear capacity to carry out prediction; the grey correlation method is adopted for combination and integration of deep learning and statistical learning, so that a better effect than that of a single learning device is obtained, and the accuracy of power load prediction is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a basic flow diagram of load prediction in an embodiment of the present invention;
FIG. 2 is a diagram of the delay-time result obtained with the autocorrelation method for the monthly historical data in an embodiment of the present invention;
FIG. 3 is a diagram of the delay-time result obtained with the autocorrelation method for the yearly historical data in an embodiment of the present invention;
FIG. 4 is a diagram of the embedding-dimension result obtained with the pseudo-nearest-neighbor method for the monthly historical data in an embodiment of the present invention;
FIG. 5 is a diagram of the embedding-dimension result obtained with the pseudo-nearest-neighbor method for the yearly historical data in an embodiment of the present invention;
FIG. 6 is an overview of selected data-driven models in accordance with an embodiment of the present invention;
FIG. 7 is a comparison chart of CVRMSE for different prediction time periods in an embodiment of the present invention;
FIG. 8 is a diagram illustrating SMAPE comparison for different prediction time periods in an embodiment of the present invention;
FIG. 9 is a graph of the combined model daily load prediction results in an embodiment of the present invention;
FIG. 10 is a graph of the combined model weekly load prediction results in an embodiment of the present invention.
Detailed Description
Through extensive research and practice, the inventors of the present application found that existing load prediction research based on phase space reconstruction generally adopts a BP neural network or one of its variants, whose prediction accuracy is low, while the deep learning and machine learning algorithms that have become popular in recent years and offer higher prediction accuracy are rarely combined with phase space reconstruction. Therefore, power load prediction based on phase space reconstruction and data-driven algorithms has important practical significance for simplifying the load prediction procedure and improving prediction accuracy.
Based on the above, the invention provides a power load prediction method based on phase space reconstruction and data driving, to solve the technical problem of low prediction accuracy in prior-art methods.
The main inventive concept of the present invention is as follows:
a power load prediction method based on phase space reconstruction and a combined data driving model is provided. Firstly, calculating delay time and embedding dimension of historical load data, dividing a data set into a training set and a testing set, respectively carrying out phase space reconstruction on the training set and the testing set according to the delay time and the embedding dimension, and carrying out normalization processing. Secondly, comparing the prediction accuracy of different data driving models under the same data set to obtain optimal models corresponding to different types of data driving models, and finally, carrying out unequal weight combination on the optimal models corresponding to different types of data driving models by using a grey correlation method to construct a combined model, so that the power load can be predicted by using the established combined model.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1 to 10, an embodiment of the present invention provides a power load prediction method based on phase space reconstruction and data driving, including:
s1: calculating delay time and embedding dimension of historical load data, dividing the historical load data into a training set and a testing set, respectively carrying out phase space reconstruction on the training set and the testing set according to the delay time and the embedding dimension, and carrying out maximum and minimum normalization processing;
s2: inputting the training set and the test set subjected to the maximum and minimum normalization processing into at least two types of data driving models, wherein each type of data driving model comprises one or more data driving models, the data driving models are used for autonomously learning the data characteristics of the input data set and outputting a prediction result, the prediction precision of all the data driving models is compared, and the optimal models corresponding to different types of data driving models are screened out by adopting preset evaluation indexes;
s3: and establishing a combined model of optimal models corresponding to different types of data driving models under the condition of phase space reconstruction by adopting a grey correlation method, and predicting the power load by utilizing the established combined model.
Specifically, the types of the data-driven models can be selected according to actual situations, such as 2 types, 3 types, 5 types, and the like. Each type of data-driven model may contain one or more.
In one embodiment, step S1 includes:
s1.1: calculating the delay time of the historical load data by adopting an autocorrelation method, wherein the chaotic time series is x_1, x_2, …, x_{N-1}, x_N with N data points, and the autocorrelation function with a time span of τ is

$$C(\tau)=\frac{\sum_{t=1}^{N-\tau}\left(x_{t}-\bar{x}\right)\left(x_{t+\tau}-\bar{x}\right)}{\sum_{t=1}^{N}\left(x_{t}-\bar{x}\right)^{2}}$$

where x̄ is the mean of the series. A plot of the autocorrelation function against the delay time τ is obtained; when the value of the autocorrelation function falls to 1-1/e times its initial value, the corresponding τ is the delay time of the reconstructed phase space;
s1.2: calculating the embedding dimension of the historical load data by adopting a pseudo-nearest-neighbor method: for the chaotic time series data, starting from embedding dimension m = 2, calculate the ratio S_m of pseudo-nearest neighbors or their number; then increase the embedding dimension and recalculate the ratio S_m or the number of pseudo-nearest neighbors; when the ratio S_m is smaller than a preset proportion or the number of pseudo-nearest neighbors no longer decreases, the embedding dimension at that point is taken as the embedding dimension of the historical load data, where

$$S_{m}(t)=\sqrt{\frac{R_{m+1}^{2}(t)-R_{m}^{2}(t)}{R_{m}^{2}(t)}}$$

If S_m(t) > S, then X(t) and X_j(t) are pseudo-nearest neighbors, where S is the threshold, R_m(t) is the distance between X(t) and X_j(t) in the space of embedding dimension m, and R_{m+1}(t) is their distance in the space of embedding dimension m+1;
s1.3: respectively carrying out phase space reconstruction on the training set and the test set according to the delay time and the embedding dimension, specifically: for the chaotic time series x_1, x_2, …, x_{N-1}, x_N, each x_t (t = 1, 2, …, N-(m-1)τ) is transformed as follows:

$$X_{t}=\left(x_{t},\, x_{t+\tau},\, x_{t+2\tau},\, \ldots,\, x_{t+(m-1)\tau}\right)^{T}$$

where τ is the delay time and m is the embedding dimension. According to the phase space reconstruction method, the chaotic time series x_1, x_2, …, x_{N-1}, x_N is converted into a new data space with delay τ and dimension m, i.e.

$$X=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{N-(m-1)\tau}\\ x_{1+\tau} & x_{2+\tau} & \cdots & x_{N-(m-2)\tau}\\ \vdots & \vdots & & \vdots\\ x_{1+(m-1)\tau} & x_{2+(m-1)\tau} & \cdots & x_{N}\end{bmatrix}$$

where each column represents a vector, i.e. a phase point;
s1.4: carrying out normalization processing on the load data by using the max-min normalization algorithm:

$$x_{i}'=\frac{x_{i}-x_{\min}}{x_{\max}-x_{\min}}$$

where x_min is the minimum value in the load data, x_max is the maximum value in the load data, and x_i is the ith value in the load data.
Specifically, in step S1.1, the autocorrelation method plots the autocorrelation function value against the delay time τ for the time series; when the value of the autocorrelation function decreases to 1-1/e times its initial value, the corresponding τ is taken as the delay time of the reconstructed phase space.
Fig. 2 and Fig. 3 show the delay-time results obtained with the autocorrelation method for the monthly historical data and the yearly historical data, respectively. As can be seen from Fig. 2 and Fig. 3, when the value of the autocorrelation function falls to 1-1/e times the initial value, i.e. an autocorrelation value of 0.6321, the delay time is 4 for both the monthly and the yearly historical data.
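To make the criterion concrete, a minimal Python sketch of the autocorrelation-based delay-time selection of step S1.1 is given below; the normalized autocorrelation form, the synthetic test series and the search range max_tau are illustrative assumptions.

```python
import numpy as np

def delay_time_autocorrelation(x, max_tau=50):
    """Smallest tau at which the normalized autocorrelation drops to
    (1 - 1/e) of its initial value, i.e. about 0.6321 (step S1.1)."""
    x = np.asarray(x, dtype=float)
    x_c = x - x.mean()
    denom = np.sum(x_c ** 2)
    threshold = 1.0 - 1.0 / np.e
    for tau in range(1, max_tau + 1):
        c_tau = np.sum(x_c[:-tau] * x_c[tau:]) / denom
        if c_tau <= threshold:
            return tau
    return max_tau

# Usage with a synthetic series (illustrative only)
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.5 * np.sin(2.7 * t)
print(delay_time_autocorrelation(x))
```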
In step S1.2, the embedding dimension is calculated using the pseudo-nearest-neighbor method. Each phase point X(t) = (x_t, x_{t+τ}, x_{t+2τ}, …, x_{t+(m-1)τ}) in the reconstructed space has a nearest neighbor. In the m-dimensional space, let R_m(t) = ||X(t) - X_j(t)||, where X(t) and X_j(t) are two different phase points and R_m(t) is their distance in the m-dimensional space. When the dimension is increased to m+1, the distance between the two points changes; the new distance satisfies

$$R_{m+1}^{2}(t)=R_{m}^{2}(t)+\left|x_{t+m\tau}-x_{j+m\tau}\right|^{2}$$

If R_{m+1} is much larger than R_m, this is considered to be because two points that are not adjacent in the high-dimensional phase space become adjacent when projected into the low-dimensional space. The ratio S_m of pseudo-nearest neighbors is therefore computed; if S_m(t) > S, then X(t) and X_j(t) are pseudo-nearest neighbors, where S is the threshold.
In a specific implementation, starting from embedding dimension m = 2, the ratio S_m of pseudo-nearest neighbors (or their number) is calculated for the chaotic time series. The embedding dimension is then increased and the calculation repeated until the ratio S_m is less than 5% or the number of pseudo-nearest neighbors no longer decreases; the embedding dimension at this point is considered sufficient to fully unfold the chaotic trajectory, i.e. it is the most suitable embedding dimension, and it is taken as the embedding dimension of the historical load data.
Fig. 4 and Fig. 5 show the embedding dimensions obtained with the pseudo-nearest-neighbor method for the monthly historical data and the yearly historical data, respectively. As can be seen from Fig. 4 and Fig. 5, for the monthly historical data the pseudo-nearest-neighbor rate drops to 0% when the embedding dimension reaches 4, and for the yearly historical data it drops to 0% when the embedding dimension reaches 5. Therefore, the embedding dimension of the monthly historical load data is 4, and that of the yearly historical load data is 5.
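The embedding-dimension search of step S1.2 can be sketched in Python as below; the brute-force nearest-neighbor search, the threshold value and the function and variable names are illustrative assumptions, while the stopping rule follows the 5% criterion described above.

```python
import numpy as np

def false_neighbor_ratio(x, m, tau, s_threshold=10.0):
    """Fraction of phase points whose nearest neighbor in dimension m
    turns out to be a pseudo (false) neighbor in dimension m + 1."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - m * tau                  # points that also exist in dimension m + 1
    emb = np.array([x[i: i + m * tau: tau] for i in range(n_points)])   # m-dim delay vectors
    false_count = 0
    for i in range(n_points):
        d = np.linalg.norm(emb - emb[i], axis=1)
        d[i] = np.inf                            # exclude the point itself
        j = int(np.argmin(d))                    # nearest neighbor in dimension m
        r_m = d[j]
        extra = abs(x[i + m * tau] - x[j + m * tau])
        s_m = extra / r_m if r_m > 0 else np.inf # FNN criterion S_m(t)
        if s_m > s_threshold:
            false_count += 1
    return false_count / n_points

# Increase m from 2 upward until the ratio falls below, e.g., 5 %
# ratios = {m: false_neighbor_ratio(load_series, m, tau=4) for m in range(2, 8)}
```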
In step S1.3, the phase space of the chaotic time series is reconstructed, i.e. the series is converted into a new data space with time delay τ and dimension m. The phase space reconstruction process can be implemented in Matlab.
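The patent implements this step in Matlab; an equivalent NumPy sketch is given below as an assumption of what the reconstruction looks like, not the patented code.

```python
import numpy as np

def phase_space_reconstruct(x, m, tau):
    """Reconstructed phase space of step S1.3: each column is one phase point
    (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})^T."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    return np.array([x[i * tau: i * tau + n_points] for i in range(m)])

# For the monthly historical data the patent finds tau = 4 and m = 4:
# X_monthly = phase_space_reconstruct(monthly_load, m=4, tau=4)   # shape (4, N - 12)
```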
In one embodiment, the different types of data-driven models in step S2 include three types, namely a reference model, neural network models and statistical learning models, and the data-driven models learn the data features autonomously as follows: the reference model takes the load value at time t-1 as the predicted load value at time t and outputs it directly without training; the neural network models continuously feed back and adjust the network weights according to the test-set data, seek the optimal network weights using a gradient descent algorithm, and finally output the prediction results for the test set; the statistical learning models obtain residuals according to their respective objective functions, continuously optimize the hyper-parameters using a gradient descent algorithm, and finally output the prediction results for the test set.
In one embodiment, 12 data-driven models are employed, of which 1 is a reference model, 5 are neural network models and 6 are statistical learning models.
The reference model uses the Persistence algorithm, which takes the load value at time t-1 as the predicted load value at time t;
the neural network models comprise a BP neural network, an Elman neural network, a wavelet neural network, an extreme learning machine and a long short-term memory network;
the statistical learning models include linear regression, Lasso regression, Ridge regression, ElasticNet regression, support vector regression, and extreme gradient boosting.
Specifically, the 12 data-driven models include 1 reference model, 5 neural network models and 6 statistical learning models. The reference model, the LSTM and the statistical learning models are implemented in Python, and the neural network models other than the LSTM are implemented in Matlab.
In a specific implementation, two evaluation indexes, the coefficient of variation of the root mean square error and the symmetric mean absolute percentage error, are used as the preset evaluation indexes. On this basis, the extreme learning machine and the long short-term memory network are obtained as the optimal models among the neural network models, and extreme gradient boosting as the optimal model among the statistical learning models. The three models each maintain high prediction accuracy when used for daily, weekly, semi-monthly and monthly load prediction. FIG. 6 gives an overview of the selected data-driven models.
The reference model uses the Persistence algorithm. The Persistence algorithm takes the load value at the time t-1 as the predicted load value at the time t.
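A one-function sketch of this benchmark is shown below; the array name load is a hypothetical placeholder.

```python
import numpy as np

def persistence_forecast(load):
    """Persistence benchmark: the forecast for time t is the observed load at t - 1."""
    load = np.asarray(load, dtype=float)
    return load[:-1]      # predictions for load[1:]

# Usage: compare persistence_forecast(test_load) against test_load[1:]
```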
The 5 neural network models comprise a BP neural network, an Elman neural network, a wavelet neural network, an extreme learning machine and a long-short term memory network.
(1) BP neural network. The BP neural network (BPNN) is a multi-layer feed-forward network trained by error back propagation. It is a gradient descent method that uses gradient search to minimize the mean square error between the actual and expected outputs of the network, and it consists of an input layer, a hidden layer and an output layer. For regression problems, the number of output-layer neurons is typically 1. The hyper-parameters are set as: maximum number of training epochs net.trainparam.epochs = 500, training goal net.trainparam.goal = 0.01, and learning rate net.trainparam.lr = 0.01.
(2) Elman neural network. The Elman neural network (ENN) adds a context layer to the BP neural network. The context layer receives a feedback signal from the hidden layer, memorizes the output of the hidden-layer neurons at the previous moment, and feeds this delayed, stored output back into the hidden layer. This makes the network sensitive to historical data and increases its ability to process dynamic information. The hyper-parameters are set as: maximum number of training epochs net.trainparam.epochs = 500, training goal net.trainparam.goal = 0.01, and learning rate net.trainparam.lr = 0.01.
(3) Wavelet neural network. The wavelet neural network (WNN) is a neural network based on the BP network topology that uses a wavelet basis function as the activation function of the hidden-layer neurons. The hyper-parameters are set as: the number of iterations maxgen = 50, with the learning rate set via net.trainparam.lr. The wavelet basis function is given as an equation image in the original, where x is the input value of the hidden-layer neuron.
(4) Extreme learning machine. The extreme learning machine (ELM) is an algorithm for training single-hidden-layer feed-forward neural networks. When the activation function is infinitely differentiable, not all parameters of the single-hidden-layer feed-forward network need to be tuned: the weights and biases between the input layer and the hidden layer can be selected randomly before training. The connection weights β between the hidden layer and the output layer are then

β = H⁺D′

where H⁺ is the generalized (Moore-Penrose) inverse of the hidden-layer output matrix and D′ is the transpose of the ideal output of the network.
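The pseudo-inverse solution above translates directly into a small NumPy extreme learning machine; the hidden-layer size, the sigmoid activation and the random seed below are illustrative assumptions, not values given in the patent.

```python
import numpy as np

class SimpleELM:
    """Single-hidden-layer ELM: random input weights, output weights by pseudo-inverse."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly weighted inputs
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)                # random biases
        H = self._hidden(X)                                         # hidden-layer output matrix
        self.beta = np.linalg.pinv(H) @ y                           # beta = H^+ D
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```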
(5) Long short-term memory network. Long short-term memory (LSTM) is a special recurrent neural network. The basic unit of an LSTM network mainly comprises a forget gate, an input gate, an output gate and a memory cell. The hyper-parameters are set as: the number of LSTM units is units = 1, the mean absolute error is used as the loss function (loss = 'mae'), stochastic gradient descent is used for optimization (optimizer = 'sgd'), the number of training epochs is epochs = 50, the batch size is batch_size = 1, and each epoch outputs one line of log (verbose = 2).
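The hyper-parameters listed above map naturally onto a Keras/TensorFlow model; the sketch below is one plausible reading of that setup (the input window length, feature count and final Dense layer are assumptions not stated in the patent).

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(n_steps, n_features=1):
    """LSTM with the stated hyper-parameters: 1 LSTM unit, MAE loss, SGD optimizer."""
    model = keras.Sequential([
        layers.LSTM(1, input_shape=(n_steps, n_features)),
        layers.Dense(1),          # single load value as output (assumption)
    ])
    model.compile(loss="mae", optimizer="sgd")
    return model

# X_train shape: (samples, n_steps, 1); y_train shape: (samples,)
# model = build_lstm(n_steps=4)
# model.fit(X_train, y_train, epochs=50, batch_size=1, verbose=2)
```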
The 6 statistical learning models were linear regression, Lasso regression, Ridge regression, Elastic Net regression, support vector regression, and extreme gradient boosting.
(1) Linear regression. Linear regression (LR) assumes a linear correlation between the target value and the features, i.e. each sample point satisfies the multivariate linear equation

$$y_{i}=\theta^{T} x_{i}+\varepsilon_{i}$$

where i denotes the ith sample point, θ^T is the coefficient vector, ε_i is the regression error of the ith sample point, and x_i and y_i are the feature vector and the target value, respectively.
The error is assumed to approximately follow a normal distribution:

$$p\left(\varepsilon_{i}\right)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\varepsilon_{i}^{2}}{2\sigma^{2}}\right)$$

where σ² is the variance of the error ε_i, whose expectation is 0 by default.
The log-likelihood function is

$$\ell(\theta)=\log\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\left(y_{i}-\theta^{T}x_{i}\right)^{2}}{2\sigma^{2}}\right)$$

where m is the number of samples.
When the samples are given, the only variable in the formula is θ, so maximizing the likelihood, max ℓ(θ), can be converted to

$$\min_{\theta}\frac{1}{2}\sum_{i=1}^{m}\left(y_{i}-\theta^{T}x_{i}\right)^{2}$$

Writing θ^T x_i as a function h_θ(x_i) of the argument x_i, this can be further converted to

$$\min_{\theta}\frac{1}{2}\sum_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)-y_{i}\right)^{2}$$

Thus, the objective function of linear regression is

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)-y_{i}\right)^{2}$$
(2) Lasso regression. Adding a constraint on the magnitude of the coefficient vector θ to the objective function can prevent overfitting. Adding an L1-norm regular term gives Lasso regression, whose objective function is

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)-y_{i}\right)^{2}+\lambda\sum_{j=1}^{n}\left|\theta_{j}\right|$$

where λ is the coefficient of the constraint term and n is the length of the coefficient vector θ. λ is set to 0.5.
(3) Ridge regression. Adding an L2-norm regular term gives Ridge regression, whose objective function is

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)-y_{i}\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}$$

λ is set to 0.5.
(4) Elastic Net regression. Adding the L1-norm and L2-norm regular terms at the same time gives Elastic Net regression, whose objective function is

$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)-y_{i}\right)^{2}+\lambda\left(\rho\sum_{j=1}^{n}\left|\theta_{j}\right|+(1-\rho)\sum_{j=1}^{n}\theta_{j}^{2}\right)$$

where ρ is the weight of the L1-norm regular term and 1-ρ is the weight of the L2-norm regular term. λ = ρ = 0.5 is set.
(5) Support vector regression. Support vector regression (SVR) is the regression generalization of the support vector machine (SVM). SVR solves the regression prediction problem by constructing a hyperplane; the hyperplane for which the distances of all sample points are smallest is the optimal hyperplane. The hyperplane and the distance can be expressed as

$$f(x)=w^{T}x+b$$

$$d=y_{i}-f(x)$$

The SVR model can be written as

$$\min_{w,\,b,\,\xi_{i},\,\xi_{i}^{*}}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{n}\left(\xi_{i}+\xi_{i}^{*}\right)$$

$$\text{s.t.}\quad y_{i}-f\left(x_{i}\right)\le\varepsilon+\xi_{i},\quad f\left(x_{i}\right)-y_{i}\le\varepsilon+\xi_{i}^{*},\quad \xi_{i},\,\xi_{i}^{*}\ge 0$$

where w is the weight vector, b is the offset, y_i is the sample target value, x is the sample feature vector, ξ_i and ξ_i* are the slack variables introduced for the upper and lower boundaries, C is the penalty coefficient of the objective function, and ε is the maximum distance between the sample data and the hyperplane, i.e. d_max. The hyper-parameters are set as: a linear kernel function is adopted (kernel = 'linear'), and the penalty coefficient of the objective function is C = 1.25.
(6) Extreme gradient boosting. Extreme gradient boosting (XGBoost) uses the classification and regression tree (CART) as its base learner. Assume the sample set is D = {(x_i, y_i)}, where the number of samples is n, the sample dimension is m, x_i denotes the ith sample and y_i denotes the ith label. The XGBoost model is

$$\hat{y}_{i}=\sum_{k=1}^{K} f_{k}\left(x_{i}\right),\quad f_{k}\in F$$

where ŷ_i is the predicted value, K is the number of trees, and f_k is a function in the function space F, which is the set of all possible CARTs.
The loss function with a regularization term is constructed as

$$L=\sum_{i=1}^{n} l\left(y_{i},\hat{y}_{i}\right)+\sum_{k=1}^{K}\Omega\left(f_{k}\right)$$

The first term, the loss error, calculates and sums the difference between the predicted value and the actual value of each sample; the second term, the regularization term, sums the regularization value of each tree to control the complexity of the model and prevent overfitting. For a single tree, the regularization term is

$$\Omega\left(f_{k}\right)=\gamma T+\lambda\lVert w\rVert^{2}$$

where T denotes the number of leaf nodes of the tree, w is the score vector of its leaf nodes, and γ and λ are the leaf-node number constraint coefficient and the leaf-node score vector constraint coefficient, respectively.
The hyper-parameters are set as: the mean square error is used as the loss function (objective = 'reg:squarederror'), and the number of trees is n_estimators = 1000.
From the comparison results, the ELM and the LSTM are the best neural network models on the test set, and XGBoost has the highest accuracy among the statistical learning models.
In one embodiment, two evaluation indexes, namely the coefficient of variation of the root mean square error and the symmetric mean absolute percentage error, are used as the preset evaluation indexes; according to these indexes, the extreme learning machine and the long short-term memory network are obtained as the optimal models among the neural network models, and extreme gradient boosting as the optimal model among the statistical learning models.
In one embodiment, the coefficient of variation of the root mean square error CVRMSE and the symmetric mean absolute percentage error SMAPE are formulated as follows:

$$\mathrm{CVRMSE}=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}}{\bar{y}}\times 100\%$$

$$\mathrm{SMAPE}=\frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_{i}-\hat{y}_{i}\right|}{\left(\left|y_{i}\right|+\left|\hat{y}_{i}\right|\right)/2}\times 100\%$$

where y_i and ŷ_i are the actual value and the predicted value of the ith sample respectively, ȳ is the mean of the actual values, and N is the total number of samples.
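The two indexes can be computed as below, following the standard definitions given above; the percentage scaling is an assumption about how the values in Fig. 7 and Fig. 8 are reported.

```python
import numpy as np

def cvrmse(y_true, y_pred):
    """Coefficient of variation of the RMSE, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) /
                           ((np.abs(y_true) + np.abs(y_pred)) / 2.0))
```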
Fig. 7 and 8 are comparison graphs of CVRMSE and SMAPE, respectively, for different prediction time lengths.
In one embodiment, step S3 includes:
s3.1: obtaining the predicted values x_mn of a time series of length N under M prediction models, where m = 1, 2, …, M and n = 1, 2, …, N; the combination model is:

$$\hat{x}_{n}=\sum_{m=1}^{M} l_{m}\, x_{mn},\quad n=1,2,\ldots,N$$

where l_m is the weight of the mth model and satisfies

$$\sum_{m=1}^{M} l_{m}=1$$

The weights are solved by the grey correlation method, which is equivalent to solving the optimization problem given as an equation image in the original, where e_mn is the prediction error of the mth prediction model at time n;
s3.2: and performing daily load prediction and weekly load prediction on the power load by using the established combined model.
The solution results are as follows: for daily load prediction, the weights of the extreme learning machine, extreme gradient boosting and the long short-term memory network are 0.171, 0.544 and 0.284, respectively; for weekly load prediction, the corresponding weights are 0.641, 0.275 and 0.110. Daily and weekly load prediction are then carried out with the combined model. Fig. 9 and Fig. 10 show the daily and weekly load prediction results of the combined model, respectively.
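With the weights above, the combined forecast is simply a weighted sum of the three models' outputs; the sketch below assumes the individual predictions are already aligned arrays, and uses the daily-load weights as an example.

```python
import numpy as np

# Grey-correlation weights reported for daily load prediction
daily_weights = {"ELM": 0.171, "XGBoost": 0.544, "LSTM": 0.284}

def combined_forecast(predictions, weights):
    """Unequal-weight combination of the optimal models' forecasts."""
    return sum(weights[name] * np.asarray(pred, dtype=float)
               for name, pred in predictions.items())

# predictions = {"ELM": elm_pred, "XGBoost": xgb_pred, "LSTM": lstm_pred}
# y_daily = combined_forecast(predictions, daily_weights)
```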
Compared with the prior art, the method has the following positive effects and advantages:
(1) the phase space reconstruction technology is adopted to process historical load data, so that the tedious steps of collecting and processing multi-factor data can be reduced;
(2) the load prediction is carried out by applying a plurality of data-driven models, wherein the data-driven models comprise deep learning models and statistical learning models, and compared with the traditional data-driven models, the load prediction method has stronger nonlinear capability to carry out prediction;
(3) the combination of the grey correlation method integrates deep learning and statistical learning, so that a better effect than that of a single learner is obtained.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A power load prediction method based on phase space reconstruction and data driving is characterized by comprising the following steps:
s1: calculating delay time and embedding dimension of historical load data, dividing the historical load data into a training set and a testing set, respectively carrying out phase space reconstruction on the training set and the testing set according to the delay time and the embedding dimension, and carrying out maximum and minimum normalization processing;
s2: inputting the training set and the test set subjected to the maximum and minimum normalization processing into at least two types of data driving models, wherein each type of data driving model comprises one or more data driving models, the data driving models are used for autonomously learning the data characteristics of the input data set and outputting a prediction result, the prediction precision of all the data driving models is compared, and the optimal models corresponding to different types of data driving models are screened out by adopting preset evaluation indexes;
s3: and establishing a combined model of optimal models corresponding to different types of data driving models under the condition of phase space reconstruction by adopting a grey correlation method, and predicting the power load by utilizing the established combined model.
2. The power load prediction method according to claim 1, wherein step S1 includes:
s1.1: calculating the delay time of the historical load data by adopting an autocorrelation method, wherein the chaotic time series is x_1, x_2, …, x_{N-1}, x_N with N data points, and the autocorrelation function with a time span of τ is

$$C(\tau)=\frac{\sum_{t=1}^{N-\tau}\left(x_{t}-\bar{x}\right)\left(x_{t+\tau}-\bar{x}\right)}{\sum_{t=1}^{N}\left(x_{t}-\bar{x}\right)^{2}}$$

where x̄ is the mean of the series. A plot of the autocorrelation function against the delay time τ is obtained; when the value of the autocorrelation function falls to 1-1/e times its initial value, the corresponding τ is the delay time of the reconstructed phase space;
s1.2: calculating the embedding dimension of the historical load data by adopting a pseudo-nearest-neighbor method: for the chaotic time series data, starting from embedding dimension m = 2, calculate the ratio S_m of pseudo-nearest neighbors or their number; then increase the embedding dimension and recalculate the ratio S_m or the number of pseudo-nearest neighbors; when the ratio S_m is smaller than a preset proportion or the number of pseudo-nearest neighbors no longer decreases, the embedding dimension at that point is taken as the embedding dimension of the historical load data, where

$$S_{m}(t)=\sqrt{\frac{R_{m+1}^{2}(t)-R_{m}^{2}(t)}{R_{m}^{2}(t)}}$$

If S_m(t) > S, then X(t) and X_j(t) are pseudo-nearest neighbors, where S is the threshold, R_m(t) is the distance between X(t) and X_j(t) in the space of embedding dimension m, and R_{m+1}(t) is their distance in the space of embedding dimension m+1;
s1.3: respectively carrying out phase space reconstruction on the training set and the test set according to the delay time and the embedding dimension, specifically: for the chaotic time series x_1, x_2, …, x_{N-1}, x_N, each x_t (t = 1, 2, …, N-(m-1)τ) is transformed as follows:

$$X_{t}=\left(x_{t},\, x_{t+\tau},\, x_{t+2\tau},\, \ldots,\, x_{t+(m-1)\tau}\right)^{T}$$

where τ is the delay time and m is the embedding dimension. According to the phase space reconstruction method, the chaotic time series x_1, x_2, …, x_{N-1}, x_N is converted into a new data space with delay τ and dimension m, i.e.

$$X=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{N-(m-1)\tau}\\ x_{1+\tau} & x_{2+\tau} & \cdots & x_{N-(m-2)\tau}\\ \vdots & \vdots & & \vdots\\ x_{1+(m-1)\tau} & x_{2+(m-1)\tau} & \cdots & x_{N}\end{bmatrix}$$

where each column represents a vector, i.e. a phase point;
s1.4: carrying out normalization processing on the load data by using the max-min normalization algorithm:

$$x_{i}'=\frac{x_{i}-x_{\min}}{x_{\max}-x_{\min}}$$

where x_min is the minimum value in the load data, x_max is the maximum value in the load data, and x_i is the ith value in the load data.
3. The power load prediction method of claim 1, wherein the different types of data models in step S2 include three types, namely a reference model, a neural network model and a statistical learning model, and wherein the data-driven model autonomously learns the data characteristics by: the reference model takes the load value at the t-1 moment as a load predicted value at the t moment, the load predicted value is directly output without training, the neural network model continuously feeds back and adjusts the network weight according to the test set data, the optimal weight of the network is sought by using a gradient descent algorithm, and finally the prediction result of the test set is output; the statistical learning model obtains residual errors according to respective objective functions, continuously optimizes the hyper-parameters by using a gradient descent algorithm, and finally outputs a prediction result of the test set.
4. The power load prediction method of claim 3 wherein the data-driven models employed include 12, of which 1 reference model, 5 neural network models and 6 statistical learning models,
the benchmark model makes the persistence algorithm: the Persistence algorithm takes the load value at the time t-1 as the load predicted value at the time t;
the neural network model comprises a BP neural network, an Elman neural network, a wavelet neural network, an extreme learning machine and a long-short term memory network;
statistical learning models include linear regression, Lasso regression, Ridge regression, Elastic Net regression, support vector regression, and extreme gradient boosting.
5. The power load prediction method according to claim 4, wherein two evaluation indexes, namely a variation coefficient of a root mean square error and a symmetric average absolute percentage error, are used as preset evaluation indexes, an extreme learning machine and a long-short term memory network are obtained according to the preset evaluation indexes and are used as optimal models in the neural network model, and an extreme gradient is improved to be the optimal models in the statistical learning model.
6. The method of claim 5, wherein the coefficient of variation of root mean square error CVRMSE and the symmetric mean absolute percentage error SMAPE are formulated as follows:
$$\mathrm{CVRMSE}=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}}{\bar{y}}\times 100\%$$

$$\mathrm{SMAPE}=\frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_{i}-\hat{y}_{i}\right|}{\left(\left|y_{i}\right|+\left|\hat{y}_{i}\right|\right)/2}\times 100\%$$

where y_i and ŷ_i are the actual value and the predicted value of the ith sample respectively, ȳ is the mean of the actual values, and N is the total number of samples.
7. The power load prediction method according to claim 1, wherein step S3 includes:
s3.1: obtaining the predicted values x_mn of a time series of length N under M prediction models, where m = 1, 2, …, M and n = 1, 2, …, N; the combination model is:

$$\hat{x}_{n}=\sum_{m=1}^{M} l_{m}\, x_{mn},\quad n=1,2,\ldots,N$$

where l_m is the weight of the mth model and satisfies

$$\sum_{m=1}^{M} l_{m}=1$$

The weights are solved by the grey correlation method, which is equivalent to solving the optimization problem given as an equation image in the original, where e_mn is the prediction error of the mth prediction model at time n;
s3.2: and performing daily load prediction and weekly load prediction on the power load by using the established combined model.
CN202011217056.4A 2020-11-04 2020-11-04 Power load prediction method based on phase space reconstruction and data driving Pending CN112232593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011217056.4A CN112232593A (en) 2020-11-04 2020-11-04 Power load prediction method based on phase space reconstruction and data driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011217056.4A CN112232593A (en) 2020-11-04 2020-11-04 Power load prediction method based on phase space reconstruction and data driving

Publications (1)

Publication Number Publication Date
CN112232593A true CN112232593A (en) 2021-01-15

Family

ID=74121968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011217056.4A Pending CN112232593A (en) 2020-11-04 2020-11-04 Power load prediction method based on phase space reconstruction and data driving

Country Status (1)

Country Link
CN (1) CN112232593A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915511A (en) * 2012-09-21 2013-02-06 四川大学 Safety monitoring method for neural network model of power-loaded chaotic phase space
CN104820783A (en) * 2015-05-08 2015-08-05 重庆科创职业学院 A method for processing short-term wind speed data
CN110006526A (en) * 2019-01-31 2019-07-12 华北水利水电大学 A kind of information fusion algorithm of the more weights of multi-measuring point

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘鑫 (Liu Xin): "Research on Short-Term Power Load Forecasting Methods Based on Machine Learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 8, page 5 *
赵冬梅等 (Zhao Dongmei et al.): "Reactive Power Load Forecasting Model for Power Systems Based on Phase Space Reconstruction and Long Short-Term Memory Algorithm", Modern Electric Power, no. 5, pages 1-5 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990557A (en) * 2021-02-25 2021-06-18 国网河北省电力有限公司保定供电分公司 Power load prediction method and device based on data-driven ensemble Kalman filtering
CN113193551A (en) * 2021-04-27 2021-07-30 长安大学 Short-term power load prediction method based on multi-factor and improved feature screening strategy
CN113505938A (en) * 2021-07-26 2021-10-15 中国电力科学研究院有限公司 Ultra-short-term wind power combined prediction method and system
CN113762534A (en) * 2021-09-10 2021-12-07 广东电网有限责任公司 Building cold and heat load prediction method, device, equipment and storage medium
CN113762534B (en) * 2021-09-10 2024-08-02 广东电网有限责任公司 Building cold and hot load prediction method, device, equipment and storage medium
CN114169730A (en) * 2021-11-30 2022-03-11 温州科技职业学院 Machine learning-based urban garbage classification work evaluation method and system
CN115034426A (en) * 2022-03-18 2022-09-09 武汉理工大学 Rolling load prediction method based on phase space reconstruction and multi-model fusion Stacking integrated learning mode
CN115034426B (en) * 2022-03-18 2024-07-02 武汉理工大学 Rolling load prediction method based on phase space reconstruction and multi-model fusion Stacking integrated learning mode
CN116995673B (en) * 2023-09-26 2024-02-20 宁德时代新能源科技股份有限公司 Power load prediction method, power load prediction model training method and device
CN116995673A (en) * 2023-09-26 2023-11-03 宁德时代新能源科技股份有限公司 Power load prediction method, power load prediction model training method and device
CN117407675A (en) * 2023-10-26 2024-01-16 国网青海省电力公司海北供电公司 Lightning arrester leakage current prediction method based on multi-variable reconstruction combined dynamic weight
CN117744878A (en) * 2023-12-26 2024-03-22 国网吉林省电力有限公司经济技术研究院 Multi-platform system construction method and system based on digital power system
CN117829207A (en) * 2024-01-04 2024-04-05 昆明理工大学 Multi-source sensing data and GA-LSTM mill load prediction method
CN117937464A (en) * 2024-01-09 2024-04-26 广东电网有限责任公司广州供电局 Short-term power load prediction method based on PSR-DBN (Power System support-direct-base network) combined model
CN117875138A (en) * 2024-03-12 2024-04-12 大连理工大学 Mechanism and data fusion driven aircraft structure life prediction method
CN117875138B (en) * 2024-03-12 2024-06-14 大连理工大学 Mechanism and data fusion driven aircraft structure life prediction method

Similar Documents

Publication Publication Date Title
CN112232593A (en) Power load prediction method based on phase space reconstruction and data driving
Niu et al. Wind power forecasting using attention-based gated recurrent unit network
Liang et al. A novel wind speed prediction strategy based on Bi-LSTM, MOOFADA and transfer learning for centralized control centers
Limouni et al. Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model
Dasgupta et al. Nonlinear dynamic Boltzmann machines for time-series prediction
Platt A resource-allocating network for function interpolation
Ding et al. Point and interval forecasting for wind speed based on linear component extraction
Lin et al. Deep belief networks with genetic algorithms in forecasting wind speed
Reunanen et al. Unsupervised online detection and prediction of outliers in streams of sensor data
He et al. CLeaR: An adaptive continual learning framework for regression tasks
CN115018193A (en) Time series wind energy data prediction method based on LSTM-GA model
Landauskas et al. Weighted moving averaging revisited: an algebraic approach
CN112734106A (en) Method and device for predicting energy load
CN114694379B (en) Traffic flow prediction method and system based on self-adaptive dynamic graph convolution
Song et al. Multi-dimensional evaluation of temporal neural networks on solar irradiance forecasting
Zhu et al. EnsP KDE &IncL KDE: a hybrid time series prediction algorithm integrating dynamic ensemble pruning, incremental learning, and kernel density estimation
Al-Ja’afreh et al. An enhanced CNN-LSTM based multi-stage framework for PV and load short-term forecasting: DSO scenarios
US20210342691A1 (en) System and method for neural time series preprocessing
Cai et al. Hybrid variational autoencoder for time series forecasting
Lin et al. Hybrid water quality prediction with graph attention and spatio-temporal fusion
Arroyo et al. Exponential smoothing methods for interval time series
CN117196105A (en) People number prediction method, device, computer equipment and storage medium
Alam Recurrent neural networks in electricity load forecasting
CN116822722A (en) Water level prediction method, system, device, electronic equipment and medium
CN114267422B (en) Method and system for predicting surface water quality parameters, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination