CN109754113B - Load prediction method based on dynamic time warping and long-and-short time memory - Google Patents
- Publication number
- CN109754113B CN109754113B CN201811439799.9A CN201811439799A CN109754113B CN 109754113 B CN109754113 B CN 109754113B CN 201811439799 A CN201811439799 A CN 201811439799A CN 109754113 B CN109754113 B CN 109754113B
- Authority
- CN
- China
- Prior art keywords
- data
- load
- short
- long
- users
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 74
- 230000015654 memory Effects 0.000 title claims abstract description 26
- 238000012549 training Methods 0.000 claims abstract description 55
- 238000013528 artificial neural network Methods 0.000 claims abstract description 27
- 238000012545 processing Methods 0.000 claims abstract description 14
- 238000011176 pooling Methods 0.000 claims abstract description 12
- 230000006399 behavior Effects 0.000 claims abstract description 11
- 230000005611 electricity Effects 0.000 claims abstract description 8
- 125000004122 cyclic group Chemical group 0.000 claims abstract description 7
- 230000006403 short-term memory Effects 0.000 claims abstract description 7
- 238000007781 pre-processing Methods 0.000 claims abstract description 4
- 239000010410 layer Substances 0.000 claims description 38
- 238000004422 calculation algorithm Methods 0.000 claims description 22
- 238000012360 testing method Methods 0.000 claims description 18
- 210000002569 neuron Anatomy 0.000 claims description 16
- 230000000306 recurrent effect Effects 0.000 claims description 13
- 239000011159 matrix material Substances 0.000 claims description 11
- 238000005457 optimization Methods 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 10
- 230000006870 function Effects 0.000 claims description 10
- 230000008569 process Effects 0.000 claims description 9
- 230000002354 daily effect Effects 0.000 claims description 5
- 230000000694 effects Effects 0.000 claims description 5
- 238000010606 normalization Methods 0.000 claims description 5
- 230000003203 everyday effect Effects 0.000 claims description 4
- 230000007613 environmental effect Effects 0.000 claims description 3
- 238000011156 evaluation Methods 0.000 claims description 3
- 238000005259 measurement Methods 0.000 claims description 3
- 230000001537 neural effect Effects 0.000 claims description 3
- 239000002356 single layer Substances 0.000 claims description 3
- 230000003442 weekly effect Effects 0.000 claims description 3
- 230000007787 long-term memory Effects 0.000 abstract description 5
- 210000004027 cell Anatomy 0.000 description 7
- 238000011160 research Methods 0.000 description 5
- 238000013135 deep learning Methods 0.000 description 4
- 230000007774 longterm Effects 0.000 description 3
- 238000013459 approach Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 230000000737 periodic effect Effects 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 238000005265 energy consumption Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 230000002688 persistence Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
Images
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a load prediction method based on dynamic time warping and long-and-short time memory, which comprises the following steps: S1, acquiring the basic data required for short-term user load prediction from the power system; S2, clustering users with similar electricity consumption behaviors by a dynamic time warping method according to the users' historical load data; S3, pooling the user data within each category; S4, selecting training data, preprocessing it, and using it as input; S5, constructing a short-term load prediction method based on a deep long short-term memory (LSTM) recurrent neural network and verifying its effectiveness. Because the number of users to be predicted is large, the method clusters users with similar electricity consumption behaviors, which improves prediction efficiency. Meanwhile, pooling the data within each category increases the diversity of the training data and improves short-term load prediction accuracy, so the method has practical engineering significance.
Description
Technical Field
The invention relates to a method for predicting the short-term load of residential users in an electric power system, is used for power system load prediction, and belongs to the technical field of pattern recognition and image processing.
Background
Residential load prediction in the power system scientifically forecasts the load of the next several days or hours based on historical load variation patterns combined with factors such as weather and the economy. Accurate load prediction is an important decision basis for scheduling power production and equipment maintenance plans. Therefore, it is necessary to research new methods and techniques for residential load prediction to improve its accuracy and reliability and meet engineering requirements.
In recent years, with the continuous development of deep learning, deep learning models have gradually been applied to research on time-series data. Among them, the recurrent neural network (RNN) is a neural network with a self-recurrent structure that allows information from time-series data to persist across network layers, making it well suited to processing time-series data. As research has progressed, RNNs have produced numerous variants, such as bidirectional recurrent neural networks, long short-term memory recurrent neural networks (LSTM), and gated recurrent units (GRU). Among these variants, the LSTM network effectively alleviates the vanishing-gradient, exploding-gradient, and insufficient long-term memory problems of the basic RNN, so that the recurrent network can genuinely exploit long-range temporal information. On this technical foundation, how to provide an LSTM-based short-term household power load prediction method that copes with the single dimension and strong randomness of short-term household power demand data while maintaining good prediction performance is currently a key research focus in the industry.
In addition, to further improve the accuracy of existing short-term household power load prediction methods, many scholars have proposed combined prediction models. The most typical approach is to cluster users according to their load curves. A load profile is a 24-hour record of household energy consumption; although consumer behavior reflects appliance usage patterns, the exact times at which appliances are used may vary, producing load profiles of different shapes. Thus a household's daily 24-hour load curve can vary considerably, whether across different days of the week or on a given day of the week. To classify households properly by behavior, the classification scheme must account for these temporal variations and focus on the underlying structure of appliance usage, which can be regarded as the basis of the household's energy use.
For this situation, methods that compare and classify household load curves using Dynamic Time Warping (DTW) have been developed. Under this metric, the time axis is stretched or shrunk to find the best match between two load curves and to find a canonical "shape" representing a set of load curves. Two load curves classified as similar under this metric do not necessarily have similar values at every time point, but they have similar shapes under the DTW measure.
In summary, how to provide a novel method for predicting the short-term residential load of a power system that builds on the prior art, gives full play to its advantages, improves prediction accuracy, and reduces algorithmic complexity is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the present invention provides a load prediction method based on dynamic time warping and long-and-short time memory, which includes the following steps:
s1, acquiring basic data required by user short-term load prediction from the power system;
s2, clustering users with similar electricity consumption behaviors by using a dynamic time warping method according to historical load data of the users;
s3, pooling the user data of the same category;
s4, selecting the load data, meteorological data, and forecast-day date-type data of the day, week, and month preceding the forecast day, preprocessing the data, and using the preprocessed data as input;
s5, constructing a short-term load prediction method based on a deep long short-term memory recurrent neural network, and verifying its effectiveness in a 24-hour-ahead prediction scenario.
Preferably, the basic data in S1 include historical load data and meteorological data; the historical load data are hourly (1 h interval) load data for each historical day, and the meteorological data at least include the ambient temperature at the prediction time and the date type of the prediction day.
Preferably, S2 specifically includes the following steps:
S21, assuming that the load sequences of the two users are X and Y, where $X=\{x_1,x_2,\dots,x_N\}$ and $Y=\{y_1,y_2,\dots,y_N\}$, define a warping path matrix $M[24\times 24]$ whose element $(m,n)$ represents the distance between the two points $x_m$ and $y_n$; the value of element $(m,n)$ is $d(x_m,y_n)=(x_m-y_n)^2$;
S22, defining the dynamic warping path sequence $P=(p_1,p_2,\dots,p_K)$, where $p_k=(m_k,n_k)$ and $\max(m,n)\le K<m+n-1$;
S23, the total warping cost between load sequences X and Y is $c_P(X,Y)=\sum_{k=1}^{K}d(x_{m_k},y_{n_k})$; the dynamic time warping distance of X and Y is defined as $DTW(X,Y)=c_{P^*}(X,Y)$, where $P^*=\arg\min_P c_P(X,Y)$; the weekly average DTW distance of the two users, $\overline{DTW}(X,Y)=\frac{1}{7}\sum_{t=0}^{6}DTW(X_t,Y_t)$, is used as the clustering distance in the subsequent clustering algorithm, where t denotes the day of the week, t = 0 to 6 representing Monday through Sunday, and $X_t$ and $Y_t$ denote the load curves of the two users on day t.
S24, given a set of daily load curves X of the users and the number of clusters D, the users are clustered into D classes according to the DTW distance between load curves, so as to obtain the classes d, the assignment function $C_d$, and the cluster centers $\lambda_d$ that minimize the within-cluster sum $S_c=\sum_{d=1}^{D}\sum_{X\in C_d}DTW(X,\lambda_d)$; then cluster centers are selected at random and iteratively updated with a K-medoid clustering algorithm.
Preferably, the dynamic warping path sequence in S22 satisfies the following condition:
boundary condition: $p_1=(1,1)$ and $p_K=(N,N)$;
monotonicity condition: $m_1\le m_2\le\dots\le m_K$ and $n_1\le n_2\le\dots\le n_K$;
step size condition: if $p_{k-1}=(a',b')$, the next point of the path $p_k=(a,b)$ must satisfy $a-a'\le 1$ and $b-b'\le 1$.
Preferably, S3 specifically includes the following steps:
s31, adding an ID tag to each user in the form of a dummy variable;
s32, dividing user data into a training set and a test set;
and S33, combining all training data to construct a training pool, and then constructing a test pool through the same process.
Preferably, S4 specifically includes the following steps:
s41, normalizing the resident user load data set L with the normalization formula $x^*(i)=\frac{x(i)-x_{min}}{x_{max}-x_{min}}$, where $x^*(i)$ is the normalized value of a variable, $x(i)$ is the raw value of the variable, and $x_{max}$ and $x_{min}$ are the maximum and minimum of the raw data, respectively;
s42, conducting sparse processing on the meteorological factor data set W and the date type I of the user by using one-hot codes, and obtaining an input data set of the user, wherein the input data set is L, W and I;
and S43, pre-training the historical data of the previous day, the previous week, and the previous month and then using them as the input of the prediction model; specifically, the three days of data are processed by a single-layer fully-connected network, and the final output has the same dimension as a single day's input data set.
Preferably, the method for constructing a short-term load prediction method based on a deep long-short term memory cycle neural network in S5, wherein the structure of the deep long-short term memory cycle neural network comprises: an input layer, a hidden layer and an output layer; the input layer is composed of neurons representing inputs; the hidden layer consists of neurons representing intermediate variables, and the hidden layer and the next hidden layer are directly connected by the neurons; the output layer is composed of neurons representing output results.
Preferably, the method for constructing the short-term load prediction based on the deep long-short-term memory recurrent neural network described in S5 specifically includes the following steps:
s51, carrying out batch division on the user training data in the S4 so as to facilitate calculation;
s52, initializing various parameters of the long-term and short-term memory cyclic neural training network;
s53, inputting training data to perform forward propagation training;
and S54, determining a loss function and an optimization method according to the predicted value and the true value to perform back propagation optimization.
Preferably, in S5, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used as the evaluation indices of the model's prediction effect, calculated respectively as $MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\%$ and $RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$,
where n is the number of predicted points, $y_i$ is the true load value at the i-th predicted point, and $\hat{y}_i$ is the predicted value at the i-th predicted point.
Compared with the prior art, the invention has the advantages that:
according to the load prediction method based on dynamic time warping and long-and-short time memory, given the large number of users to be predicted, users with similar electricity consumption behaviors are clustered based on DTW similarity, which improves prediction efficiency, increases the accuracy of user clustering, reduces the number of clusters, and reduces the complexity of the clustering algorithm.
Meanwhile, to address the tendency of deep-learning-based LSTM load prediction methods to overfit, pooling the data within each category increases the diversity of the training data and improves the generalization capability of the model, thereby improving short-term load prediction accuracy; the method therefore has practical engineering significance.
In addition, the invention also provides reference for other related problems in the same field, can be expanded and extended on the basis of the reference, is applied to the technical scheme of other load prediction methods in the same field, and has very wide application prospect.
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings to facilitate understanding of the technical solutions of the present invention.
Drawings
FIG. 1 is a flow chart of a prediction method of the present invention;
FIG. 2 is a load prediction network architecture model based on LSTM;
FIG. 3 is a diagram of the internal structure of an LSTM neuron;
FIG. 4 is a clustering result of original user data;
FIG. 5 is a graph showing the prediction results for August 8.
Detailed Description
As shown in fig. 1, the invention discloses a load prediction method based on dynamic time warping and long-and-short time memory, and the idea of the invention is to cluster power users based on dynamic time warping similarity, and cluster users with similar electricity utilization habits into one class. And then carrying out random pooling operation on users of the same category, and increasing the scale and diversity of training data so as to increase the generalization capability of the prediction model. And then, respectively establishing a load prediction model based on long-time and short-time memory for the user data sets after the pooling, wherein the proposed method has better engineering adaptability.
Because user load curves are highly variable and subject to time shifts, clustering load users by Euclidean distance introduces a certain bias and cannot give a reliable judgment. Dynamic Time Warping (DTW) is a method for measuring the similarity between two time sequences of different lengths. It is widely used, mainly for template matching, such as isolated-word speech recognition (deciding whether two speech segments represent the same word), gesture recognition, data mining, and information retrieval. Therefore, the invention performs K-medoid-based clustering by calculating the DTW similarity of two users' loads.
Specifically, the load prediction method based on dynamic time warping and long-and-short time memory comprises the following steps:
and S1, acquiring basic data required by user short-term load prediction from the power system.
The basic data comprise historical load data and meteorological data; the historical load data are hourly (1 h interval) load data for each historical day, and the meteorological data at least comprise the ambient temperature at the prediction time and the date type of the prediction day.
And S2, clustering the users with similar electricity utilization behaviors by using a dynamic time warping method according to the historical load data of the users.
The step S2 may be detailed as a clustering method based on DTW similarity according to the historical load data of the users, that is, the DTW similarity of each user load is calculated first, and then K-medoid-based clustering is performed on each user according to the DTW distance, specifically including the following steps:
S21, since the load data collected each day are of equal length, assume that the load sequences of two users are X and Y, where $X=\{x_1,x_2,\dots,x_N\}$ and $Y=\{y_1,y_2,\dots,y_N\}$. Define a warping path matrix $M[24\times 24]$ whose element $(m,n)$ represents the distance between the two points $x_m$ and $y_n$, here the squared Euclidean distance, so the value of element $(m,n)$ is $d(x_m,y_n)=(x_m-y_n)^2$.
S22, define the dynamic warping path sequence $P=(p_1,p_2,\dots,p_K)$, where $p_k=(m_k,n_k)$ and $\max(m,n)\le K<m+n-1$.
The dynamic warping path sequence satisfies the following conditions:
(1) boundary condition: $p_1=(1,1)$ and $p_K=(N,N)$;
(2) monotonicity condition: $m_1\le m_2\le\dots\le m_K$ and $n_1\le n_2\le\dots\le n_K$;
(3) step size condition: if $p_{k-1}=(a',b')$, the next point of the path $p_k=(a,b)$ must satisfy $a-a'\le 1$ and $b-b'\le 1$; that is, a point can only be aligned with its adjacent points, and the alignment step cannot exceed 1.
S23, the total warping cost between load sequences X and Y is $c_P(X,Y)=\sum_{k=1}^{K}d(x_{m_k},y_{n_k})$. Next, the optimal path $P^*$ with minimum cost must be found among the possible paths, so the dynamic time warping distance of X and Y is defined as $DTW(X,Y)=c_{P^*}(X,Y)$, where $P^*=\arg\min_P c_P(X,Y)$. However, because load curves are highly volatile, the weekly average DTW distance of the two users is taken as the clustering distance in the subsequent clustering algorithm, $\overline{DTW}(X,Y)=\frac{1}{7}\sum_{t=0}^{6}DTW(X_t,Y_t)$, where t denotes the day of the week, t = 0 to 6 representing Monday through Sunday, and $X_t$ and $Y_t$ denote the load curves of the two users on day t. A code sketch of this DTW computation is given after step S24 below.
S24, given a set of daily load curves X of the users and the number of clusters D, the users are clustered into D classes according to the DTW distance between load curves; that is, the goal is to find the classes d, the assignment function $C_d$, and the cluster centers $\lambda_d$ that minimize the within-cluster sum $S_c=\sum_{d=1}^{D}\sum_{X\in C_d}DTW(X,\lambda_d)$. Cluster centers are then selected at random and iteratively updated with a K-medoid clustering algorithm.
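The DTW computation in steps S21–S23 can be illustrated with a brief sketch. The following is a minimal dynamic-programming implementation, in Python, of the DTW distance between two equal-length daily load curves and of the weekly average distance used as the clustering distance; the function names, the step handling, and the toy data are illustrative assumptions, not the patent's own implementation.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two equal-length load curves with squared
    point cost d(x_m, y_n) = (x_m - y_n)^2 and unit-step/boundary constraints."""
    n = len(x)
    cost = np.full((n + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for m in range(1, n + 1):
        for k in range(1, n + 1):
            d = (x[m - 1] - y[k - 1]) ** 2
            # a point may only align with its neighbours (step size <= 1)
            cost[m, k] = d + min(cost[m - 1, k],      # advance in x only
                                 cost[m, k - 1],      # advance in y only
                                 cost[m - 1, k - 1])  # advance in both
    return cost[n, n]                                 # c_{P*}(X, Y)

def weekly_avg_dtw(user_x, user_y):
    """Weekly average DTW distance; user_x, user_y have shape (7, 24)."""
    return np.mean([dtw_distance(user_x[t], user_y[t]) for t in range(7)])

# toy usage with two random weekly load profiles
rng = np.random.default_rng(0)
a, b = rng.random((7, 24)), rng.random((7, 24))
print(weekly_avg_dtw(a, b))
```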
The DTW similarity-based K-medoid clustering method comprises the following specific algorithm steps:
1. Randomly select D samples from the set X of users' daily load curves as medoids (reference points) $O_j$ (j = 1, 2, …, D).
2. Assign each remaining sample point to the cluster of its nearest medoid, giving D clusters.
3. Randomly select a non-medoid sample $O_{random}$; exchange $O_{random}$ with a medoid $O_j$, repeat the assignment of step 2 to generate a new set of clusters, and compute the change in the objective function $S_c$. If the change in $S_c$ is less than 0, keep the exchange of $O_{random}$ and $O_j$ and retain the new clusters; otherwise retain the original center point and clusters. This step is repeated until the D center points no longer change.
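A compact sketch of this K-medoid iteration is given below, assuming a precomputed matrix of pairwise weekly DTW distances (for example from the weekly_avg_dtw sketch above). For brevity it replaces the random-swap test on the objective $S_c$ with the common variant that picks, inside each cluster, the member minimizing the within-cluster cost; the stopping rule (the D medoids no longer change) is the same.

```python
import numpy as np

def k_medoids(dist, D, max_iter=100, seed=0):
    """K-medoid clustering on a precomputed (n, n) DTW distance matrix.
    Returns the medoid indices and the cluster label of every sample."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=D, replace=False)    # step 1: random medoids
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # step 2: assign samples
        changed = False
        for j in range(D):                            # step 3: update medoids
            members = np.where(labels == j)[0]
            if members.size == 0:
                continue
            # member with the smallest within-cluster cost becomes the medoid
            costs = dist[np.ix_(members, members)].sum(axis=1)
            best = members[np.argmin(costs)]
            if best != medoids[j]:
                medoids[j], changed = best, True
        if not changed:                               # medoids no longer move
            break
    return medoids, np.argmin(dist[:, medoids], axis=1)
```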
Because users clustered into the same category have similar electricity consumption behaviors, a unified load prediction model is established by pooling the user data within each category, which enhances the generalization capability of the model and improves the accuracy of the prediction model.
And S3, performing pooling processing on the user data of the same type. In order to prevent overfitting of a load prediction algorithm based on deep learning and improve the generalization capability of a prediction model, pooling of users among classes completed by clustering in S2 is needed in S3, and overfitting is inhibited by increasing the scale of a training set, wherein S3 specifically comprises the following steps:
S31, adding an ID tag to each user in the form of a dummy variable.
And S32, dividing the user data into a training set and a testing set.
And S33, combining all training data to construct a training pool, and then constructing a test pool through the same process.
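The pooling of S31–S33 amounts to tagging, splitting, and concatenating the per-user data. The sketch below assumes each user's samples sit in a pandas DataFrame and uses an 80/20 chronological split and a plain user-ID column instead of full dummy variables; these choices are illustrative only.

```python
import pandas as pd

def build_pools(user_frames, train_ratio=0.8):
    """user_frames: dict mapping user_id -> time-ordered DataFrame of samples.
    Returns (training pool, test pool) shared by all users of one category."""
    train_parts, test_parts = [], []
    for uid, df in user_frames.items():
        df = df.copy()
        df["user_id"] = uid                       # S31: attach an ID tag
        split = int(len(df) * train_ratio)        # S32: train / test split
        train_parts.append(df.iloc[:split])
        test_parts.append(df.iloc[split:])
    train_pool = pd.concat(train_parts, ignore_index=True)   # S33: merge
    test_pool = pd.concat(test_parts, ignore_index=True)
    # shuffle the training pool so each batch mixes users of the category
    return train_pool.sample(frac=1.0, random_state=0), test_pool
```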
And S4, selecting load data, meteorological data and forecast date type data of the forecast previous day, week and month, and preprocessing the data to be used as input.
S4 specifically includes the following steps:
S41, normalize the resident user load data set L; the normalization formula is $x^*(i)=\frac{x(i)-x_{min}}{x_{max}-x_{min}}$, where $x^*(i)$ is the normalized value of a variable, $x(i)$ is the raw value of the variable, and $x_{max}$ and $x_{min}$ are the maximum and minimum of the raw data, respectively.
And S42, performing sparse processing on the meteorological factor data set W and the date type I of the user by using one-hot codes to obtain an input data set of the user, wherein the input data set is L, W and I.
S43, because the user's load exhibits obvious periodic characteristics, the historical data of the day, week, and month preceding the prediction day are pre-trained and then used as the input of the prediction model; the three days of data are processed by a single-layer fully-connected network, and the final output has the same dimension as a single day's input data set.
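A short sketch of this preprocessing (min-max normalization, one-hot encoding of the date type, and a single-layer fully-connected compression of the previous-day/week/month curves) follows; the 24-point daily curves, the 7 date types, and the extra temperature feature are assumed sizes for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def min_max(x):
    """S41: x*(i) = (x(i) - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=np.float32)
    return (x - x.min()) / (x.max() - x.min())

def one_hot(index, size):
    """S42: one-hot (sparse) encoding, e.g. of the date type."""
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

# S43: one fully-connected layer maps the three historical daily curves
# (previous day, previous week, previous month) back to a 24-point vector
history_fc = nn.Linear(3 * 24, 24)

prev_day, prev_week, prev_month = (torch.rand(24) for _ in range(3))
history = torch.cat([prev_day, prev_week, prev_month])        # shape (72,)
compressed = history_fc(history)                              # shape (24,)

date_type = torch.from_numpy(one_hot(2, 7))                   # e.g. Wednesday
temperature = torch.tensor([0.63])                            # normalised temp
model_input = torch.cat([compressed, date_type, temperature]) # shape (32,)
print(model_input.shape)
```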
S5, constructing a short-term load prediction method based on a deep long short-term memory recurrent neural network, and verifying its effectiveness in a 24-hour-ahead prediction scenario.
The structure of the deep long-short term memory circulation neural network comprises: an input layer, a hidden layer, and an output layer. The input layer is composed of neurons representing inputs. The hidden layer is composed of neurons representing intermediate variables, and the hidden layer and the next hidden layer are directly connected by the neurons. The output layer is composed of neurons representing output results.
The method for constructing the short-term load prediction based on the deep long-term and short-term memory cyclic neural network in the S5 specifically comprises the following steps:
and S51, dividing the user training data in the S4 into batches for calculation.
And S52, initializing various parameters of the long-term and short-term memory cyclic neural training network.
And S53, inputting training data to perform forward propagation training.
And S54, determining a loss function and an optimization method according to the predicted value and the true value to perform back propagation optimization.
In S5, for the clustered users, the data are divided into training sets and test sets by a random pooling method, and LSTM-based prediction model training is carried out on each. The mean absolute percentage error (MAPE) and the root mean square error (RMSE) are adopted as the model prediction effect evaluation indices, calculated respectively as
$MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\%$ and $RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$,
where n is the number of predicted points, $y_i$ is the true load value at the i-th predicted point, and $\hat{y}_i$ is the predicted value at the i-th predicted point.
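A minimal sketch of the two evaluation indices, assuming numpy arrays of true and predicted loads:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error over the n predicted points."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def rmse(y_true, y_pred):
    """Root mean square error over the n predicted points."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(mape([100, 120, 90], [98, 130, 85]), rmse([100, 120, 90], [98, 130, 85]))
```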
Specifically, the load prediction structure model based on the LSTM method is shown in fig. 2, and the input layer includes load history data, meteorological factors, and date type.
A Recurrent Neural Network (RNN) is a neural network with a feedback structure whose output depends not only on the current input and the network weights but also on previous network inputs; therefore, in theory, RNNs are well suited to processing sequence data. However, when an RNN learns long-term dependencies, vanishing or exploding gradients can prevent the model from being trained. To overcome this problem, Hochreiter et al. proposed the long short-term memory recurrent neural network (LSTM), which introduces a cell state that is updated over time and three gate structures that control the information in the cell state, enabling information to flow through the network over long horizons.
The internal structure of the LSTM neural network is shown in FIG. 3. To establish temporal connections, the LSTM defines and maintains an internal memory cell state $C_t$ throughout the cycle, and updates, retains, or deletes information in the cell state through three gate structures: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. The forward calculation process is as follows:
$f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)$,
$i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)$,
$\tilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)$,
$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t$,
$o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)$,
$h_t=o_t\odot\tanh(C_t)$,
where $C_t$ and $C_{t-1}$ denote the cell states at the current and previous time steps, $\tilde{C}_t$ denotes the candidate input state, $f_t$, $i_t$, $o_t$ denote the forget gate, input gate, and output gate, $W_f$, $W_i$, $W_C$, $W_o$ and $b_f$, $b_i$, $b_C$, $b_o$ are the corresponding weight matrices and bias vectors, and $\sigma$ and $\tanh$ denote the sigmoid and hyperbolic tangent activation functions, respectively. First, the hidden-layer output $h_{t-1}$ of the previous time step and the current input $x_t$ are used to compute the coefficients of the forget gate, input gate, and output gate through the formulas above; then $h_{t-1}$ and $x_t$ give the candidate state $\tilde{C}_t$ of the current neuron; next, the forget gate and the input gate determine the proportions in which the previous cell state $C_{t-1}$ and the current candidate state $\tilde{C}_t$ are combined into the current cell state $C_t$; finally, the hidden-layer output $h_t$ is obtained from the output gate and the current cell state.
During LSTM network training, the error term between the output value and the true value of each LSTM neuron is computed backwards with the backpropagation-through-time algorithm, the gradient of each weight is calculated from the corresponding error term, and the weights are updated with a gradient-based optimization algorithm.
The LSTM-based load prediction model comprises an input layer, hidden layers, an output layer, a network training module, and an optimization module. The LSTM constructed in the present invention contains 3 hidden layers and 1 output layer. To predict the power load value at the i-th time point, the LSTM network takes the power load values at the preceding L points as input; L is called the sequence length. Because power load data exhibit periodic characteristics, the invention takes the historical load data of the day, week, and month before the prediction day, weighted by a fully-connected network, as the training data set T of the LSTM prediction network.
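A minimal PyTorch sketch of this network structure (three stacked LSTM hidden layers of 24 units followed by a linear output layer) is shown below; the feature dimension of 8 and the sequence length of 24 are assumptions for illustration, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Three stacked LSTM hidden layers (24 units each) and a linear output
    layer; the input is a sliding window over the preceding L time steps."""
    def __init__(self, n_features=8, hidden=24, layers=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden,
                            num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, 1)      # predicted load at point i

    def forward(self, seq):
        # seq: (batch, L, n_features), L being the sequence length
        h, _ = self.lstm(seq)                # h: (batch, L, hidden)
        return self.out(h[:, -1, :])         # use the last hidden state

model = LSTMForecaster()
print(model(torch.rand(96, 24, 8)).shape)    # torch.Size([96, 1])
```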
This stage of the method includes model training and model testing based on pooled load prediction:
in the training part, the deep recurrent neural network is trained by load data batches randomly taken from a user load pool, so that the LSTM network learns not only the load characteristics of a single user but also the common characteristics and uncertainty of loads in the same category.
In the test section, the test load curves are fed into the trained LSTM network. Assume that the load curve data set after data cleaning is Ψ1 and that the test households are listed in the set D = {d1, d2, ...}. Next, the network configuration parameters of the LSTM need to be determined; L and H denote the network depth (number of layers) and the number of hidden units, respectively. With these parameters, the training and testing process proceeds as follows:
1) Initialize the LSTM prediction network and set the network configuration parameters, namely the network depth L, the number of hidden units H, and the batch size C.
2) Network training iterations: after the network is initialized, the program runs training iterations until the network is well trained; the loss function of the neural network is the root mean square error (RMSE).
In each training period, training data are randomly selected from the training data pool and then fed into the network in feed-forward fashion for training. Each training batch consists of two fixed-size matrices, an input matrix of size C × I and an output matrix of size C × O. The time cost and the number of iterations of the training process depend on the size J of the input data sequence, the selected optimization method, the size of the network (M, I), and the training batch size C. To achieve a good balance between training efficiency and efficacy, the training batch size C is varied during training:
1) in the early period, in order to quickly approach the optimum point, C is set to a small value.
2) Then C is gradually increased towards better training performance but at the expense of time cost.
Test iteration and performance benchmark
Individual households are then tested with the trained deep recurrent neural network by running it as a feed-forward predictor. During testing, load prediction is carried out for the test households one by one to determine whether the proposed method improves load prediction performance for each household individually. At each iteration, performance is compared with other load prediction methods, including ARIMA, SVR, RNN, and deep RNN, which are trained using only the load data of the test household.
The following provides additional description of the above embodiments and verification of the advantages of the above embodiments by way of specific embodiments.
The invention uses 23,254 real-time load values of 1,057 users provided by an urban power grid company as the research object. The load was sampled every 1 hour over the period from July 19, 2016 to August 9, 2016, during which a 22-day load curve was recorded for each user.
First, all users are clustered based on DTW, and users with similar behaviors are grouped into the same class. A good clustering algorithm should make the total within-class distance of the samples (Sc) as small as possible and the distance between the centers of different classes (WB) as large as possible. Therefore, the ratio γ of these two criteria is used as a reference for evaluating the performance of the algorithm; a smaller γ indicates better clustering performance.
Here $\gamma = S_c / W_B$ and $\lambda_i$ denotes the center of the i-th category. Fig. 4 shows the performance comparison between the DTW-based clustering algorithm, K-means clustering, and the EM algorithm; it is evident that the DTW-based clustering algorithm always outperforms the other two, regardless of the number of clusters K.
The effectiveness of the proposed method is verified by predicting actual load values of a certain power grid. LSTM prediction models are established separately for the clustered data, and the 168 load values from 01:00 on August 2 to 24:00 on August 9 are predicted 24 hours in advance.
The invention adopts an LSTM network structure with three hidden layers; the training batch size is 96, the number of neurons in each hidden layer is 24, the optimization algorithm is Adam with a learning rate of 0.002, and the loss function is RMSE. Four load prediction models, ARIMA, SVR, LSTM, and DTW-LSTM, are established and their prediction performance compared. FIG. 5 shows the prediction results for August 8; the DTW-LSTM short-term load prediction model tracks the true values more closely and has better prediction accuracy.
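The training configuration stated here (batch size 96, 24 hidden neurons per layer, Adam with learning rate 0.002, RMSE loss) can be wired up roughly as follows; the synthetic data, the feature dimension, and the epoch count are placeholders, and the model is the illustrative three-layer LSTM sketched earlier.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class LSTMForecaster(nn.Module):
    """Illustrative three-hidden-layer LSTM with 24 units per layer."""
    def __init__(self, n_features=8, hidden=24, layers=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, 1)
    def forward(self, seq):
        h, _ = self.lstm(seq)
        return self.out(h[:, -1, :])

# placeholder data: (samples, sequence length 24, 8 features) -> next-hour load
X, y = torch.rand(2000, 24, 8), torch.rand(2000, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=96, shuffle=True)

model = LSTMForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)  # Adam, lr = 0.002
mse = nn.MSELoss()

for epoch in range(10):                         # epoch count is a placeholder
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = torch.sqrt(mse(model(xb), yb))   # RMSE loss
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: RMSE {loss.item():.4f}")
```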
Table 1 compares the proposed DTW-LSTM prediction model with the other three classical prediction algorithms in terms of MAPE and RMSE. All indices shown in the table are averaged over all test households. The LSTM algorithm performs better than the other conventional algorithms, and introducing DTW clustering further improves the prediction accuracy. Specifically, compared with the conventional LSTM, the proposed DTW-LSTM improves MAPE and RMSE by 6.45% and 6.96%, respectively; compared with ARIMA, the MAPE and RMSE of DTW-LSTM are improved by 19.46% and 16.28%, respectively, an even more significant improvement.
TABLE 1 comparison of load prediction Performance
According to the load prediction method based on dynamic time warping and long-and-short time memory, given the large number of users to be predicted, users with similar electricity consumption behaviors are clustered based on DTW similarity, which improves prediction efficiency, increases the accuracy of user clustering, reduces the number of clusters, and reduces the complexity of the clustering algorithm.
Meanwhile, to address the tendency of deep-learning-based LSTM load prediction methods to overfit, pooling the data within each category increases the diversity of the training data and improves the generalization capability of the model, thereby improving short-term load prediction accuracy; the method therefore has practical engineering significance.
In addition, the invention also provides reference for other related problems in the same field, can be expanded and extended on the basis of the reference, is applied to the technical scheme of other load prediction methods in the same field, and has very wide application prospect.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.
Claims (7)
1. A load prediction method based on dynamic time warping and long-and-short time memory is characterized by comprising the following steps:
s1, acquiring basic data required by user short-term load prediction from the power system;
the basic data comprise historical load data and meteorological data; the historical load data are hourly (1 h interval) load data for each historical day, and the meteorological data at least comprise the ambient temperature at the prediction time and the date type of the prediction day;
s2, clustering the users with similar electricity consumption behaviors by using a dynamic time warping method according to the historical load data of the users;
s3, pooling the user data of the same category, specifically comprising the following steps:
s31, adding an ID tag to the user in a virtual variable mode;
s32, dividing user data into a training set and a test set;
s33, combining all training data to construct a training pool, and then constructing a test pool through the same process;
s4, selecting the load data, meteorological data, and forecast-day date-type data of the day, week, and month preceding the forecast day, and preprocessing the data to be used as input;
s5, constructing a short-term load prediction method based on a deep long short-term memory recurrent neural network, and verifying its effectiveness in a 24-hour-ahead prediction scenario.
2. The load prediction method based on dynamic time warping and long-and-short time memory as claimed in claim 1, wherein S2 specifically comprises the following steps:
S21, assuming that the load sequences of the two users are X and Y, where $X=\{x_1,x_2,\dots,x_N\}$ and $Y=\{y_1,y_2,\dots,y_N\}$, defining a warping path matrix $M[24\times 24]$ whose element $(m,n)$ represents the distance between the two points $x_m$ and $y_n$, the value of element $(m,n)$ being $d(x_m,y_n)=(x_m-y_n)^2$;
S22, defining the dynamic warping path sequence $P=(p_1,p_2,\dots,p_K)$, where $p_k=(m_k,n_k)$ and $\max(m,n)\le K<m+n-1$;
S23, the total warping cost between load sequences X and Y being $c_P(X,Y)=\sum_{k=1}^{K}d(x_{m_k},y_{n_k})$, defining the dynamic time warping distance of X and Y as $DTW(X,Y)=c_{P^*}(X,Y)$, where $P^*=\arg\min_P c_P(X,Y)$, and using the weekly average DTW distance of the two users, $\overline{DTW}(X,Y)=\frac{1}{7}\sum_{t=0}^{6}DTW(X_t,Y_t)$, as the clustering distance in the subsequent clustering algorithm, where t denotes the day of the week, t = 0 to 6 representing Monday through Sunday, and $X_t$ and $Y_t$ denote the load curves of the two users on day t;
S24, given a set of daily load curves X of the users and the number of clusters D, clustering the users into D classes according to the DTW distance between load curves so as to obtain the classes d, the assignment function $C_d$, and the cluster centers $\lambda_d$ that minimize the within-cluster sum $S_c=\sum_{d=1}^{D}\sum_{X\in C_d}DTW(X,\lambda_d)$; then randomly selecting cluster centers and iteratively updating them with a K-medoid clustering algorithm.
3. The load prediction method based on dynamic time warping and long-and-short time memory as claimed in claim 2, wherein said dynamic warping path sequence in S22 satisfies the following condition:
boundary condition: $p_1=(1,1)$ and $p_K=(N,N)$;
monotonicity condition: $m_1\le m_2\le\dots\le m_K$ and $n_1\le n_2\le\dots\le n_K$;
step size condition: if $p_{k-1}=(a',b')$, the next point of the path $p_k=(a,b)$ must satisfy $a-a'\le 1$ and $b-b'\le 1$.
4. The load prediction method based on dynamic time warping and long-and-short time memory as claimed in claim 1, wherein S4 specifically comprises the following steps:
s41, normalizing the resident user load data set L with the normalization formula $x^*(i)=\frac{x(i)-x_{min}}{x_{max}-x_{min}}$, where $x^*(i)$ is the normalized value of a variable, $x(i)$ is the raw value of the variable, and $x_{max}$ and $x_{min}$ are the maximum and minimum of the raw data, respectively;
s42, conducting sparse processing on the meteorological factor data set W and the date type I of the user by using one-hot codes, and obtaining an input data set of the user, wherein the input data set is L, W and I;
and S43, pre-training the load data, meteorological data, and predicted-date-type data of the previous day, week, and month and then using them as the input of the prediction model; specifically, the three days of data are processed by a single-layer fully-connected network, and the final output has the same dimension as a single day's input data set.
5. The method for load prediction based on dynamic time warping and long-and-short term memory as claimed in claim 1, wherein the step of constructing a short-term load prediction method based on a deep long-and-short term memory recurrent neural network in S5, the structure of the deep long-and-short term memory recurrent neural network comprises: an input layer, a hidden layer and an output layer; the input layer is composed of neurons representing inputs; the hidden layer consists of neurons representing intermediate variables, and the hidden layer and the next hidden layer are directly connected by the neurons; the output layer is composed of neurons representing output results.
6. The load prediction method based on dynamic time warping and long-and-short-term memory as claimed in claim 1, wherein the step of constructing the short-term load prediction method based on the deep long-and-short-term memory recurrent neural network in S5 includes the following steps:
s51, dividing the user training data in the S4 into batches for calculation;
s52, initializing each parameter of the long-short term memory recurrent neural training network;
s53, inputting training data to perform forward propagation training;
and S54, determining a loss function and an optimization method according to the predicted value and the true value to perform back propagation optimization.
7. The load prediction method based on dynamic time warping and long-and-short time memory as claimed in claim 1, wherein: in S5, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used as the evaluation indices of the model prediction effect, calculated respectively as $MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\%$ and $RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$, where n is the number of predicted points, $y_i$ is the true load value at the i-th predicted point, and $\hat{y}_i$ is the predicted value at the i-th predicted point.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811439799.9A CN109754113B (en) | 2018-11-29 | 2018-11-29 | Load prediction method based on dynamic time warping and long-and-short time memory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811439799.9A CN109754113B (en) | 2018-11-29 | 2018-11-29 | Load prediction method based on dynamic time warping and long-and-short time memory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109754113A CN109754113A (en) | 2019-05-14 |
CN109754113B true CN109754113B (en) | 2022-08-30 |
Family
ID=66403408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811439799.9A Active CN109754113B (en) | 2018-11-29 | 2018-11-29 | Load prediction method based on dynamic time warping and long-and-short time memory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109754113B (en) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210993B (en) * | 2019-05-22 | 2023-04-07 | 重庆大学 | Urban short-term gas load prediction method based on cyclic neural network model |
CN110210677B (en) * | 2019-06-06 | 2022-03-04 | 国网山东省电力公司莱芜供电公司 | Bus short-term daily load prediction method and device combining clustering and deep learning algorithm |
CN110543942A (en) * | 2019-08-28 | 2019-12-06 | 广西大学 | Multi-space-time long and short memory depth network accurate prediction method |
CN110569925B (en) * | 2019-09-18 | 2023-05-26 | 南京领智数据科技有限公司 | LSTM-based time sequence abnormality detection method applied to power equipment operation detection |
CN110674993A (en) * | 2019-09-26 | 2020-01-10 | 广东电网有限责任公司 | User load short-term prediction method and device |
CN112686483B (en) * | 2019-10-17 | 2024-08-13 | 中国移动通信集团陕西有限公司 | Early warning area identification method, early warning area identification device, computing equipment and computer storage medium |
CN110956306A (en) * | 2019-10-23 | 2020-04-03 | 广东电网有限责任公司 | Load prediction method based on load clustering |
CN111082419B (en) * | 2019-12-06 | 2022-06-24 | 国网河北省电力有限公司电力科学研究院 | Intelligent power distribution network user access scheme management method and system based on big data technology |
CN111222688B (en) * | 2019-12-06 | 2024-01-12 | 北京国电通网络技术有限公司 | Daily load prediction method for commercial building |
CN111027776A (en) * | 2019-12-13 | 2020-04-17 | 北京华展汇元信息技术有限公司 | Sewage treatment water quality prediction method based on improved long-short term memory LSTM neural network |
CN111369078A (en) * | 2020-02-14 | 2020-07-03 | 迈拓仪表股份有限公司 | Water supply quality prediction method based on long-term and short-term memory neural network |
CN111210091A (en) * | 2020-02-25 | 2020-05-29 | 上海积成能源科技有限公司 | System model for predicting short-term power load based on long and short-term memory model of recurrent neural network |
CN111461400B (en) * | 2020-02-28 | 2023-06-23 | 国网浙江省电力有限公司 | Kmeans and T-LSTM-based load data completion method |
CN111445078A (en) * | 2020-03-31 | 2020-07-24 | 国网河北省电力有限公司 | Comprehensive energy system multi-element load prediction method based on long-term and short-term memory neural network |
CN111476298B (en) * | 2020-04-07 | 2022-04-26 | 杭州国彪超声设备有限公司 | Power load state identification method in home and office environment |
CN111582548A (en) * | 2020-04-14 | 2020-08-25 | 广东卓维网络有限公司 | Power load prediction method based on multivariate user behavior portrait |
CN111583064B (en) * | 2020-05-11 | 2022-09-09 | 国网四川省电力公司电力科学研究院 | Load production time interval detection method based on dynamic time warping and storage medium |
CN111723873A (en) * | 2020-06-29 | 2020-09-29 | 南方电网科学研究院有限责任公司 | Power sequence data classification method and device |
CN111783684A (en) * | 2020-07-03 | 2020-10-16 | 湖南大学 | Bi-GRU-SA-based household electrical load decomposition method and system |
CN111913575B (en) * | 2020-07-24 | 2021-06-11 | 合肥工业大学 | Method for recognizing hand-language words |
CN112348070A (en) * | 2020-10-30 | 2021-02-09 | 广东电网有限责任公司电力调度控制中心 | Method and system for forecasting medium and short term loads of smart power grid |
CN112118143B (en) * | 2020-11-18 | 2021-02-19 | 迈普通信技术股份有限公司 | Traffic prediction model training method, traffic prediction method, device, equipment and medium |
CN112508259A (en) * | 2020-12-02 | 2021-03-16 | 重庆大学 | Medium-and-long-term power consumption prediction method and system for independent user categories |
CN112418342A (en) * | 2020-12-07 | 2021-02-26 | 中山大学 | Mobile equipment calculation force prediction method and device under federated learning scene |
CN112365098A (en) * | 2020-12-07 | 2021-02-12 | 国网冀北电力有限公司承德供电公司 | Power load prediction method, device, equipment and storage medium |
CN112580260B (en) * | 2020-12-22 | 2024-09-17 | 广州杰赛科技股份有限公司 | Pipe network water flow prediction method and device and computer readable storage medium |
WO2022133832A1 (en) * | 2020-12-23 | 2022-06-30 | Huawei Technologies Co., Ltd. | Apparatus and method for predictive computer modelling |
CN112613666B (en) * | 2020-12-26 | 2023-02-14 | 福建维力能源科技有限公司 | Power grid load prediction method based on graph convolution neural network and transfer learning |
CN112929445B (en) * | 2021-02-20 | 2022-06-07 | 山东英信计算机技术有限公司 | Recommendation system-oriented link prediction method, system and medium |
CN113033898A (en) * | 2021-03-26 | 2021-06-25 | 国核电力规划设计研究院有限公司 | Electrical load prediction method and system based on K-means clustering and BI-LSTM neural network |
CN113125037B (en) * | 2021-04-06 | 2024-06-07 | 红塔烟草(集团)有限责任公司 | Cable conductor temperature estimation method based on distributed optical fiber on-line temperature measurement system |
CN113255900A (en) * | 2021-06-23 | 2021-08-13 | 河北工业大学 | Impulse load prediction method considering improved spectral clustering and Bi-LSTM neural network |
CN113448808B (en) * | 2021-08-30 | 2021-11-12 | 北京必示科技有限公司 | Method, system and storage medium for predicting single task time in batch processing task |
CN113821574B (en) * | 2021-08-31 | 2024-07-30 | 北京达佳互联信息技术有限公司 | User behavior classification method and device and storage medium |
CN113987910A (en) * | 2021-09-18 | 2022-01-28 | 国网江苏省电力有限公司信息通信分公司 | Method and device for identifying load of residents by coupling neural network and dynamic time planning |
CN113822366A (en) * | 2021-09-29 | 2021-12-21 | 平安医疗健康管理股份有限公司 | Service index abnormality detection method and device, electronic equipment and storage medium |
CN113988436B (en) * | 2021-11-01 | 2023-04-28 | 广西电网有限责任公司 | Power consumption prediction method based on LSTM neural network and hierarchical relation correction |
CN114781759A (en) * | 2022-06-16 | 2022-07-22 | 国网天津市电力公司经济技术研究院 | Resident load prediction method and device based on neural network and dynamic mirror image reduction |
CN115481788B (en) * | 2022-08-31 | 2023-08-25 | 北京建筑大学 | Phase change energy storage system load prediction method and system |
CN115423301B (en) * | 2022-09-01 | 2023-04-25 | 杭州达中科技有限公司 | Intelligent electric power energy management and control method, device and system based on Internet of things |
CN115454009A (en) * | 2022-11-11 | 2022-12-09 | 常熟理工学院 | Component distribution model predictive control method for chemical production |
CN116029201B (en) * | 2022-12-23 | 2023-10-27 | 浙江苍南仪表集团股份有限公司 | Gas flow prediction method and system based on clustering and cyclic neural network |
CN117232097B (en) * | 2023-11-09 | 2024-02-20 | 上海轻环能源科技有限公司 | Central air conditioner refrigerating station optimal control method and system based on self-learning fusion model |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608512A (en) * | 2016-03-24 | 2016-05-25 | 东南大学 | Short-term load forecasting method |
CN108416690A (en) * | 2018-01-19 | 2018-08-17 | 中国矿业大学 | Load Forecasting based on depth LSTM neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN109754113A (en) | 2019-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109754113B (en) | Load prediction method based on dynamic time warping and long-and-short time memory | |
CN109063911B (en) | Load aggregation grouping prediction method based on gated cycle unit network | |
Han et al. | Short-term forecasting of individual residential load based on deep learning and K-means clustering | |
CN109492748B (en) | Method for establishing medium-and-long-term load prediction model of power system based on convolutional neural network | |
CN111861013A (en) | Power load prediction method and device | |
CN116721537A (en) | Urban short-time traffic flow prediction method based on GCN-IPSO-LSTM combination model | |
Tian et al. | Daily power demand prediction for buildings at a large scale using a hybrid of physics-based model and generative adversarial network | |
CN113344288B (en) | Cascade hydropower station group water level prediction method and device and computer readable storage medium | |
CN111915092A (en) | Ultra-short-term wind power prediction method based on long-time and short-time memory neural network | |
CN113298318A (en) | Novel overload prediction method for distribution transformer | |
Akpinar et al. | Forecasting natural gas consumption with hybrid neural networks—Artificial bee colony | |
CN111428766A (en) | Power consumption mode classification method for high-dimensional mass measurement data | |
CN115759458A (en) | Load prediction method based on comprehensive energy data processing and multi-task deep learning | |
CN115481788B (en) | Phase change energy storage system load prediction method and system | |
CN113962454A (en) | LSTM energy consumption prediction method based on dual feature selection and particle swarm optimization | |
Liu et al. | Multi-stage residual life prediction of aero-engine based on real-time clustering and combined prediction model | |
Wei et al. | An instance based multi-source transfer learning strategy for building’s short-term electricity loads prediction under sparse data scenarios | |
CN113762591B (en) | Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning | |
Sun et al. | Load-forecasting method for IES based on LSTM and dynamic similar days with multi-features | |
CN113537539A (en) | Multi-time-step heat and gas consumption prediction model based on attention mechanism | |
CN115310355A (en) | Multi-energy coupling-considered multi-load prediction method and system for comprehensive energy system | |
CN117390550A (en) | Low-carbon park carbon emission dynamic prediction method and system considering emission training set | |
CN110852628A (en) | Rural medium and long term load prediction method considering development mode influence | |
CN115796327A (en) | Wind power interval prediction method based on VMD (vertical vector decomposition) and IWOA-F-GRU (empirical mode decomposition) -based models | |
Wang et al. | Prediction of air pollution based on FCM-HMM Multi-model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |