CN110188919A

CN110188919A - A load forecasting method based on a long short-term memory network

Info

Publication number: CN110188919A
Application number: CN201910325295.2A
Authority: CN
Inventors: 许贤泽; 施元
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2019-04-22
Filing date: 2019-04-22
Publication date: 2019-08-30

Abstract

The invention belongs to the field of power load prediction and discloses a load forecasting method based on a long short-term memory (LSTM) network, comprising: collecting the power load data of a target area over a certain time period, together with the corresponding weather feature data, to form a raw data set; processing missing values in the raw data set using a Spark cluster; performing feature selection on the raw data set; performing feature compression on the raw data set; establishing a prediction model based on the LSTM network; training the prediction model in a distributed manner using the Spark cluster; and, according to the weather feature data and power load data of the previous time point, performing distributed prediction with the prediction model to obtain the predicted load of the current time point. The present invention addresses the difficulty, in the prior art, of performing power load forecasting quickly and efficiently in big data scenarios, and can extract, process, and compute on large data sets effectively and quickly.

Description

Load prediction method based on a long short-term memory network

Technical Field

The invention relates to the technical field of power load prediction, and in particular to a load prediction method based on a long short-term memory (LSTM) network.

Background

The load prediction problem concerns predicting the power load that a power enterprise will need at a specific future time, and it is one of the core topics of power grid planning. Based on analysis of historical load data and judgments about future development, the power enterprise forecasts how the power load will change, and its development trend, over a future period. An accurate load forecast is crucial to the short-term scheduling and long-term system planning of the power enterprise, and is the basis for power supply planning, development planning, capital and financial planning, and the like.

By the end of 2017, the grid-connected new energy capacity within the dispatching range of the national grid had reached 280 million kilowatts, including 145.39 million kilowatts of wind power and 120.83 million kilowatts of solar power, both ranking first in the world; more than 400 million smart meters had been installed, and automatic, full-coverage collection of electricity consumption information had essentially been achieved. As data volumes grow day by day, load prediction has entered the big data era; integrating load prediction with big data technology has become imperative and is of important strategic significance to the development of the power industry.

To address the challenges posed by massive data, the field of "big data" has produced a large number of distributed parallel computing and storage technologies. Building on the MapReduce programming model and the Google File System developed by Google, the Apache Hadoop project opened the era of enterprise-level big data processing, and a variety of distributed computing and storage platforms have since been developed around Hadoop. As the demand for real-time computing has grown, stream processing engines represented by Spark Streaming and Flink have started a new wave of big data development.

To perform power load prediction rapidly and efficiently in big data scenarios, a complete data processing and modeling scheme for load prediction, suited to big data processing, is needed.

Disclosure of Invention

The load prediction method based on a long short-term memory network addresses the difficulty, in the prior art, of performing power load prediction rapidly and efficiently in big data scenarios.

The embodiments of the present application provide a load prediction method based on a long short-term memory network, comprising the following steps:

Step S1: collecting the power load data and corresponding weather feature data of a target area over a certain time period to form a raw data set;

Step S2: processing missing values in the raw data set using a Spark cluster;

Step S3: performing feature selection on the raw data set;

Step S4: performing feature compression on the raw data set;

Step S5: establishing a prediction model based on the long short-term memory network;

Step S6: performing distributed training of the prediction model using the Spark cluster;

Step S7: according to the weather feature data and power load data of the previous time point, performing distributed prediction with the prediction model to obtain the load prediction value of the current time point.

Preferably, the load prediction method based on the long short-term memory network further comprises:

Step S8: storing the power load data and weather feature data collected in real time in an Hbase cluster, and displaying the predicted load values together with the load values collected in real time.

Preferably, in step S1, the formed raw data set is stored into the Hbase cluster.

Preferably, in step S2, the Spark cluster is used to perform K-means clustering, and the missing values are then filled using a within-cluster (same-class) mean imputation strategy.

Preferably, in step S3, the Pearson correlation between the power load data and the weather feature data is calculated, and the features are ranked by importance by training a gradient boosting decision tree.

Preferably, feature variables whose significance level (p-value) is higher than 0.05 are removed, the remaining variables are ranked by correlation, and the top 30% of features are taken as a first feature set; a gradient boosting decision tree is trained to obtain a feature importance ranking, and the top 30% of features are taken as a second feature set; the features in the intersection of the first and second feature sets are taken as the selected features.

Preferably, in step S4, feature compression is performed on the weather feature data by using principal component analysis.

Preferably, in step S5, the parameters are updated using truncated backpropagation through time.

Preferably, in step S6, data-parallel distributed training is performed on the Spark cluster using asynchronous stochastic gradient descent.

Preferably, in step S7, the power load data and weather feature data of the previous time point are acquired, the parallelism is set according to the data size and the cluster hardware information, and the load is predicted using the Spark cluster.

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

In the embodiments of the present application, a historical power load data set and a historical weather feature data set are obtained, and data preprocessing, feature selection, and feature compression are performed; a long short-term memory network, with parameters updated by truncated backpropagation through time, is then trained in a distributed manner with asynchronous stochastic gradient descent to establish a prediction model of the electricity load, which is used to predict electricity consumption at a given moment. The invention can extract, process, and compute on large data sets effectively and quickly.

Drawings

To illustrate the technical solution of this embodiment more clearly, the drawings used in describing the embodiment are briefly introduced below. The drawings described below show only one embodiment of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a diagram of the data processing platform of the load prediction method based on a long short-term memory network according to an embodiment of the present invention;

Fig. 2 is a topological diagram of the long short-term memory network model in the load prediction method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of truncated backpropagation through time in the load prediction method according to an embodiment of the present invention;

Fig. 4 is a topological diagram of Spark normal-mode training in the load prediction method according to an embodiment of the present invention;

Fig. 5 is a topological diagram of Spark mesh-mode training in the load prediction method according to an embodiment of the present invention.

Detailed Description

In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and specific embodiments.

The load prediction method based on a long short-term memory network provided by this embodiment mainly comprises the following steps:

Step S1: historical electricity load data of an area, and weather data of the same area over the same time period, are collected to form a raw data set, which is stored in an Hbase cluster.

The historical data (i.e., the raw data set) is stored in Hbase. Because Hbase scales well horizontally, it can accommodate large data volumes and meets the requirements for data persistence and reading; interactive queries can also be performed using Hive.

Step S2: missing values in the data set are processed using the Spark cluster.

The driver node of the Spark cluster is given the relevant information, and the computing nodes of the cluster load the large data set in parallel and persist it in the distributed memory of the cluster hosts. Within the Spark cluster the data is abstracted as a resilient distributed dataset (RDD) and computed in memory, which reduces the time overhead of data preprocessing. Missing values are filled in parallel by within-cluster mean imputation based on K-means clustering, which yields values closer to the true values than ordinary mean imputation.

Step S3: feature selection of the data is performed.

The Pearson correlation between each weather feature and the power load data is calculated, a gradient boosting decision tree is trained to obtain a feature importance ranking, and the two are combined for feature screening. This retains the characteristics of the original data as far as possible while reducing the volume of training data.

Step S4: feature compression of the data is performed.

Principal component analysis (PCA) is used to compress the data, which reduces the input data volume of the model and increases its computation speed while retaining the characteristics of the original data as far as possible.

Step S5: a model based on the long short-term memory network is established.

Load prediction modeling is carried out on the compressed data with a long short-term memory network, which mitigates the vanishing gradient problem. Updating the parameters with truncated backpropagation through time increases the frequency of parameter updates, so the network can be trained more quickly with the same computing capacity.

Step S6: the Spark cluster is used for distributed training of the model.

On the Spark cluster, the model is trained with data-parallel asynchronous stochastic gradient descent. Training the network in Spark cluster mode greatly reduces the volume of parameter-update data exchanged between nodes, which increases the model training speed.

Step S7: according to the power load data and weather feature data of the previous time point, distributed prediction is performed with the established long short-term memory network model to obtain the load prediction value of the current time point.

The power load data and weather feature data of the previous moment are read, a suitable parallelism is set, and the load is predicted using the Spark cluster, which increases the prediction speed.

Step S8: the power load data and weather data collected in real time are stored in the Hbase cluster, and the predicted load values and the load values collected in real time (i.e., the real load values) are displayed on a Web page through a graphical interface.

The latest power load data and weather feature data collected in real time are stored in the Hbase cluster. The real-time actual load values (i.e., the collected load values) are displayed together with the predicted load values so that errors and trends can be observed.

The present invention is described in further detail below.

Fig. 1 shows the block diagram of the data processing platform provided by the present invention, which includes: the Hadoop distributed file system HDFS as the bottom storage layer; the MapReduce computing framework, which provides data operation interfaces for Hbase and Hive; the cluster computing framework Spark; and a message queue (MQ) that receives real-time data. Hbase is a distributed database, and Hive provides SQL-like data operations for the relevant staff.

The method is designed for real-world scenarios: it takes into account factors such as weather, time, and regional correlation, and uses mathematical modeling to predict the electricity load of an area at a given future moment while maintaining a certain fault tolerance and accuracy.

Step S1: historical electricity load data of an area, and weather data of the same area over the same time period, are collected in the Hbase cluster.

The raw electricity data set and the weather data set are each stored in Hbase in advance. Hbase can store massive data sets, and as a column-oriented NoSQL database it allows columns to be added dynamically as needed, meeting the requirements of multi-dimensional load feature data.

To handle missing values in the meteorological features, K-means clustering is applied to obtain several distinct data clusters, and mean imputation is then performed within each cluster. Compared with ordinary mean imputation, the imputed value is closer to the true value, which improves the accuracy of the model.

K-means is an iterative algorithm. Suppose the data is to be clustered into K groups. First, K random points, called cluster centers, are selected. Each data point, represented by its feature vector, is associated with the closest cluster center, where closeness is measured by the Euclidean (2-norm) distance, and all points associated with the same center form one group. The mean of each group is then calculated, and the center associated with that group is moved to the position of the mean. K-means minimizes the sum of the squared distances between all data points and their associated cluster centers; its cost function is:

$$J\left(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K\right)=\sum_{i=1}^{m}\left\lVert x^{(i)}-\mu_{c^{(i)}}\right\rVert_2^2$$

where $c^{(i)}$ is the index of the cluster center closest to the feature vector $x^{(i)}$, $\mu_{c^{(i)}}$ is that cluster center, and m is the number of data points. The optimization goal of the algorithm is to find the $c^{(1)},\ldots,c^{(m)}$ and $\mu_1,\ldots,\mu_K$ that minimize the cost function.

The algorithm flow is as follows:

(1) randomly creating K points as an initial center;

(2) calculating the distance of each feature vector in the meteorological feature data set relative to each center;

(3) assigning the feature vector to its closest center;

(4) for the newly obtained groups, calculating the vector center of each group;

(5) repeating steps (2) to (4) until the algorithm converges.
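Steps (1) to (5) can be illustrated with the following minimal NumPy sketch of plain K-means. It is not the distributed Spark implementation used in the method; the function names `kmeans` and `_assign`, the fixed random seed, and the rule that an empty group keeps its previous center are illustrative assumptions.

```python
import numpy as np


def _assign(X, centers):
    """Steps (2)-(3): index of the nearest center (Euclidean 2-norm) for every row."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # shape (m, K)
    return dists.argmin(axis=1)


def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means on the rows of X; returns (labels, centers, cost J)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) use k distinct data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = _assign(X, centers)
        # (4) move each center to the mean of its group; an empty group keeps its old center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # (5) converged once the centers stop moving
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = _assign(X, centers)
    cost = float(np.sum((X - centers[labels]) ** 2))  # J: sum of squared distances
    return labels, centers, cost
```

For two well-separated groups of three points each, the sketch places each group in its own cluster, and the returned cost equals J evaluated at the two group means.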

K-means cannot determine the number of clusters by itself: too many clusters make the cost function smaller but overfit the data, so choosing a suitable number of clusters is important for the accuracy of the mean imputation. The clustering error of the meteorological feature data is therefore calculated for different numbers of clusters, and the number of clusters that gives the largest error reduction rate is selected as the basis for the within-cluster imputation. Mean imputation is then carried out separately within each weather feature cluster, completing the missing value processing.
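The cluster-count selection and the within-cluster imputation can be sketched as follows. Two assumptions are made: "error reduction rate" is read as the relative drop $(J(K-1)-J(K))/J(K-1)$ between consecutive cluster counts, and every row, including rows with missing entries, already carries a cluster label (for example, assigned by clustering on its observed features). The names `pick_num_clusters` and `impute_within_cluster` are hypothetical, and a column with no observed value in a cluster falls back to the global column mean.

```python
import numpy as np


def pick_num_clusters(costs):
    """costs maps K -> K-means error J(K). Returns the K whose relative error
    reduction (J(K-1) - J(K)) / J(K-1) is largest (assumed reading of the text)."""
    ks = sorted(costs)
    best_k, best_rate = ks[0], float("-inf")
    for prev, k in zip(ks, ks[1:]):
        rate = (costs[prev] - costs[k]) / costs[prev]
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k


def _column_means(A, fallback):
    """NaN-ignoring column means; columns with no observed value take `fallback`."""
    sums = np.nansum(A, axis=0)
    counts = np.sum(~np.isnan(A), axis=0)
    return np.divide(sums, counts, out=np.array(fallback, dtype=float), where=counts > 0)


def impute_within_cluster(X, labels):
    """Replace each NaN with the mean of its column inside the row's cluster."""
    X = np.array(X, dtype=float)  # work on a copy
    labels = np.asarray(labels)
    global_means = _column_means(X, np.full(X.shape[1], np.nan))
    for c in np.unique(labels):
        rows = labels == c
        block = X[rows]  # boolean indexing returns a copy
        means = _column_means(block, global_means)
        r, col = np.nonzero(np.isnan(block))
        block[r, col] = means[col]
        X[rows] = block
    return X
```

With errors of 100, 60, 20, and 18 for K = 1 to 4, the relative reductions are 0.40, 0.67, and 0.10, so K = 3 is chosen.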

Step S3: feature selection on the data.

The Pearson correlation is calculated, a model is trained with a gradient boosting decision tree algorithm, and the two are combined for feature selection, which gives the selection greater fault tolerance.

The Pearson correlation and its significance are calculated; the correlation between the load and each weather feature is given by:

$$r=\frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2}}$$

where $\bar{X}$ and $\bar{Y}$ are the means of the samples $X_i$ and $Y_i$, and n is the number of samples. Feature variables whose significance level (p-value) is higher than 0.05 are removed, the remaining variables are ranked by correlation, and the top 30% of features are taken as feature set A.
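A minimal sketch of building feature set A follows. Assumptions: features are ranked by the absolute value of r, the 30% is taken of the features that survive the significance filter (rounded up), and the p-value uses the Fisher z-transform normal approximation instead of the exact t-test (for example, SciPy's `scipy.stats.pearsonr`) so the sketch needs only NumPy. The function names are hypothetical.

```python
import math

import numpy as np


def pearson_r(x, y):
    """Pearson correlation r between two equal-length samples."""
    dx = np.asarray(x, dtype=float) - np.mean(x)
    dy = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(dx * dy) / math.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform
    (a large-sample normal approximation, used to avoid a SciPy dependency)."""
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def feature_set_a(features, load, alpha=0.05, top_frac=0.3):
    """Drop features with p > alpha, rank the rest by |r|, keep the top fraction."""
    scores = {}
    for name, column in features.items():
        r = pearson_r(column, load)
        if approx_p_value(r, len(load)) <= alpha:
            scores[name] = abs(r)
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = math.ceil(top_frac * len(ranked))
    return set(ranked[:keep])
```

A feature that is symmetric about the middle of a linear load trend has r close to 0 and is removed by the significance filter before ranking.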

A gradient boosting decision tree is trained to obtain a feature importance ranking, and the top 30% of features are taken as feature set B. The features in the intersection of feature sets A and B are taken as the screened features.
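The combination step can be sketched as below. Training the gradient boosting decision tree itself is out of scope here: the importance scores are passed in as a dictionary, and in practice they might come from, for example, scikit-learn's `GradientBoostingRegressor.feature_importances_` or Spark MLlib's `GBTRegressionModel.featureImportances`. The helper names and the round-up rule for 30% are assumptions.

```python
import math


def top_fraction(scores, frac=0.3):
    """Names of the top `frac` share of features by score (at least one if any exist)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:max(1, math.ceil(frac * len(ranked)))])


def screen_features(set_a, gbdt_importance, frac=0.3):
    """Feature set B = top 30% by GBDT importance; the screened features are A intersect B."""
    set_b = top_fraction(gbdt_importance, frac)
    return set(set_a) & set_b
```

Taking the intersection keeps only features that both the linear (Pearson) view and the nonlinear (tree-based) view consider important, which is the source of the fault tolerance mentioned for step S3.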

Step S4: feature compression of the data.

Principal component analysis is used for dimensionality reduction: a d × k transformation matrix W is constructed so that a d-dimensional feature vector x can be mapped into a new k-dimensional feature subspace, where k < d:

$$z = xW,\qquad x\in\mathbb{R}^{1\times d},\quad W\in\mathbb{R}^{d\times k},\quad z\in\mathbb{R}^{1\times k}$$

The principal component analysis algorithm proceeds as follows:

(1) carrying out standardization processing on the original d-dimensional data set;

(2) constructing a sample covariance matrix;

(3) calculating the eigenvalues of the covariance matrix and the corresponding eigenvectors;

(4) selecting the eigenvectors corresponding to the k largest eigenvalues, where k is the dimension of the new feature subspace (k < d);

(5) constructing the projection matrix W from these top k eigenvectors;

(6) the d-dimensional input data set is converted to a new k-dimensional feature subspace by means of a mapping matrix W.
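Steps (1) to (6) map directly onto NumPy, as in the single-machine sketch below; the function name is hypothetical, and on a Spark cluster a distributed transformer such as Spark MLlib's `PCA` would normally take this role. Constant columns are left unscaled to avoid division by zero, which is an added assumption.

```python
import numpy as np


def pca_fit_transform(X, k):
    """Steps (1)-(6): standardize, covariance, eigen-decompose, keep top-k, project."""
    X = np.asarray(X, dtype=float)
    # (1) standardize each of the d columns (constant columns are left at 0)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # (2) d x d sample covariance matrix
    cov = np.cov(Z, rowvar=False)
    # (3) eigh: cov is symmetric; eigenvalues come back in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # (4) indices of the k largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    # (5) d x k projection matrix W built from those eigenvectors
    W = eigvecs[:, order]
    # (6) z = x W maps every standardized row into the k-dimensional subspace
    return Z @ W, W, eigvals[order]
```

The sample variance of each projected column equals the corresponding eigenvalue, which is why keeping the largest eigenvalues retains the most variance of the original features.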

The compressed time-series data set in the Spark cluster is split in a ratio of 8:2 into a training data set and a test data set; the training data set is used to train the model, and the test data set is used to evaluate it.

Step S5: establishing a recurrent network model based on the long short-term memory network.

Because the hourly load over the next 24 h is predicted, the long short-term memory network uses 24 time steps; that is, a sequence of 24 consecutive hourly loads is taken as the output of one sample. The specific structure is shown in Fig. 2. The training features are the compressed weather feature data, and the training labels are the load values of the next time point. The long short-term memory structure mitigates the vanishing gradient problem, so model training remains effective even with long time steps.
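One way to cut hourly data into such 24-step samples is sketched below. The sliding-window stride of 1 hour and the function name `make_sequences` are assumptions; the text only fixes the 24 time steps and the "next time point" label.

```python
import numpy as np


def make_sequences(features, load, steps=24, stride=1):
    """Cut hourly data into LSTM samples of `steps` time steps.

    features: (T, k) compressed weather features; load: (T,) hourly load.
    The input at each step is the feature vector of hour t and its label is
    the load of hour t + 1, so each sample maps 24 inputs to 24 next-hour loads.
    """
    features = np.asarray(features, dtype=float)
    load = np.asarray(load, dtype=float)
    xs, ys = [], []
    for start in range(0, len(load) - steps, stride):
        xs.append(features[start:start + steps])
        ys.append(load[start + 1:start + steps + 1])
    return np.stack(xs), np.stack(ys)
```

For 50 hours of data this yields 26 samples of shape (24, k) with labels of shape (24,); the last input hour of the last sample is hour 48, labelled with the load of hour 49.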

Processing long sequences with a long short-term memory network requires considerable computing capacity, so truncated backpropagation through time is used to speed up training. Taking the truncated backpropagation through time of length 4 in Fig. 3 as an example, each backward pass for a parameter update runs through only 4 time steps, which reduces the complexity of parameter updates in the network. When longer sequences are input, the truncated algorithm increases the frequency of parameter updates, so the network can be trained more quickly with the same computing capacity.
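The truncation logic can be seen on a deliberately tiny model. The sketch below is not an LSTM: it applies truncated backpropagation through time to a hypothetical one-parameter linear recurrent network $h_t = w\,h_{t-1} + x_t$ with squared-error loss, so that the segment handling stays visible. The hidden state is carried across segments, but gradients stop at each segment boundary, and one parameter update is made per segment of length k = 4.

```python
def tbptt_epoch(x, y, w, k=4, lr=0.005):
    """One pass of truncated BPTT over a toy linear RNN h_t = w * h_{t-1} + x_t
    with squared-error loss. The hidden state is carried across segments, but
    gradients stop at each segment boundary, so every update back-propagates
    through at most k time steps. Returns (new w, number of parameter updates)."""
    h, updates = 0.0, 0
    for start in range(0, len(x), k):
        xs, ys = x[start:start + k], y[start:start + k]
        # forward pass over the segment, remembering each step's input state
        prev_states, states = [], []
        for xt in xs:
            prev_states.append(h)
            h = w * h + xt
            states.append(h)
        # backward pass through this segment only: the state entering it is a constant
        grad, delta = 0.0, 0.0
        for t in reversed(range(len(xs))):
            delta = 2.0 * (states[t] - ys[t]) + w * delta  # dL/dh_t
            grad += delta * prev_states[t]                 # local dh_t/dw = h_{t-1}
        w -= lr * grad
        updates += 1
    return w, updates
```

A 24-step sequence with truncation length 4 therefore produces 6 parameter updates per pass, instead of the single update that full backpropagation through time would make.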

Step S6: distributed training of the model on the Spark cluster.

A distributed training scheme based on data-parallel asynchronous stochastic gradient descent (asynchronous SGD) is used. Defining the loss function as L, the gradient vector of the loss function with respect to the p model parameters $w_1,\ldots,w_p$ is:

$$\nabla L(W)=\left(\frac{\partial L}{\partial w_1},\frac{\partial L}{\partial w_2},\ldots,\frac{\partial L}{\partial w_p}\right)^{\mathsf T}$$

With data-parallel SGD at learning rate α, the parameter vector after the (i+1)-th iteration is:

$$W_{i+1}=W_i-\frac{\alpha}{n}\sum_{j=1}^{n}\nabla L_j$$

where $W_i$ is the parameter vector after the i-th iteration, $\nabla L_j$ is the gradient vector of the loss function obtained by the j-th computing node after training on its copy of the data, and n is the number of computing nodes.
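The update rule $W_{i+1}=W_i-\frac{\alpha}{n}\sum_j\nabla L_j$ is exactly what synchronous parameter averaging produces when each node takes one local SGD step from the same $W_i$ and the resulting parameters are averaged. The sketch below checks that identity numerically; the function names are hypothetical.

```python
import numpy as np


def sgd_step_from_gradients(w, worker_grads, lr):
    """W_{i+1} = W_i - (lr / n) * sum_j grad_j, where n = number of computing nodes."""
    n = len(worker_grads)
    return w - (lr / n) * np.sum(worker_grads, axis=0)


def parameter_averaging_step(w, worker_grads, lr):
    """Each node takes one local SGD step on its data copy; the results are averaged."""
    local_params = [w - lr * g for g in worker_grads]
    return np.mean(local_params, axis=0)
```

With three nodes, gradients summing to [0.3, 0.6], and α = 0.1, both forms move [1.0, -2.0] to [0.99, -2.02].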

In asynchronous SGD, the parameter update computed by each worker node is applied to the parameter vector as soon as it has been calculated, without strictly following a synchronized update period. Asynchronous SGD can achieve higher throughput in a distributed system: first, instead of waiting for the parameter averaging step to complete, worker nodes can spend more of their time on useful computation; second, each worker node incorporates information from the other worker nodes sooner than with synchronous updates.
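The difference from the synchronous rule can be illustrated with a toy simulation, under stated assumptions: each worker j holds a quadratic loss $\tfrac12\lVert w-c_j\rVert^2$, workers report in round-robin order, and each gradient is computed on the parameter copy the worker pulled after its previous update. The server applies every update the moment it arrives, with no averaging barrier. Names and the loss are hypothetical.

```python
import numpy as np


def async_sgd(targets, lr=0.1, rounds=200):
    """Toy asynchronous SGD on L(w) = sum_j 0.5 * ||w - c_j||^2, where worker j
    holds c_j. Workers report in round-robin order; after the first round each
    gradient is computed on a copy that is n - 1 updates stale, and the server
    applies it immediately, with no barrier."""
    targets = [np.asarray(c, dtype=float) for c in targets]
    w = np.zeros_like(targets[0])
    pulled = [w.copy() for _ in targets]  # each worker's (possibly stale) copy
    for step in range(rounds * len(targets)):
        j = step % len(targets)
        grad = pulled[j] - targets[j]     # gradient of 0.5 * ||w - c_j||^2 at the stale copy
        w = w - lr * grad                 # applied as soon as it arrives
        pulled[j] = w.copy()              # worker pulls the latest parameters
    return w
```

With a constant learning rate the parameters settle into a small cycle around the minimizer (the mean of the $c_j$) rather than converging exactly; this staleness effect is the price paid for not waiting on the slowest node.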

Distributed model training with asynchronous SGD is performed on the Spark computing cluster. When the cluster has fewer than 32 nodes, the normal mode is used, as shown in Fig. 4; when the cluster is larger, the mesh mode is used, as shown in Fig. 5. In both training modes, the parameter updates exchanged between nodes are encoded and compressed, which reduces the communication traffic between nodes and effectively increases the model training speed.

In normal mode, each worker node passes its quantized, encoded updates to the master node, which then propagates them to the remaining nodes. This ensures that the master node always holds the latest version of the model. Meanwhile, the fault-tolerance mechanism of the Spark cluster ensures reliable communication between nodes: if the Spark Master node fails, the cluster can elect a new Master node, avoiding a single point of failure.
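The text does not specify the quantization scheme. One widely used option, assumed here for illustration, is threshold encoding with a residual: only entries whose accumulated magnitude reaches a threshold τ are sent, each as an index and a sign, and everything not yet sent is kept locally and added to the next update. The function names are hypothetical.

```python
import numpy as np


def threshold_encode(update, residual, tau):
    """Threshold-encode an update vector. Entries whose accumulated value
    (update + residual) reaches magnitude tau are sent as (index, sign);
    everything not sent stays in the residual for the next round."""
    acc = np.asarray(residual, dtype=float) + np.asarray(update, dtype=float)
    idx = np.flatnonzero(np.abs(acc) >= tau)
    signs = np.sign(acc[idx]).astype(np.int8)
    new_residual = acc.copy()
    new_residual[idx] -= signs * tau
    return idx, signs, new_residual


def threshold_decode(idx, signs, size, tau):
    """Rebuild the dense update that the receiving node applies."""
    dense = np.zeros(size)
    dense[idx] = signs * tau
    return dense
```

Only a short list of indices and one-byte signs crosses the network instead of a dense floating-point vector, and because the residual is carried forward, small updates are delayed rather than lost.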

Mesh mode organizes the nodes as a multi-way tree whose root is the Spark Master. By default, each node can have at most eight child nodes, and the node tree within the Spark cluster can have at most five levels. In mesh mode, each node relays encoded updates to all nodes connected to it and aggregates the updates it receives from them. Because the Master node receives less traffic directly, it is no longer a performance bottleneck.
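The tree limits translate into simple arithmetic. Assuming the five levels include the root and nodes are placed breadth-first with node 0 as the Spark Master, the tree holds at most 1 + 8 + 64 + 512 + 4096 = 4681 nodes, and the parent of node i is (i - 1) // 8. The layout and function names below are illustrative assumptions.

```python
def mesh_tree_parents(num_nodes, fanout=8, max_levels=5):
    """Breadth-first multi-way tree with node 0 (the Spark Master) as root.
    Returns the parent of every node (-1 for the root)."""
    capacity = sum(fanout ** level for level in range(max_levels))
    if num_nodes > capacity:
        raise ValueError(f"{num_nodes} nodes exceed the tree capacity of {capacity}")
    return [-1] + [(i - 1) // fanout for i in range(1, num_nodes)]


def tree_depth(node, fanout=8):
    """Level of a node in the same breadth-first layout (root = level 0)."""
    level = 0
    while node > 0:
        node = (node - 1) // fanout
        level += 1
    return level
```

In this layout the Master exchanges updates directly with at most eight children, regardless of cluster size, which is why it stops being a bottleneck.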

Step S7: according to the power load data and weather feature data of the previous time point, distributed prediction is performed with the established long short-term memory network model to obtain the predicted load of the current time point.

The power load data and weather feature data of the previous moment are read from the Hbase cluster, and the Spark cluster predicts the load in parallel; the number and size of data partitions are set according to the cluster configuration to maximize prediction speed.
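The text does not give a rule for the partition count. A hypothetical heuristic is sketched below: it takes the larger of (a) a few tasks per CPU core, following the Spark tuning guide's general advice of 2 to 3 tasks per core, and (b) enough partitions that none exceeds a target size, with 128 MiB assumed here because it matches the default HDFS block size.

```python
import math


def choose_num_partitions(data_bytes, total_cores, tasks_per_core=3,
                          target_partition_bytes=128 * 1024 * 1024):
    """Hypothetical heuristic: enough partitions to keep every core busy
    (a few tasks per core) and none much larger than the target size."""
    by_cores = total_cores * tasks_per_core
    by_size = math.ceil(data_bytes / target_partition_bytes)
    return max(by_cores, by_size, 1)
```

For 10 GiB of input on 16 cores this gives 80 partitions (size-bound), while a tiny input on 4 cores gives 12 (core-bound). The result could then be passed to, for example, PySpark's `RDD.repartition`.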

Step S8: the power load data and weather data collected in real time are stored in the Hbase cluster and displayed on a Web page through a graphical interface.

The load and weather feature data collected in real time are stored in Hbase for later model training. The prediction results are written to a message queue; the predicted and real load values in the message queue are received in real time over the network and displayed as line charts so that errors and trends can be observed.

The load prediction method based on a long short-term memory network provided by the embodiments of the present invention has at least the following technical effects:

Based on big data processing technologies such as Hadoop and Spark, K-means clustering is used for within-cluster mean imputation, and feature screening by Pearson correlation and a gradient boosting decision tree is combined with principal component analysis to reduce the dimension of, and compress, the feature data. This retains the characteristics of the original data as far as possible, greatly reduces the data volume for model training, and accelerates model training. Distributed load modeling and training with the long short-term memory network greatly accelerates the training process compared with traditional single-machine training. By storing the trained model on the distributed file system and using the Spark computing cluster, distributed parallel load prediction is completed and the prediction time cost is reduced. The method can efficiently and accurately complete power load prediction in big data scenarios: it predicts the electricity load of an area at a given future moment while maintaining a certain fault tolerance and accuracy, and provides a useful reference for the departments responsible for dispatching power resources.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to examples, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

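1. A load prediction method based on a long short-term memory network, characterized by comprising the following steps:

step S1, collecting power load data and corresponding weather feature data of a target area over a certain time period to form a raw data set;

step S2, processing missing values in the raw data set using a Spark cluster;

step S3, performing feature selection on the raw data set;

step S4, performing feature compression on the raw data set;

step S5, establishing a prediction model based on the long short-term memory network;

step S6, performing distributed training of the prediction model using the Spark cluster;

step S7, according to the weather feature data and power load data of the previous time point, performing distributed prediction with the prediction model to obtain a load prediction value of the current time point.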

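2. The load prediction method based on a long short-term memory network according to claim 1, further comprising:

step S8, storing the power load data and weather feature data collected in real time in an Hbase cluster, and displaying the predicted load values and the load values collected in real time.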

3. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S1, the formed raw data set is stored in an Hbase cluster.

4. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S2, the Spark cluster is used to perform K-means clustering, and the missing values are then filled using a within-cluster mean imputation strategy.

5. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S3, the Pearson correlation between the power load data and the weather feature data is calculated, and feature importance ranking is performed by training a gradient boosting decision tree.

6. The load prediction method based on a long short-term memory network according to claim 5, wherein feature variables whose significance level (p-value) is higher than 0.05 are removed, the remaining variables are ranked by correlation, and the top 30% of features are taken as a first feature set; a gradient boosting decision tree is trained to obtain a feature importance ranking, and the top 30% of features are taken as a second feature set; and the features in the intersection of the first and second feature sets are taken as the selected features.

7. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S4, the weather feature data is compressed by principal component analysis.

8. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S5, the parameters are updated using truncated backpropagation through time.

9. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S6, data-parallel distributed training is performed on the Spark cluster using asynchronous stochastic gradient descent.

10. The load prediction method based on a long short-term memory network according to claim 1, wherein in step S7, the power load data and weather feature data of the previous time point are obtained, the parallelism is set according to the data size and the cluster hardware information, and the load is predicted using the Spark cluster.