CN115619028A - Clustering algorithm fusion-based power load accurate prediction method

Info

Publication number: CN115619028A
Application number: CN202211327749.8A
Authority: CN
Inventors: 孙朝霞; 吴冉; 凌在汛; 向慕超; 艾晗啸; 李旻; 郭雨; 韩鸿凌; 杨帆; 金晨; 焦海文; 沈骏杰
Original assignee: Hubei Fangyuan Dongli Electric Power Science Research Co ltd; Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd; Xiangyang Power Supply Co of State Grid Hubei Electric Power Co Ltd
Current assignee: Hubei Fangyuan Dongli Electric Power Science Research Co ltd; Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd; Xiangyang Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-01-17

Abstract

The invention provides an accurate power load prediction method based on clustering algorithm fusion, which combines a clustering algorithm with an improved multi-output ANN prediction model. First, the sample data are preprocessed and normalized using the max-min normalization method. The samples are then clustered with the K-means clustering algorithm so that data with similar characteristics are used as prediction input, which strengthens the regularity of the samples and improves prediction accuracy. Next, a multi-output strategy is introduced to improve the traditional ANN prediction model, improving the model's fit so that its output is closer to the actual values, and the clustering algorithm is combined with the ANN prediction model to form a combined prediction model. Compared with the traditional ANN prediction model and a typical deep learning prediction model, the method improves prediction accuracy and shows good learning performance and adaptability. It thereby further improves load prediction accuracy and has reference value.

Description

Clustering algorithm fusion-based power load accurate prediction method

Technical Field

The invention relates to the technical field of electric power, in particular to an accurate prediction method of an electric power load based on clustering algorithm fusion.

Background

The smart grid integrates the existing digital networks for power generation, transmission, distribution and sale with the electrical equipment of end users and other energy-using facilities, so that the information of the traditional power grid is shared within one intelligent system. It can effectively improve the grid's ability to integrate, consume and coordinate new energy sources such as wind, solar and hydro power, and grid intelligence promotes the optimization and adjustment of China's energy structure. However, as smart grids develop, the energy supply of power systems becomes more diversified and service requirements become more complex. Against this background, power load prediction becomes more difficult and the process more complicated, and early single load prediction methods can no longer meet the required prediction accuracy. A more effective power load prediction technology is therefore needed to realize optimal planning of the power system and to guarantee its economic and safe operation.

Early load prediction methods mainly include regression prediction, time series methods, grey system theory and exponential smoothing. Their models are simple and fast to compute, but their prediction accuracy is low, and they struggle with today's complex and diverse load prediction scenarios. With the spread of machine learning and deep learning, load prediction has gradually moved away from these traditional methods. The artificial neural network (ANN) is a typical machine learning technique that can describe complex nonlinear mappings and handle nonlinear and irregular problems effectively, so prediction methods based on machine learning and deep learning further improve prediction accuracy and model generality. However, prediction with a single method rarely gives satisfactory results: a single prediction model is quite limited and cannot adapt to complex and diverse load prediction scenarios. The invention therefore combines a clustering method with a prediction model, and proposes a novel combined prediction model that couples clustering with an improved ANN model.

Disclosure of Invention

The method provided by the invention is an accurate prediction method of the power load based on the fusion of the clustering algorithm, and the prediction method combines the clustering algorithm and an improved multi-output ANN prediction model to improve the prediction precision of the load prediction of the power system.

The invention is realized by the following technical scheme:

A clustering algorithm fusion-based power load accurate prediction method comprises the following steps:

S1, preprocessing historical daily load data to obtain a data set, dividing the data set into a training set and a test set to obtain the number of training samples, and normalizing the sample data by the max-min normalization method;

S2, classifying the data normalized in step S1 by the K-means clustering algorithm, aggregating data with the same type of characteristics, determining the optimal clustering number K according to the elbow criterion, dividing the data into a plurality of clusters and assigning different cluster labels, calculating the average value of each cluster as the classification standard of a prediction day for classifying the load data to be predicted, and then carrying out model training to obtain a trained ANN prediction model;

S3, for the test set, first completing classification using the clustering standard of step S2 and adding a cluster label, and then completing prediction based on the ANN prediction model trained in step S2.

Further, step S1 specifically includes:

S11: performing abnormal value detection and correction on the data set to obtain a more complete sample data set; the specific operation is as follows:

Step1, calling the function np.isnan(dataset) to detect abnormal (missing) values in the data set;

Step2, correcting each abnormal value by the averaging method:

$$x(d,i) = \frac{x(d,i-1) + x(d,i+1)}{2}$$

in the formula, x(d, i) represents the load value with serial number d at the i-th time point; the abnormal value is replaced by the average of the previous and the next load values;
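The correction in S11 can be illustrated with a minimal sketch. The function name `fill_missing_with_neighbor_mean` and the sample series are hypothetical; the sketch assumes that abnormal values are encoded as NaN (consistent with the use of `np.isnan`) and that they are isolated, and it falls back to the single available neighbor at the ends of the series, which the text does not specify.

```python
import numpy as np

def fill_missing_with_neighbor_mean(load):
    """Replace NaN entries of a 1-D load series with the mean of the
    previous and next values (assumes isolated NaNs)."""
    load = np.asarray(load, dtype=float).copy()
    for i in np.flatnonzero(np.isnan(load)):
        prev_val = load[i - 1] if i > 0 else np.nan
        next_val = load[i + 1] if i < load.size - 1 else np.nan
        # nanmean uses only the one available neighbor at the series edges
        load[i] = np.nanmean([prev_val, next_val])
    return load

# Hypothetical hourly loads with two missing readings
series = [10.0, np.nan, 14.0, 15.0, np.nan]
filled = fill_missing_with_neighbor_mean(series)
```

Here the interior gap is filled with the average of its two neighbors, as in the formula above, while the trailing gap simply copies the last known value.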

S12: dividing the data set into a training set and a test set in a 9:1 ratio; the training set is assumed to contain 28992 sample data and is recorded as $X_{train} = (x_{ij})_{1208 \times 24}$, where i denotes the i-th day and j denotes the j-th sampling point; the test set is recorded as $X_{test} = (x_{mn})_{134 \times 24}$, where m denotes the m-th day and n denotes the n-th sampling point;
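The sizes in S12 can be checked with a short sketch: 1342 days of 24 samples split 9:1 give 1208 training days (28992 samples) and 134 test days. The random data are a placeholder, and the chronological (unshuffled) split is an assumption, since the text does not state how the days are assigned.

```python
import numpy as np

n_days, n_points = 1342, 24
rng = np.random.default_rng(0)
hourly = rng.random(n_days * n_points)       # placeholder for the 32208 load values

daily = hourly.reshape(n_days, n_points)     # one row per day, 24 sampling points
n_train = round(n_days * 0.9)                # 9:1 split -> 1208 training days

# Assumption: the first 90% of days train the model, the rest test it
X_train, X_test = daily[:n_train], daily[n_train:]
```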

S13: normalizing the data by the max-min normalization method to eliminate the influence of dimension (scale) on the clustering result; this normalization divides the original data by the absolute value of the maximum value:

$$\hat{x} = \frac{x}{|x_{max}|}$$

where $\hat{x}$ is the normalized data value, x is the actual value of the sample, and $x_{max}$ is the maximum value of the sample; the value range of the converted data is [-1, 1].
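A minimal sketch of this normalization and of its inverse (used later when predictions are restored to the original scale) is given below. The function names are hypothetical. The sketch divides by the largest absolute value, which coincides with $|x_{max}|$ for non-negative load data and guarantees the stated range [-1, 1].

```python
import numpy as np

def max_abs_normalize(x):
    """Scale data into [-1, 1] by dividing by the largest absolute value."""
    x = np.asarray(x, dtype=float)
    scale = np.abs(x).max()
    return x / scale, scale

def denormalize(y, scale):
    """Inverse transform: restore normalized values to the original scale."""
    return np.asarray(y, dtype=float) * scale

loads = np.array([50.0, 100.0, 200.0])       # hypothetical load values
normalized, scale = max_abs_normalize(loads)
restored = denormalize(normalized, scale)
```

Keeping `scale` from the training data is what makes the later denormalization possible.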

Further, step S2 first uses the K-means clustering algorithm to classify the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ and, at the same time, calculates the average value of each cluster as the classification standard for classifying the load to be predicted; prediction model training is then carried out, and the specific operation scheme is as follows:

S21: classifying the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ using the K-means clustering algorithm;

the K-means algorithm process is as follows:

Step1, inputting a sample set N, and randomly selecting K samples from N as the initial mean vectors $\{\mu_1, \mu_2, ..., \mu_K\}$;

Step2, calculating the distance from each sample to each initial center, and assigning each sample point to the nearest centroid according to the nearest-neighbor rule;

Step3, updating the center of each cluster with the sample mean of that cluster;

Step4, if the cluster centers no longer change or the maximum number of iterations is reached, the clustering result is taken as final, the iteration stops, and the cluster labels are output; otherwise, returning to Step2;

here $\mu_j$ denotes the mean of cluster j, i.e., the centroid of the cluster; the optimal clustering number K is obtained with the elbow criterion, the curve characteristics of each cluster are interpreted according to their physical meaning, cluster labels are added to the different clusters according to these curve characteristics, and the average load value of each cluster is calculated to serve as the classification standard for the daily load data to be predicted;
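Steps 1 to 4 and the elbow criterion can be sketched in plain NumPy as follows. This is an illustrative implementation under stated assumptions, not the patent's own code: the function name `kmeans`, the Euclidean distance, the handling of empty clusters, and the synthetic two-group data are all choices made here for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns labels, centers and the SSE."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step1: pick K distinct samples as the initial mean vectors
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step2: assign every sample to its nearest center (squared Euclidean)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step3: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step4: stop once the centers no longer move
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment and sum of squared errors against the final centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return labels, centers, float(dists.min(axis=1).sum())

# Synthetic stand-in for normalized daily curves: two groups of 24-point days
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.02, (30, 24)), rng.normal(0.8, 0.02, (30, 24))])

# Elbow criterion: inspect the SSE for each K and pick the K after which
# the SSE stops dropping sharply
sse_by_k = {k: kmeans(X, k)[2] for k in range(1, 6)}
labels, centers, sse = kmeans(X, 2)
```

The rows of `centers` are the per-cluster average curves, which is what the method reuses later as the classification standard for a day to be predicted.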

S22: introducing the clustering algorithm and a multi-output strategy to improve the prediction model, and completing training on the 28992 training set samples;

on the one hand, the clustering algorithm is introduced: using the clustering result of S21, a new variable is added to the training data set $X_{train} = (x_{ij})_{1208 \times 24}$, namely the cluster label of the cluster to which each day is assigned; this cluster label is used as a new input variable to form a new training data set $X'_{train} = (x_{ij})_{1208 \times 25}$, and prediction accuracy is improved by introducing this new, strongly correlated variable;
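Forming the 1208 × 25 training set amounts to appending one label column to the daily load matrix. The sketch below uses placeholder data, and the cluster count of 3 is purely hypothetical, since the text does not state the value of K that was chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((1208, 24))             # placeholder normalized daily loads
labels = rng.integers(0, 3, size=1208)       # placeholder labels; K = 3 is hypothetical

# Append the cluster label as a 25th column: one extra input variable per day
X_train_aug = np.column_stack([X_train, labels])
```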

on the other hand, a multi-output strategy is introduced, namely a model capable of predicting the whole time sequence at one time is designed, and the number of nodes of an output layer of the ANN prediction model is set to be 24, so that the 24-dimensional multi-output strategy of the prediction model is realized, and the relation between the historical data and the data to be measured is better represented;

based on the improved ANN prediction model, $X'_{train} = (x_{ij})_{1208 \times 25}$ is normalized as in step S13 to obtain the normalized data;

a time-series-based prediction method is adopted: after preprocessing, the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 dimensions in total, are selected as the input of the prediction model, and the load values of the 24 sampling points of the day to be predicted are selected as the output; all data in the training set are input in sequence for model training; the error between the predicted values obtained in training and the true values is calculated to evaluate the quality of the trained model; a target minimum error is set, and training continues until this target error is reached, which gives the trained ANN prediction model.
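The construction of the 175-dimensional inputs and 24-dimensional targets can be sketched as a sliding window over the augmented daily matrix. The helper name `make_windows` and the toy 10-day array are hypothetical; the layout (seven consecutive rows of 24 loads plus 1 label, flattened day by day) is one plausible reading of the text.

```python
import numpy as np

def make_windows(X_aug, history=7):
    """X_aug: (n_days, 25) array with 24 loads + 1 cluster label per day.
    Returns inputs of shape (n_days - history, 25 * history) and
    targets of shape (n_days - history, 24)."""
    X_aug = np.asarray(X_aug, dtype=float)
    n_days = X_aug.shape[0]
    # Each input is the flattened block of the `history` preceding days
    inputs = np.stack([X_aug[d - history:d].ravel() for d in range(history, n_days)])
    # Each target is the 24-point load curve of the day that follows
    targets = X_aug[history:, :24]
    return inputs, targets

# Toy example: 10 days -> 3 training pairs
X_demo = np.arange(10 * 25, dtype=float).reshape(10, 25)
inputs, targets = make_windows(X_demo)
```

Applied to the 1208-day training set, this windowing would yield 1201 input/target pairs, because the first seven days only ever serve as history.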

Further, in step S3, the test set is first classified using the classification standard of S2, and prediction is then completed based on the trained improved prediction model; a time-series-based prediction method is likewise adopted, i.e., the load data of the first seven days are used to predict the load data of the eighth day, the day to be predicted; the specific implementation is as follows:

S31: first, the load values of the seven days before the day to be predicted are compared with the average value of each cluster obtained in the training stage; the mean square error is calculated, and the cluster with the smallest mean square error is taken as the classification result of that day; the specific steps are as follows:

Step1, acquiring the load values of the 24 sampling points of a given day, and initializing the cluster number k = 0;

Step2, calculating the mean square error MSE:

$$MSE_i = \sum (x_i - \mu_i)^2$$

where $MSE_i$ represents the mean square error with respect to the i-th cluster mean $\mu_i$;

Step3, after the mean square errors for all cluster means have been calculated, finding the minimum $MSE_i$; its cluster number is i, and the load classification result of the day is marked as k = i;
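Steps 1 to 3 of S31 reduce to a nearest-centroid lookup. In the sketch below, the function name `classify_day` and the three flat cluster-mean curves are hypothetical; the error is computed as a sum of squared differences over the 24 sampling points, following the formula as written, which gives the same minimizing cluster as an averaged error.

```python
import numpy as np

def classify_day(day_load, cluster_means):
    """Return the index of the cluster mean closest to a 24-point day curve,
    together with the squared-error score for every cluster."""
    day_load = np.asarray(day_load, dtype=float)
    cluster_means = np.asarray(cluster_means, dtype=float)
    # Step2: squared error between the day and each cluster mean
    errors = ((cluster_means - day_load) ** 2).sum(axis=1)
    # Step3: the cluster with the minimum error is the label k
    return int(errors.argmin()), errors

# Hypothetical cluster means (three flat curves) and one day to classify
cluster_means = np.array([[0.0] * 24, [1.0] * 24, [2.0] * 24])
day = np.full(24, 0.9)
k, errors = classify_day(day, cluster_means)
```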

S32: after the cluster labels are obtained, the load data of the 24 sampling points and the cluster labels of the seven preceding days, (24 + 1) × 7 = 175 dimensions in total, are taken as the input of the improved ANN prediction model trained in S22 to obtain the load prediction result for the day to be predicted; the prediction result is then denormalized as follows:

$$\tilde{Y} = Y \cdot |x_{max}|$$

where $\tilde{Y}$ denotes the denormalized data after prediction, restored to the original dimension, and Y is the normalized data.

Further, step S3 further includes:

S33: introducing prediction evaluation indices to evaluate prediction accuracy; the mean absolute percentage error (MAPE), mean square error (MSE), mean absolute error (MAE) and error rate (Wrong rate) are selected as test indices, calculated as follows:

in the formula, $y_i$ is the predicted load value, $x_i$ is the actual load value, and n is the total number of samples; the mean absolute error MAE and the mean square error MSE represent the average difference between the predicted and actual values: the smaller the value, the closer the prediction is to the actual value; when the $E_{MAPE}$ of a sample is greater than 1, its curve is regarded as a wrongly predicted curve and is counted in the Wrong rate; prediction accuracy is evaluated using these indices.
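Since the formulas themselves are not reproduced here, the sketch below uses the standard textbook definitions, with $E_{MAPE} = \frac{1}{n}\sum |(y_i - x_i)/x_i|$, $E_{MSE} = \frac{1}{n}\sum (y_i - x_i)^2$ and $E_{MAE} = \frac{1}{n}\sum |y_i - x_i|$. These definitions, the function name `evaluate`, the per-day computation of the Wrong rate, and the reading of the threshold 1 as a fraction (not a percentage) are all assumptions.

```python
import numpy as np

def evaluate(y_pred, x_true, wrong_threshold=1.0):
    """y_pred, x_true: arrays of shape (n_days, n_points).
    Returns MAPE, MSE, MAE and the share of days whose MAPE exceeds the threshold."""
    y_pred = np.asarray(y_pred, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    ape = np.abs((y_pred - x_true) / x_true)      # absolute percentage error per point
    mape_per_day = ape.mean(axis=1)               # one MAPE value per daily curve
    return {
        "MAPE": float(ape.mean()),
        "MSE": float(((y_pred - x_true) ** 2).mean()),
        "MAE": float(np.abs(y_pred - x_true).mean()),
        # Assumption: a day counts as wrongly predicted when its MAPE > threshold
        "wrong_rate": float((mape_per_day > wrong_threshold).mean()),
    }

# Hypothetical two-day example: the second day is badly over-predicted
x_true = np.array([[100.0, 200.0], [100.0, 200.0]])
y_pred = np.array([[110.0, 180.0], [400.0, 800.0]])
metrics = evaluate(y_pred, x_true)
```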

In this method, data with the same characteristics are used as prediction input, which strengthens the regularity of the samples and improves prediction accuracy; a multi-output strategy is then introduced to improve the traditional ANN prediction model, improving the model's fit so that its output is closer to the actual values, and the clustering algorithm is combined with the ANN prediction model to form a combined prediction model. Compared with the traditional ANN prediction model and a typical deep learning prediction model, the method improves prediction accuracy, shows good learning ability and adaptability, and has reference value.

Drawings

FIG. 1 is a flow chart of an implementation of a clustering algorithm fusion based power load accurate prediction method of the present invention;

FIG. 2 is a block diagram of an improved ANN prediction model of the present invention;

FIG. 3 is a graph of the loss function for the single-output and multiple-output models of the present invention;

FIG. 4 is a basic block diagram of an improved ANN prediction model based on a clustering algorithm;

FIG. 5 is a graph comparing an actual load curve of the present invention with predicted load curves of various predictive models;

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings, using the load forecasting data set of the Global Energy Forecasting Competition as a prediction case. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort belong to the protection scope of the present invention.

The invention provides a clustering algorithm fusion-based accurate power load prediction method, whose flow chart is shown in FIG. 1. The prediction method combines a clustering algorithm with an improved multi-output ANN prediction model to improve the accuracy of power system load prediction, and comprises the following steps:

S1, preprocessing historical daily load data to obtain a data set, dividing the data set into a training set and a test set to obtain the number of training samples, and normalizing the sample data by the max-min normalization method; S1 specifically comprises the following steps:

S11: the load forecasting track of the 2012 Global Energy Forecasting Competition is used; the competition data set contains historical daily load data at 24 sampling points per day, from 1/2004 to 30/6/2008, for 20 regions; preprocessing yields a more complete data set of 32208 samples; the preprocessing steps are as follows:

Step1, calling the function np.isnan(dataset) to detect abnormal values in the data set;

Step2, correcting each abnormal value by the averaging method;

S12: dividing the data set into a training set and a test set in a 9:1 ratio; the training set is recorded as $X_{train} = (x_{ij})_{1208 \times 24}$, where i denotes the i-th day and j denotes the j-th sampling point; the test set is recorded as $X_{test} = (x_{mn})_{134 \times 24}$, where m denotes the m-th day and n denotes the n-th sampling point;

S13: normalizing the data by the max-min normalization method, i.e., dividing the original data by the absolute value of the maximum value:

$$\hat{x} = \frac{x}{|x_{max}|}$$

where $\hat{x}$ is the normalized data value, x is the actual value of the sample, and $x_{max}$ is the maximum value of the sample; the value range of the converted data is [-1, 1].

S2, classifying the normalized data by adopting a K-means clustering algorithm, aggregating the data with the same kind of characteristics, determining an optimal clustering number K value according to an elbow criterion, dividing the data into a plurality of clusters, giving different cluster labels, and calculating an average value of each cluster for classifying the load to be measured during prediction;

Specifically, step S2 uses the K-means clustering algorithm to classify the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ (28992 samples in total), calculates the average value of each cluster as the classification standard for classifying the load to be predicted, and then performs model training; the specific operation scheme is as follows:

S21: classifying the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ using, but not limited to, the K-means clustering algorithm;

the K-means algorithm proceeds as follows:

Step1, inputting a sample set N, and randomly selecting K samples from N as the initial mean vectors $\{\mu_1, \mu_2, ..., \mu_K\}$;

S22: introducing the clustering algorithm and a multi-output strategy to improve the prediction model, completing training on the 28992 training set samples, and obtaining the trained improved ANN prediction model.

On the one hand, the clustering algorithm is introduced: using the clustering result of S21, a column of new variables is added to the training data set $X_{train} = (x_{ij})_{1208 \times 24}$, namely the cluster label of the cluster to which each day is assigned; this cluster label is used as a new input variable to form a new training data set $X'_{train} = (x_{ij})_{1208 \times 25}$, and prediction accuracy is improved by introducing this new, strongly correlated variable;

on the other hand, a multi-output strategy is introduced, i.e., a model is designed that predicts the whole time sequence at once. Traditional machine learning algorithms cannot output multi-dimensional data directly and instead rely on direct or recursive prediction; the ANN prediction model used by the method removes this limitation: by setting the number of nodes of the model's output layer to 24, a 24-dimensional multi-output strategy is realized. Compared with a single-output structure, the multi-output structure fits better when the number of iterations is large, as shown in FIG. 3. The overall structure of the improved ANN prediction model is shown in FIG. 2. The Adam optimizer is selected as the optimizer; the ReLU function is selected as the activation function, which avoids the vanishing-gradient problem in the positive region and converges much faster than the sigmoid and tanh functions; the model comprises three hidden layers, with the number of nodes set to 300, 200 and 100 respectively;
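The layer sizes described above (175 inputs, hidden layers of 300, 200 and 100 nodes with ReLU, 24 outputs) can be illustrated with a forward pass in plain NumPy. This is only a shape-level sketch: the He-style weight initialization and the linear output layer are assumptions, and training with the Adam optimizer is omitted.

```python
import numpy as np

LAYER_SIZES = [175, 300, 200, 100, 24]   # input, three hidden layers, 24 outputs

def init_params(sizes, seed=0):
    """Random weights and zero biases for each layer.
    He-style scaling is an assumption, a common choice for ReLU layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Map a batch of 175-dimensional inputs to 24 load values at once."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # ReLU activation in the hidden layers
    W, b = params[-1]
    return h @ W + b                     # linear output layer (assumed)

params = init_params(LAYER_SIZES)
batch = np.random.default_rng(1).random((5, 175))   # five hypothetical input windows
prediction = forward(params, batch)                 # one 24-point curve per window
n_weights = sum(W.size + b.size for W, b in params)
```

The multi-output design shows up in the last layer: a single forward pass returns all 24 sampling points of the day, so no recursive hour-by-hour prediction is needed.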

based on the improved ANN prediction model, $X'_{train} = (x_{ij})_{1208 \times 25}$ is normalized as in step S13 to obtain the normalized data;

a time-series-based prediction method is adopted: after preprocessing, the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 dimensions in total, are selected as the input of the prediction model, and the load values of the 24 sampling points of the day to be predicted are taken as the output; all data in the training set are input in sequence for model training; the error between the predicted values obtained in training and the true values is calculated to evaluate the quality of the trained model; a target minimum error is set, and training continues until this target error is reached, which gives the trained ANN prediction model.

S3, for the test set, classification is first completed using the clustering standard of step S2 and cluster labels are added; prediction is then completed based on the ANN prediction model trained in step S2. A rolling prediction method based on the time sequence is adopted: the 24-hour load values of the eighth day, the day to be predicted, are predicted from the load values and cluster labels of the seven preceding days; on the basis of the predicted load values, the next day is then predicted from the load values and cluster labels of its own seven preceding days, and so on. The inputs are the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 input nodes in total; with the multi-output strategy, the outputs are the load values at the 24 time points of the day, 24 nodes in total. The prediction result of the test set is obtained by this process, and the specific implementation steps are as follows:
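The rolling scheme can be sketched as a loop that slides a seven-day window forward one day at a time. The function `rolling_forecast` and the two stub callables are hypothetical stand-ins for the trained ANN and the cluster classifier, and feeding each prediction back into the window is one reading of "on the basis of the predicted load value"; measured loads could be used instead where they are available.

```python
import numpy as np

def rolling_forecast(history, n_days, predict, classify):
    """history: (7, 25) array of the latest seven days (24 loads + label each).
    predict: maps a (175,) input vector to a (24,) forecast.
    classify: maps a (24,) load curve to a cluster label."""
    window = np.asarray(history, dtype=float).copy()
    forecasts = []
    for _ in range(n_days):
        y = predict(window.ravel())                 # forecast the next day's 24 points
        forecasts.append(y)
        new_row = np.append(y, classify(y))         # attach the cluster label
        window = np.vstack([window[1:], new_row])   # drop oldest day, add newest
    return np.array(forecasts)

# Stub model for illustration only: yesterday's curve plus one
stub_predict = lambda v: v.reshape(7, 25)[-1, :24] + 1.0
stub_classify = lambda y: 0

forecasts = rolling_forecast(np.zeros((7, 25)), 3, stub_predict, stub_classify)
```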

S31: first, the load values of the seven days before the day to be predicted are compared with the average value of each cluster obtained in the training stage; the mean square error is calculated, and the cluster with the smallest mean square error is taken as the classification result of that day; the specific steps are as follows:

Step2, calculating the mean square error MSE:

$$MSE_i = \sum (x_i - \mu_i)^2$$

S32: after the cluster labels are obtained, the load data of the 24 sampling points and the cluster labels of the seven preceding days, (24 + 1) × 7 = 175 dimensions in total, are used as the input of the improved ANN prediction model trained in S22 to obtain the load prediction result for the day to be predicted; the prediction result is denormalized as follows:

$$\tilde{Y} = Y \cdot |x_{max}|$$

where $\tilde{Y}$ denotes the denormalized data after prediction, restored to the original dimension, and Y is the normalized data;

in the formula, $y_i$ is the predicted load value, $x_i$ is the actual load value, and n is the total number of samples; the mean absolute error (MAE) and mean square error (MSE) represent the average difference between the predicted and actual values: the smaller the value, the closer the prediction is to the actual value; when the $E_{MAPE}$ of a sample is greater than 1, its curve is regarded as a wrongly predicted curve and is counted in the Wrong rate; prediction accuracy is evaluated using these indices.

Three machine learning models, namely an ANN model, an LSTM model and a K-means-ANN model, are constructed in the specific case of the invention, and the parameter configuration of each model is shown in the following table 1.

TABLE 1 comparison of predicted results

FIG. 5 compares the prediction results of the novel combined prediction method with those of the traditional ANN prediction model and a typical deep learning prediction model on randomly selected sample data from the case data set, verifying that the novel combined prediction model improves prediction accuracy and has good learning ability and adaptability.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A clustering algorithm fusion based power load accurate prediction method is characterized by comprising the following steps:

S2, classifying the data normalized in step S1 by the K-means clustering algorithm, aggregating data with the same type of characteristics, determining the optimal clustering number K according to the elbow criterion, dividing the data into a plurality of clusters and assigning different cluster labels, calculating the average value of each cluster as the classification standard of a prediction day for classifying the load data to be predicted, and then performing model training to obtain a trained ANN prediction model;

2. The clustering algorithm fusion-based power load accurate prediction method according to claim 1, wherein the step S1 specifically comprises:

Step2, correcting each abnormal value by the averaging method;

S12: dividing the data set into a training set and a test set in a 9:1 ratio; the training set is recorded as $X_{train} = (x_{ij})_{1208 \times 24}$, where i denotes the i-th day and j denotes the j-th sampling point; the test set is recorded as $X_{test} = (x_{mn})_{134 \times 24}$, where m denotes the m-th day and n denotes the n-th sampling point;

wherein $\hat{x} = x / |x_{max}|$ is the normalized data value, x is the actual value of the sample, and $x_{max}$ is the maximum value of the sample; the value range of the converted data is [-1, 1].

3. The clustering algorithm fusion-based power load accurate prediction method according to claim 2, wherein step S2 uses the K-means clustering algorithm to classify the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ and, at the same time, calculates the average value of each cluster as the classification standard for classifying the load to be predicted; prediction model training is then carried out, and the specific operation scheme is as follows:

the K-means algorithm process is as follows:

Step4, if the cluster centers no longer change or the maximum number of iterations is reached, the clustering result is taken as final, the iteration stops, and the cluster labels are output; otherwise, returning to Step2;

S22: introducing the clustering algorithm and a multi-output strategy to improve the prediction model, and completing training on the 28992 training set samples;

on the one hand, the clustering algorithm is introduced: using the clustering result of S21, a column of new variables is added to the training data set $X_{train} = (x_{ij})_{1208 \times 24}$, namely the cluster label of the cluster to which each day is assigned; this cluster label is used as a new input variable to form a new training data set $X'_{train} = (x_{ij})_{1208 \times 25}$, and prediction accuracy is improved by introducing this new, strongly correlated variable;

on the other hand, a multi-output strategy is introduced, i.e., a model is designed that predicts the whole time sequence at once; the number of nodes of the output layer of the ANN prediction model is set to 24, which realizes the 24-dimensional multi-output strategy of the prediction model and better represents the relation between the historical data and the data to be predicted;

a time-series-based prediction method is adopted: after preprocessing, the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 dimensions in total, are selected as the input of the prediction model, and the load values of the 24 sampling points of the day to be predicted are selected as the output; all data in the training set are input in sequence for model training; the error between the predicted values obtained in training and the actual values is calculated to evaluate the quality of the trained model; a target minimum error is set, and training continues until this target error is reached, which gives the trained ANN prediction model.

4. The clustering algorithm fusion-based power load accurate prediction method according to claim 3, wherein in step S3, for the test set, classification is first completed using the classification standard of S2, and prediction is then completed based on the trained improved prediction model; a time-series-based prediction method is likewise adopted, i.e., the load data of the first seven days are used to predict the load data of the eighth day, the day to be predicted; the specific implementation is as follows:

Step2, calculating the mean square error MSE:

$$MSE_i = \sum (x_i - \mu_i)^2$$

Step3, after the mean square errors for all cluster means have been calculated, finding the minimum $MSE_i$; its cluster number is i, and the load classification result of the day is marked as k = i;

wherein $\tilde{Y} = Y \cdot |x_{max}|$ represents the denormalized data after prediction, restored to the original dimension, and Y is the normalized data.

5. The clustering algorithm fusion based power load accurate prediction method according to claim 4, wherein the step S3 further comprises: