CN115619028A - Clustering algorithm fusion-based power load accurate prediction method - Google Patents

Clustering algorithm fusion-based power load accurate prediction method

Info

Publication number
CN115619028A
Authority
CN
China
Prior art keywords
prediction
value
data
load
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211327749.8A
Other languages
Chinese (zh)
Inventor
孙朝霞
吴冉
凌在汛
向慕超
艾晗啸
李旻
郭雨
韩鸿凌
杨帆
金晨
焦海文
沈骏杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Fangyuan Dongli Electric Power Science Research Co ltd
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Xiangyang Power Supply Co of State Grid Hubei Electric Power Co Ltd
Original Assignee
Hubei Fangyuan Dongli Electric Power Science Research Co ltd
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Xiangyang Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Fangyuan Dongli Electric Power Science Research Co ltd, Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd, Xiangyang Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority to CN202211327749.8A
Publication of CN115619028A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Electricity, gas or water supply
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a power load accurate prediction method based on clustering algorithm fusion, which combines a clustering algorithm with an improved multi-output ANN prediction model. First, the sample data are preprocessed and normalized using the maximum and minimum normalization method. The samples are then clustered with the K-means algorithm so that data with similar characteristics are used together as prediction input, which strengthens the regularity of the samples and improves prediction accuracy. Next, a multi-output strategy is introduced to improve the traditional ANN prediction model, raising the goodness of fit so that the output is closer to the actual values, and the clustering algorithm is combined with the ANN prediction model to form a combined prediction model. Compared with the traditional ANN prediction model and a typical deep learning prediction model, the proposed method improves prediction accuracy and shows good learning performance and adaptability.

Description

Clustering algorithm fusion-based power load accurate prediction method
Technical Field
The invention relates to the technical field of electric power, in particular to an accurate prediction method of an electric power load based on clustering algorithm fusion.
Background
The smart grid integrates the existing digital networks for power generation, transmission, distribution and sale, together with the electrical equipment and other energy-using facilities of end users, so that the information of the traditional power grid is shared within one intelligent system. It can effectively improve the grid's ability to accommodate, consume and coordinate new energy sources such as wind, solar and hydro power, and it promotes the optimization and adjustment of China's energy structure. However, as smart grids develop, the energy supply of power systems becomes more diversified and service requirements become more complex, which makes power load prediction more difficult and the prediction process more complicated. Early single-model load prediction methods cannot meet the required prediction accuracy. A more effective power load prediction technology is therefore needed to support optimal planning of the power system and to guarantee its economic and safe operation.
Early load prediction methods mainly include regression prediction, time series methods, grey system theory and exponential smoothing. Their models are simple and fast to compute, but their prediction accuracy is low, so they struggle with today's complex and diversified load prediction scenarios. With the spread of machine learning and deep learning, load prediction has gradually moved away from these traditional methods. The artificial neural network (ANN) is a typical machine learning technique: it can describe complex nonlinear mappings and handle nonlinear and irregular problems effectively, so prediction methods based on machine learning and deep learning further improve prediction accuracy and model generality. However, a single prediction method rarely gives satisfactory results, because a single model has significant limitations and cannot adapt to complex, multi-factor load prediction scenarios. The invention therefore combines a clustering method with an improved ANN prediction model to provide a novel combined prediction model.
Disclosure of Invention
The invention provides a power load accurate prediction method based on clustering algorithm fusion. The method combines a clustering algorithm with an improved multi-output ANN prediction model to improve the accuracy of power system load prediction.
The invention is realized by the following technical scheme:
A clustering algorithm fusion-based power load accurate prediction method comprises the following steps:
S1, preprocessing historical daily load data to obtain a data set, dividing the data set into a training set and a test set to obtain the number of training samples, and normalizing the sample data by the maximum and minimum normalization method;
S2, clustering the data normalized in step S1 with the K-means clustering algorithm so that data with similar characteristics are grouped together, determining the optimal number of clusters K by the elbow criterion, dividing the data into several clusters with different cluster labels, calculating the mean of each cluster as the classification standard used to classify the load data of a prediction day, and then training the model to obtain a trained ANN prediction model;
S3, for the test set, first completing classification with the classification standard of step S2 and adding cluster labels, and then completing prediction with the ANN prediction model trained in step S2.
Further, step S1 specifically includes:
S11: abnormal value judgment and correction are carried out on the data set to obtain a more complete sample data set; the specific operation is as follows:
Step 1, calling the function np.isnan(dataset) to locate the abnormal (NaN) values in the data set;
Step 2, correcting each abnormal value by the averaging method:
$$x(d,i) = \frac{x(d,i-1) + x(d,i+1)}{2}$$
where $x(d,i)$ is the load value with serial number $d$ at the $i$-th time point; the abnormal value is replaced by the mean of the preceding and following load values;
S12: the data set is divided into a training set and a test set in a 9:1 ratio. The training set is assumed to contain 28992 samples and is recorded as $X_{train} = (x_{ij})_{1208\times 24}$, where $i$ denotes the $i$-th day and $j$ the $j$-th sampling point; the test set is recorded as $X_{test} = (x_{mn})_{134\times 24}$, where $m$ denotes the $m$-th day and $n$ the $n$-th sampling point;
S13: the data are normalized by the maximum and minimum normalization method to eliminate the influence of dimension on the clustering effect; the method divides the original data by the absolute value of the maximum value:
$$x^{*} = \frac{x}{\left|x_{\max}\right|}$$
where $x^{*}$ is the normalized data value, $x$ is the actual value of the sample and $x_{\max}$ is the maximum value of the sample; after conversion the data lie in the range $[-1, 1]$.
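Steps S11 and S13 can be illustrated with a minimal sketch. The patent calls np.isnan; to stay dependency-free this sketch uses math.isnan on plain Python lists instead. The function names repair_missing and max_abs_normalize are hypothetical, and the handling of a NaN at the first or last sampling point (falling back to the single neighbour) is an assumption, since the text only describes the interior case.

```python
import math

def repair_missing(day):
    """Replace each NaN with the mean of the previous and next readings (S11)."""
    fixed = list(day)
    for i, value in enumerate(fixed):
        if math.isnan(value):
            # Assumption: isolated NaNs only; an edge reading copies its one neighbour.
            prev_v = fixed[i - 1] if i > 0 else fixed[i + 1]
            next_v = fixed[i + 1] if i < len(fixed) - 1 else fixed[i - 1]
            fixed[i] = (prev_v + next_v) / 2
    return fixed

def max_abs_normalize(rows):
    """Divide every value by the largest absolute value in the data set (S13)."""
    x_max = max(abs(v) for row in rows for v in row)
    return [[v / x_max for v in row] for row in rows], x_max

day = repair_missing([10.0, float("nan"), 30.0, 40.0])
scaled, x_max = max_abs_normalize([day])
print(day, scaled, x_max)
```

Returning x_max alongside the scaled data keeps the value needed later to restore predictions to the original dimension.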
Further, step S2 first uses the K-means clustering algorithm to classify the training data set $X_{train} = (x_{ij})_{1208\times 24}$ and calculates the mean of each cluster as the classification standard for classifying the load to be predicted; the prediction model is then trained. The specific operation scheme is as follows:
S21: the training data set $X_{train} = (x_{ij})_{1208\times 24}$ is classified with the K-means clustering algorithm;
the K-means algorithm proceeds as follows:
Step 1, inputting a sample set N and randomly selecting K samples from N as the initial mean vectors $\{\mu_1, \mu_2, \ldots, \mu_K\}$;
Step 2, calculating the distance from each sample to every cluster center and assigning each sample to the nearest centroid according to the nearest-neighbor rule;
Step 3, updating the center of each cluster with the mean of the samples in that cluster;
Step 4, if the cluster centers no longer change or the maximum number of iterations is reached, stopping the iteration and outputting the cluster labels; otherwise, returning to Step 2;
where $\mu_j$ is the mean of the $j$-th cluster, i.e. its centroid. The optimal number of clusters K is obtained with the elbow criterion, the curve characteristics of each cluster are interpreted in terms of their physical meaning, cluster labels are assigned to the clusters according to these characteristics, and the mean of the load values of each cluster is calculated as the classification standard for the daily load data to be predicted;
S22: a clustering algorithm and a multi-output strategy are introduced to improve the prediction model, and training is completed on the 28992 training set samples;
on the one hand, the clustering algorithm is introduced: using the clustering result of S21, a new variable is added to the training data set $X_{train} = (x_{ij})_{1208\times 24}$, namely the cluster label of the cluster to which each day belongs. Taking this label as a new input variable forms the new training data set $X'_{train} = (x_{ij})_{1208\times 25}$, and introducing this strongly correlated variable improves prediction accuracy;
on the other hand, a multi-output strategy is introduced: a model that predicts the whole daily time series at once is designed by setting the number of output-layer nodes of the ANN prediction model to 24, which realizes a 24-dimensional multi-output prediction model and better represents the relation between the historical data and the data to be predicted;
based on the improved ANN prediction model, $X'_{train} = (x_{ij})_{1208\times 25}$ is normalized as in step S13 to obtain the normalized training data.
A time-series-based prediction method is adopted: after preprocessing, the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 dimensions in total, are selected as the input of the prediction model, and the load values of the 24 sampling points of the day to be predicted are selected as the output. All data in the training set are fed in sequentially for model training, the error between the predicted values and the true values is calculated to evaluate the quality of the model, and training continues until the preset minimum target error is reached, yielding the trained ANN prediction model.
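The (24 + 1) × 7 = 175 input layout can be made concrete with a short sketch that turns labelled daily curves into training pairs. The function name build_windows is hypothetical, and the ordering inside the input vector (24 load values of a day followed by that day's cluster label, repeated for seven days) is an assumption; the text fixes only the total dimension.

```python
def build_windows(days, labels, history=7):
    """days: list of 24-value daily curves; labels: one cluster label per day."""
    inputs, targets = [], []
    for t in range(history, len(days)):
        features = []
        for d in range(t - history, t):
            features.extend(days[d])    # 24 load values of that day
            features.append(labels[d])  # plus its cluster label
        inputs.append(features)         # (24 + 1) * 7 = 175 values
        targets.append(list(days[t]))   # the 24 values of the day to predict
    return inputs, targets

# Ten synthetic days, each a flat curve, with made-up cluster labels.
days = [[float(d)] * 24 for d in range(10)]
labels = [d % 3 for d in range(10)]
X, Y = build_windows(days, labels)
print(len(X), len(X[0]), len(Y[0]))
```

With ten days of history the sketch yields three training pairs, because the first seven days can only serve as input.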
Further, in step S3 the test set is first classified with the classification standard of S2, and prediction is then completed with the trained improved prediction model. The same time-series-based prediction method is used, i.e. the load data of the first seven days are used to predict the load data of the eighth day. The specific implementation is as follows:
S31: each of the seven days before the day to be predicted is compared with the mean of every cluster obtained in the training stage; the mean square error against each cluster mean is calculated, and the cluster with the smallest mean square error is taken as the classification result of that day. The specific steps are:
Step 1, acquiring the load values of the 24 sampling points of the day to be classified and initializing the cluster number to k = 0;
Step 2, calculating the mean square error MSE:
$$MSE_i = \sum (x_i - \mu_i)^2$$
where $MSE_i$ is the mean square error corresponding to the $i$-th cluster mean $\mu_i$, and the sum runs over the 24 sampling points of the day;
Step 3, after the mean square errors for all cluster means have been calculated, finding the minimum $MSE_i$; if its cluster number is i, the load classification result of the day is recorded as k = i;
S32: after the cluster labels are obtained, the load data of the 24 sampling points of each of the previous seven days together with their cluster labels, (24 + 1) × 7 = 175 dimensions in total, are taken as the input of the improved ANN prediction model trained in S22 to obtain the load prediction result of the day to be predicted. The prediction result is denormalized by inverting the normalization of step S13:
$$\hat{Y} = Y \cdot \left|x_{\max}\right|$$
where $\hat{Y}$ is the denormalized prediction, i.e. the data restored to the original dimension, and $Y$ is the normalized prediction.
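The nearest-cluster classification of S31 and the denormalization of S32 can be sketched as follows. The names classify_day and denormalize are hypothetical. Denormalization is written as multiplication by the absolute maximum, which is an assumption: it is simply the inverse of the normalization described in S13.

```python
def classify_day(curve, cluster_means):
    """Return the index of the cluster mean with the smallest squared error."""
    errors = [
        sum((x - m) ** 2 for x, m in zip(curve, mean))
        for mean in cluster_means
    ]
    return errors.index(min(errors))

def denormalize(normalized, x_max):
    """Undo division by |x_max| to restore the original dimension."""
    return [y * abs(x_max) for y in normalized]

# Two made-up cluster means: a low-load profile and a high-load profile.
means = [[0.2] * 24, [0.8] * 24]
print(classify_day([0.75] * 24, means))
print(denormalize([0.5, 1.0], 40.0))
```

Because only the ranking of the errors matters, the sketch compares sums of squared differences and does not divide by the number of sampling points.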
Further, step S3 further includes:
S33: prediction evaluation indexes are introduced to evaluate prediction accuracy. The mean absolute percentage error (MAPE), mean square error (MSE), mean absolute error (MAE) and error rate (Wrong rate) are selected as test indexes. The formula images are not reproduced in this text; the standard definitions consistent with the symbols below are:
$$E_{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - y_i}{x_i}\right|$$
$$E_{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2$$
$$E_{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|$$
$$\text{Wrong rate} = \frac{\text{number of predicted curves with } E_{MAPE} > 1}{\text{total number of predicted curves}}$$
where $y_i$ is the predicted load value, $x_i$ is the actual load value and $n$ is the total number of samples. MAE and MSE represent the average difference between the predicted and actual values: the smaller they are, the closer the prediction is to the actual value. When the $E_{MAPE}$ of a sample is larger than 1, its curve is regarded as an erroneous prediction and counted in the Wrong rate. These indexes are used together to evaluate prediction accuracy.
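A minimal sketch of the four indexes follows, using their standard textbook definitions because the patent's formula images are not available in the text. Two points are assumptions: MAPE is returned as a fraction rather than a percentage, so the threshold of 1 corresponds to a 100 % average error, and the Wrong rate is taken as the share of curves whose MAPE exceeds that threshold. All function names are hypothetical.

```python
def mae(actual, predicted):
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((x - y) ** 2 for x, y in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Assumption: a fraction, not a percentage; actual values must be non-zero.
    return sum(abs((x - y) / x) for x, y in zip(actual, predicted)) / len(actual)

def wrong_rate(actual_curves, predicted_curves, threshold=1.0):
    """Share of curves whose MAPE exceeds the threshold."""
    wrong = sum(
        1 for a, p in zip(actual_curves, predicted_curves)
        if mape(a, p) > threshold
    )
    return wrong / len(actual_curves)

actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```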
In this method, data with similar characteristics are used together as prediction input, which strengthens the regularity of the samples and improves prediction accuracy. A multi-output strategy is then introduced to improve the traditional ANN prediction model, raising the goodness of fit so that the output is closer to the actual values, and the clustering algorithm is combined with the ANN prediction model to form a combined prediction model. Compared with the traditional ANN prediction model and a typical deep learning prediction model, the method improves prediction accuracy and has good learning ability and adaptability.
Drawings
FIG. 1 is a flow chart of an implementation of a clustering algorithm fusion based power load accurate prediction method of the present invention;
FIG. 2 is a block diagram of an improved ANN prediction model of the present invention;
FIG. 3 is a graph of the loss function for the single-output and multiple-output models of the present invention;
FIG. 4 is a basic block diagram of an improved ANN prediction model based on a clustering algorithm;
FIG. 5 is a graph comparing an actual load curve of the present invention with predicted load curves of various predictive models;
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below with reference to the drawings, using the load forecasting data set of the Global Energy Forecasting Competition as the prediction case. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The invention provides a power load accurate prediction method based on clustering algorithm fusion; FIG. 1 shows the flow chart of the method. The prediction method combines a clustering algorithm with an improved multi-output ANN prediction model to improve the accuracy of power system load prediction, and comprises the following steps:
S1, preprocessing historical daily load data to obtain a data set, dividing the data set into a training set and a test set to obtain the number of training samples, and normalizing the sample data by the maximum and minimum normalization method. S1 specifically comprises the following steps:
S11: the case uses the load forecasting track of the 2012 Global Energy Forecasting Competition. The competition data set contains historical daily load data with 24 sampling points per day, from January 2004 to 30 June 2008, for 20 regions; it is preprocessed to obtain a more complete data set of 32208 samples. The preprocessing steps are as follows:
Step 1, calling the function np.isnan(dataset) to locate the abnormal (NaN) values in the data set;
Step 2, correcting each abnormal value by the averaging method:
$$x(d,i) = \frac{x(d,i-1) + x(d,i+1)}{2}$$
where $x(d,i)$ is the load value with serial number $d$ at the $i$-th time point; the abnormal value is replaced by the mean of the preceding and following load values;
S12: the data set is divided into a training set and a test set in a 9:1 ratio. The training set contains 28992 samples and is recorded as $X_{train} = (x_{ij})_{1208\times 24}$, where $i$ denotes the $i$-th day and $j$ the $j$-th sampling point. The test set is recorded as $X_{test} = (x_{mn})_{134\times 24}$, where $m$ denotes the $m$-th day and $n$ the $n$-th sampling point;
S13: the data are normalized by the maximum and minimum normalization method to eliminate the influence of dimension on the clustering effect; the method divides the original data by the absolute value of the maximum value:
$$x^{*} = \frac{x}{\left|x_{\max}\right|}$$
where $x^{*}$ is the normalized data value, $x$ is the actual value of the sample and $x_{\max}$ is the maximum value of the sample; after conversion the data lie in the range $[-1, 1]$.
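The sample counts quoted in S11 and S12 are consistent with each other, which a few lines of arithmetic confirm. The sketch assumes a chronological split (earliest days for training); the patent states only the 9:1 ratio, and the function name chronological_split is hypothetical.

```python
def chronological_split(days, train_ratio=0.9):
    """Keep the earliest days for training and the rest for testing."""
    n_train = round(len(days) * train_ratio)
    return days[:n_train], days[n_train:]

total_days = 32208 // 24                     # 1342 daily curves of 24 points
train, test = chronological_split(list(range(total_days)))
print(len(train), len(test), len(train) * 24)
```

Rounding rather than truncating matters here: 0.9 × 1342 = 1207.8, so truncation would give 1207 training days instead of the 1208 stated in the text.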
S2, clustering the normalized data with the K-means clustering algorithm so that data with similar characteristics are grouped together, determining the optimal number of clusters K by the elbow criterion, dividing the data into several clusters with different cluster labels, and calculating the mean of each cluster for classifying the load to be predicted;
specifically, step S2 uses the K-means clustering algorithm to classify the training data set $X_{train} = (x_{ij})_{1208\times 24}$ (28992 samples in total), calculates the mean of each cluster as the classification standard for classifying the load to be predicted, and then trains the model. The specific operation scheme is as follows:
S21: the training data set $X_{train} = (x_{ij})_{1208\times 24}$ is classified using, but not limited to, the K-means clustering algorithm;
the K-means algorithm proceeds as follows:
Step 1, inputting a sample set N and randomly selecting K samples from N as the initial mean vectors $\{\mu_1, \mu_2, \ldots, \mu_K\}$;
Step 2, calculating the distance from each sample to every cluster center and assigning each sample to the nearest centroid according to the nearest-neighbor rule;
Step 3, updating the center of each cluster with the mean of the samples in that cluster;
Step 4, if the cluster centers no longer change or the maximum number of iterations is reached, stopping the iteration and outputting the cluster labels; otherwise, returning to Step 2;
where $\mu_j$ is the mean of the $j$-th cluster, i.e. its centroid. The optimal number of clusters K is obtained with the elbow criterion, the curve characteristics of each cluster are interpreted in terms of their physical meaning, cluster labels are assigned to the clusters according to these characteristics, and the mean of the load values of each cluster is calculated as the classification standard for the daily load data to be predicted;
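The four K-means steps and the elbow criterion can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's implementation: squared Euclidean distance is assumed as the distance measure, an empty cluster simply keeps its previous center, and the names kmeans, assign and elbow_curve are hypothetical. The elbow curve records the within-cluster sum of squares for each K; the "elbow" is the K after which that value stops falling sharply.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(samples, centers):
    """Step 2: index of the nearest center for every sample."""
    return [min(range(len(centers)), key=lambda j: squared_distance(s, centers[j]))
            for s in samples]

def kmeans(samples, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(samples, k)]   # Step 1: random initial means
    for _ in range(max_iter):
        labels = assign(samples, centers)                 # Step 2
        new_centers = []
        for j in range(k):                                # Step 3: mean of each cluster
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])            # assumption: keep old center
        if new_centers == centers:                        # Step 4: centers unchanged
            break
        centers = new_centers
    labels = assign(samples, centers)
    inertia = sum(squared_distance(s, centers[lab]) for s, lab in zip(samples, labels))
    return labels, centers, inertia

def elbow_curve(samples, k_values):
    """Within-cluster sum of squares for each candidate K."""
    return {k: kmeans(samples, k)[2] for k in k_values}

points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(kmeans(points, 2)[0])
print(elbow_curve(points, [1, 2, 3]))
```

On the toy data the sum of squares collapses between K = 1 and K = 2 and barely changes afterwards, which is the pattern the elbow criterion looks for.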
S22: a clustering algorithm and a multi-output strategy are introduced to improve the prediction model; training is completed on the 28992 training set samples to obtain the trained improved ANN prediction model.
On the one hand, the clustering algorithm is introduced: using the clustering result of S21, a new column is added to the training data set $X_{train} = (x_{ij})_{1208\times 24}$, namely the cluster label of the cluster to which each day belongs. Taking this label as a new input variable forms the new training data set $X'_{train} = (x_{ij})_{1208\times 25}$, and introducing this strongly correlated variable improves prediction accuracy;
on the other hand, a multi-output strategy is introduced, i.e. a model that predicts the whole daily time series at once is designed. Traditional machine learning algorithms cannot output multi-dimensional data and therefore rely on direct or recursive prediction; the ANN prediction model used here removes this limitation. Setting the number of output-layer nodes to 24 realizes a 24-dimensional multi-output prediction model, and compared with a single-output structure the multi-output structure fits better when the number of iterations is large, as shown in FIG. 3. The overall structure of the improved ANN prediction model is shown in FIG. 2. The Adam optimizer is used, and the ReLU function is chosen as the activation function because it avoids the vanishing-gradient problem in the positive region and converges much faster than the sigmoid and tanh functions. The model comprises three hidden layers with 300, 200 and 100 nodes respectively;
based on the improved ANN prediction model, $X'_{train} = (x_{ij})_{1208\times 25}$ is normalized as in step S13 to obtain the normalized training data.
A time-series-based prediction method is adopted: after preprocessing, the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 dimensions in total, are selected as the input of the prediction model, and the load values of the 24 sampling points of the day to be predicted are taken as the output. All data in the training set are fed in sequentially for model training, the error between the predicted values and the true values is calculated to evaluate the quality of the model, and training continues until the preset minimum target error is reached, yielding the trained ANN prediction model.
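The network shape described above (175 inputs, hidden layers of 300, 200 and 100 ReLU units, 24 outputs) can be illustrated with a forward pass in plain Python. This sketch only shows the architecture and the multi-output idea: the weights are random, training with the Adam optimizer is omitted, and the linear output layer and the uniform weight initialization are assumptions. The names LAYER_SIZES, init_layers and forward are hypothetical.

```python
import random

LAYER_SIZES = [175, 300, 200, 100, 24]   # input, three hidden layers, output

def init_layers(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, features):
    """One pass through the network; ReLU on hidden layers, linear output."""
    activation = features
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = z if is_output else [max(0.0, v) for v in z]
    return activation

layers = init_layers(LAYER_SIZES)
prediction = forward(layers, [0.5] * 175)
print(len(prediction))   # one value per sampling point of the predicted day
```

A single forward pass yields all 24 sampling points of the day at once, which is what distinguishes the multi-output strategy from predicting one point at a time.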
S3, for the test set, classification is first completed with the classification standard of step S2 and cluster labels are added; prediction is then completed with the ANN prediction model trained in step S2. A rolling prediction method based on the time series is adopted: the 24 hourly load values of the eighth day are predicted from the load values and cluster labels of the seven preceding days; on the basis of the predicted load values, the next day is then predicted from the load values and cluster labels of the seven days preceding it, and so on. The input consists of the load values and cluster labels of the seven days before the day to be predicted, (24 + 1) × 7 = 175 input nodes in total; with the multi-output strategy, the output consists of the load values at the 24 time points of the day, 24 nodes in total. The prediction result of the test set is obtained by this process. The specific implementation steps are as follows:
S31: each of the seven days before the day to be predicted is compared with the mean of every cluster obtained in the training stage; the mean square error against each cluster mean is calculated, and the cluster with the smallest mean square error is taken as the classification result of that day. The specific steps are:
Step 1, acquiring the load values of the 24 sampling points of the day to be classified and initializing the cluster number to k = 0;
Step 2, calculating the mean square error MSE:
$$MSE_i = \sum (x_i - \mu_i)^2$$
where $MSE_i$ is the mean square error corresponding to the $i$-th cluster mean $\mu_i$, and the sum runs over the 24 sampling points of the day;
Step 3, after the mean square errors for all cluster means have been calculated, finding the minimum $MSE_i$; if its cluster number is i, the load classification result of the day is recorded as k = i;
S32: after the cluster labels are obtained, the load data of the 24 sampling points of each of the previous seven days together with their cluster labels, (24 + 1) × 7 = 175 dimensions in total, are taken as the input of the improved ANN prediction model trained in S22 to obtain the load prediction result of the day to be predicted. The prediction result is denormalized by inverting the normalization of step S13:
$$\hat{Y} = Y \cdot \left|x_{\max}\right|$$
where $\hat{Y}$ is the denormalized prediction, i.e. the data restored to the original dimension, and $Y$ is the normalized prediction;
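The rolling prediction described in S3 can be sketched as a loop over a seven-day window. The function rolling_forecast and its parameters are hypothetical; predict stands for the trained model and classify for the nearest-cluster-mean rule of S31, both passed in as callables. Feeding each predicted day back into the window follows the "on the basis of the predicted load value" wording, and the input ordering (24 values then the label, per day) is an assumption.

```python
def rolling_forecast(history_days, history_labels, n_days, predict, classify):
    """Predict n_days ahead, sliding a seven-day window over the predictions."""
    days = [list(d) for d in history_days[-7:]]
    labels = list(history_labels[-7:])
    forecasts = []
    for _ in range(n_days):
        features = []
        for curve, label in zip(days, labels):
            features.extend(curve)       # 24 load values
            features.append(label)       # cluster label of that day
        next_day = predict(features)     # 175 inputs -> 24 outputs
        forecasts.append(next_day)
        days = days[1:] + [next_day]     # drop the oldest day, append the new one
        labels = labels[1:] + [classify(next_day)]
    return forecasts

# Stand-in callables: a persistence "model" that repeats the last day, and a
# classifier that always returns cluster 0.
history = [[float(d)] * 24 for d in range(7)]
result = rolling_forecast(history, [0] * 7, 3,
                          predict=lambda f: f[-25:-1],
                          classify=lambda curve: 0)
print(len(result), result[0][:3])
```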
S33: prediction evaluation indexes are introduced to evaluate prediction accuracy. The mean absolute percentage error (MAPE), mean square error (MSE), mean absolute error (MAE) and error rate (Wrong rate) are selected as test indexes. The formula images are not reproduced in this text; the standard definitions consistent with the symbols below are:
$$E_{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - y_i}{x_i}\right|$$
$$E_{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2$$
$$E_{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|$$
$$\text{Wrong rate} = \frac{\text{number of predicted curves with } E_{MAPE} > 1}{\text{total number of predicted curves}}$$
where $y_i$ is the predicted load value, $x_i$ is the actual load value and $n$ is the total number of samples. MAE and MSE represent the average difference between the predicted and actual values: the smaller they are, the closer the prediction is to the actual value. When the $E_{MAPE}$ of a sample is larger than 1, its curve is regarded as an erroneous prediction and counted in the Wrong rate. These indexes are used together to evaluate prediction accuracy.
Three machine learning models, namely an ANN model, an LSTM model and a K-means-ANN model, are constructed in the specific case of the invention; the parameter configuration of each model is shown in Table 1 below.
Table 1. Comparison of prediction results
[Table image BDA0003909219400000085 is not reproduced in this text]
FIG. 5 compares, on randomly selected sample data from the case, the prediction results of the novel combined prediction method with those of the traditional ANN prediction model and a typical deep learning prediction model. The comparison verifies that the novel combined prediction model improves prediction accuracy and has good learning ability and adaptability.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (5)

1. A clustering algorithm fusion-based power load accurate prediction method, characterized by comprising the following steps:
S1: preprocessing historical daily load data to obtain a data set, dividing the data set into a training set and a test set to obtain the number of training samples, and normalizing the sample data by the maximum-minimum normalization method;
S2: classifying the data normalized in step S1 with the K-means clustering algorithm so that data with the same type of characteristics are aggregated; determining the optimal number of clusters K according to the elbow criterion; dividing the data into a plurality of clusters and assigning each a different cluster label; calculating the average value of each cluster as the classification standard for the prediction day, used to classify the load data to be predicted during prediction; and then performing model training to obtain a trained ANN prediction model;
S3: for the test set, first completing classification with the classification standard of step S2 and adding a cluster label, and then completing prediction with the ANN prediction model trained in step S2.
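Step S2 relies on K-means clustering and the elbow criterion. A minimal pure-Python sketch of both follows, assuming squared Euclidean distance between daily load curves and a fixed random seed. The function names, the `max_iter` default and the handling of empty clusters are illustrative choices, not taken from the patent.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(samples, k, max_iter=100, seed=0):
    """Cluster `samples` into k groups; return (labels, centers, sse)."""
    rng = random.Random(seed)
    # Initial centers: k randomly chosen samples.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = []
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(s, centers[j]))
            for s in samples
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Within-cluster sum of squared errors, the quantity the elbow plot uses.
    sse = sum(sq_dist(s, centers[lab]) for s, lab in zip(samples, labels))
    return labels, centers, sse


def elbow_curve(samples, k_values):
    """Return [(k, sse)] so the bend ("elbow") in the curve can be inspected."""
    return [(k, kmeans(samples, k)[2]) for k in k_values]
```

The returned `centers` are the per-cluster averages that serve as the classification standard. The elbow is read off the `(k, sse)` pairs as the K beyond which the SSE stops dropping sharply; this sketch leaves that choice to the reader.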
2. The clustering algorithm fusion-based power load accurate prediction method according to claim 1, wherein the step S1 specifically comprises:
S11: judging and correcting abnormal values in the data set to obtain a more complete sample data set, as follows:
Step 1: calling the function np.isnan(dataset) to identify the abnormal values in the data set;
Step 2: correcting each abnormal value by the averaging method:
$$x(d,i) = \frac{x(d,i-1) + x(d,i+1)}{2}$$
where $x(d,i)$ is the load value with serial number d at the i-th time point; the abnormal value is interpolated as the average of the previous and the next load values;
S12: dividing the data set into a training set and a test set in a ratio of 9:1, where the training set is $X_{train} = (x_{ij})_{1208 \times 24}$, i denoting the i-th day and j the j-th sampling point, and the test set is $X_{test} = (x_{mn})_{134 \times 24}$, m denoting the m-th day and n the n-th sampling point;
S13: normalizing the data by the maximum-minimum normalization method to eliminate the influence of dimension on the clustering effect; here the normalization divides the original data by the absolute value of the maximum value:
$$x^{*} = \frac{x}{|x_{max}|}$$
where $x^{*}$ is the normalized data value, $x$ is the actual value of the sample, and $x_{max}$ is the maximum value of the sample; after conversion the data lie in the range [-1, 1].
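Steps S11 to S13 can be sketched as three small helpers. This is a hedged illustration rather than the patent's implementation: it uses `math.isnan` on plain lists in place of `np.isnan` so that it runs without NumPy, it assumes the 9:1 split is chronological, and it assumes a single global maximum is used for scaling. The function names and the treatment of endpoints and consecutive missing values are hypothetical.

```python
import math


def fill_abnormal(day):
    """Replace NaN values with the mean of the previous and next values.

    Assumption: at the ends of the series, or next to another NaN, only the
    available neighbour is used.
    """
    fixed = list(day)
    for i, value in enumerate(fixed):
        if math.isnan(value):
            prev_v = fixed[i - 1] if i > 0 else None
            next_v = day[i + 1] if i + 1 < len(day) else None
            neighbours = [
                v for v in (prev_v, next_v)
                if v is not None and not math.isnan(v)
            ]
            if neighbours:
                fixed[i] = sum(neighbours) / len(neighbours)
    return fixed


def split_train_test(days, train_fraction=0.9):
    """Split a list of days into (train, test), keeping the original order."""
    n_train = round(len(days) * train_fraction)
    return days[:n_train], days[n_train:]


def max_abs_normalize(days, x_max=None):
    """Divide every value by |x_max|; return (normalized_days, x_max).

    The result lies in [-1, 1] only if no value exceeds |x_max| in
    magnitude, which holds for non-negative loads.
    """
    if x_max is None:
        x_max = max(v for day in days for v in day)
    scale = abs(x_max)
    if scale == 0:
        raise ValueError("cannot normalize: maximum is zero")
    return [[v / scale for v in day] for day in days], x_max
```

With 1342 daily curves, `split_train_test` yields 1208 training days and 134 test days, matching the sizes given in S12. Dividing by the absolute value of the maximum is what is often called max-abs scaling.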
3. The clustering algorithm fusion-based power load accurate prediction method according to claim 2, characterized in that step S2 classifies the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ with the K-means clustering algorithm and calculates the average value of each cluster as the classification standard for classifying the load to be predicted during prediction, and then trains the prediction model; the specific operation scheme is as follows:
S21: classifying the training data set $X_{train} = (x_{ij})_{1208 \times 24}$ with the K-means clustering algorithm;
the K-means algorithm proceeds as follows:
Step 1: inputting the sample set N and randomly selecting K samples from N as the initial mean vectors $\{\mu_1, \mu_2, \ldots, \mu_K\}$;
Step 2: calculating the distance from each sample to each center and assigning each sample point to the nearest centroid according to the nearest-neighbor rule;
Step 3: updating the center of each cluster with the sample mean of that cluster;
Step 4: if the cluster centroids no longer change or the maximum number of iterations is reached, the best clustering result is considered found, iteration stops and the cluster labels are output; otherwise, returning to Step 2;
where $\mu_j$ is the mean of the j-th cluster, i.e., the centroid of the cluster; the optimal number of clusters K is obtained with the elbow criterion, the curve characteristics of each cluster are obtained in light of their physical meaning, cluster labels are added to the different clusters according to those characteristics, and the average of the load values of each cluster is calculated as the classification standard for the daily load data to be predicted;
S22: training the prediction model, improved by introducing the clustering algorithm and a multi-output strategy, on the 28,992 training-set samples (1208 days × 24 sampling points);
on the one hand, the clustering algorithm is introduced: using the clustering result of S21, a new column is added to the training data set $X_{train} = (x_{ij})_{1208 \times 24}$, namely the cluster label of the cluster to which each day is assigned, and this label is taken as a new input variable to form a new training data set $X'_{train} = (x_{ij})_{1208 \times 25}$, so that prediction accuracy is improved by introducing a new strongly correlated variable;
on the other hand, the multi-output strategy is introduced: a model capable of predicting the whole time series at once is designed by setting the number of output-layer nodes of the ANN prediction model to 24, which realizes a 24-dimensional multi-output prediction model and better represents the relation between the historical data and the data to be predicted;
based on the improved ANN prediction model, $X'_{train} = (x_{ij})_{1208 \times 25}$ is normalized as in step S13 to obtain the normalized data
Figure FDA0003909219390000021
A time-series-based prediction method is adopted: after preprocessing, the load values and cluster labels of the seven days before the day to be predicted, i.e., (24 + 1) × 7 = 175-dimensional data, are taken as the input of the prediction model, and the load values at the 24 sampling points of the day to be predicted are taken as the output; all data in the training set are input in sequence for model training, the error between the predicted and actual values is calculated to evaluate the quality of the trained model, and a target minimum error is set so that training continues until that target error is reached, which yields the trained ANN prediction model.
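The construction of the 175-dimensional inputs and 24-dimensional targets can be sketched as a sliding window over consecutive days. This is a minimal sketch under assumptions: the function name is hypothetical, and the feature order (24 load values followed by that day's cluster label, repeated for seven days) is one plausible layout that the source does not specify.

```python
def build_samples(days, labels, history=7):
    """Build (inputs, targets) for a multi-output day-ahead model.

    days   : list of daily load curves, each a list of 24 values
    labels : cluster label of each day, same length as `days`
    Each input concatenates `history` consecutive days, every day
    contributing its 24 loads plus its label; the target is the
    24-point curve of the following day.
    """
    inputs, targets = [], []
    for t in range(history, len(days)):
        features = []
        for d in range(t - history, t):
            features.extend(days[d])    # 24 load values of day d
            features.append(labels[d])  # cluster label of day d
        inputs.append(features)
        targets.append(list(days[t]))
    return inputs, targets
```

Under this layout, a training set of 1208 days yields 1208 - 7 = 1201 input/target pairs, each input having (24 + 1) × 7 = 175 values and each target 24 values, which matches an ANN whose output layer has 24 nodes.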
4. The clustering algorithm fusion-based power load accurate prediction method according to claim 3, wherein in step S3, for the test set, classification is first completed with the classification standard of S2 and prediction is then completed with the trained improved prediction model; the time-series-based prediction method is again adopted, i.e., the load data of the preceding seven days are used to predict the load data of the eighth day, the day to be predicted; the implementation is as follows:
S31: first, the load values of each of the seven days before the day to be predicted are compared with the average value of each cluster obtained in the training stage, the mean square error against each cluster mean is calculated, and the cluster with the smallest mean square error is taken as the classification result of that day; the specific steps are as follows:
Step 1: acquiring the load values of the 24 sampling points of a given day and initializing the cluster number k = 0;
Step 2: calculating the mean square error MSE:
$$MSE_i = \sum_{j=1}^{24} (x_j - \mu_{i,j})^2$$
where $x_j$ is the load value at the j-th sampling point, $\mu_{i,j}$ is the j-th component of the i-th cluster mean $\mu_i$, and $MSE_i$ is the mean square error corresponding to $\mu_i$;
Step 3: after the mean square errors for all cluster means have been calculated, finding the minimum $MSE_i$, whose cluster number is i, and marking the load classification result of the day as k = i;
S32: after the cluster labels are obtained, the load data of the 24 sampling points of each of the preceding seven days together with their cluster labels, i.e., (24 + 1) × 7 = 175-dimensional data, are taken as the input of the improved ANN prediction model trained in S22 to obtain the load prediction result for the day to be predicted, and the prediction result is inverse-normalized as follows:
Figure FDA0003909219390000031
wherein
Figure FDA0003909219390000032
represents the predicted data after inverse normalization, restored to the original dimension, and Y is the normalized prediction data.
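The two operations of this claim, assigning a day to the nearest cluster mean and restoring a prediction to its original scale, can be sketched as follows. Both functions are illustrative. In particular, the inverse normalization shown here is an assumption: it is simply the inverse of dividing by |x_max| in step S13, since the source formula is only available as an image. Cluster numbers are 0-based in this sketch.

```python
def classify_day(day, cluster_means):
    """Return (k, errors): index of the closest cluster mean and all errors.

    The error against each cluster mean is the sum of squared differences
    over the sampling points, following the formula in step S31.
    """
    errors = [
        sum((x - m) ** 2 for x, m in zip(day, mean))
        for mean in cluster_means
    ]
    k = min(range(len(errors)), key=errors.__getitem__)
    return k, errors


def denormalize(prediction, x_max):
    """Assumed inverse of S13: multiply normalized values by |x_max|."""
    scale = abs(x_max)
    return [v * scale for v in prediction]
```

Applying `classify_day` to each of the seven history days gives the labels that are appended to their load values before the 175-dimensional input is fed to the trained model.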
5. The clustering algorithm fusion based power load accurate prediction method according to claim 4, wherein the step S3 further comprises:
S33: introducing prediction evaluation indices to evaluate prediction accuracy; the mean absolute percentage error (MAPE), the mean square error (MSE), the mean absolute error (MAE) and the error rate (Wrong rate) are selected as test indices and are calculated as follows:
Figure FDA0003909219390000033
Figure FDA0003909219390000034
Figure FDA0003909219390000035
Figure FDA0003909219390000036
In the formulas, y_i is the predicted load value, x_i is the actual load value, and n is the total number of samples. The mean absolute error (MAE) and the mean square error (MSE) represent the average difference between the predicted and actual values: the smaller the value, the closer the prediction is to the actual value. When the E_MAPE of a sample is greater than 1, its curve is regarded as an erroneously predicted curve, and such curves are counted by the Wrong rate. These criteria are used to evaluate the accuracy of the prediction.
CN202211327749.8A 2022-10-26 2022-10-26 Clustering algorithm fusion-based power load accurate prediction method Pending CN115619028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211327749.8A CN115619028A (en) 2022-10-26 2022-10-26 Clustering algorithm fusion-based power load accurate prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211327749.8A CN115619028A (en) 2022-10-26 2022-10-26 Clustering algorithm fusion-based power load accurate prediction method

Publications (1)

Publication Number Publication Date
CN115619028A true CN115619028A (en) 2023-01-17

Family

ID=84876334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211327749.8A Pending CN115619028A (en) 2022-10-26 2022-10-26 Clustering algorithm fusion-based power load accurate prediction method

Country Status (1)

Country Link
CN (1) CN115619028A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153437A (en) * 2023-04-19 2023-05-23 乐百氏(广东)饮用水有限公司 Water quality safety evaluation and water quality prediction method and system for drinking water source



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination