CN108734355B - Short-term power load parallel prediction method and system applied to power quality comprehensive management scene

Publication number
CN108734355B
Authority
CN
China
Prior art keywords
load
scene
data
class
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810506954.8A
Other languages
Chinese (zh)
Other versions
CN108734355A (en)
Inventor
郭敬东
张健
黄道姗
张慧瑜
林芳
林焱
张伟骏
陈伯建
项胤兴
黄霆
徐振华
吴丹岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd
State Grid Fujian Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd
State Grid Fujian Electric Power Co Ltd
Application filed by Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd and State Grid Fujian Electric Power Co Ltd
Priority to CN201810506954.8A
Publication of CN108734355A
Application granted
Publication of CN108734355B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention relates to a short-term power load parallel prediction method and system applied to a power quality comprehensive management scene. Based on the characteristics of power load data, a K-means clustering algorithm is used to divide the power load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load under test; a BP neural network algorithm is used to train a load prediction model and predict scene by scene on massive historical data; and the proposed algorithm model is parallelized with the Apache Spark programming framework for cloud computing, which improves its capability to process massive high-dimensional data.

Description

Short-term power load parallel prediction method and system applied to power quality comprehensive management scene
Technical Field
The invention relates to the technical field of power load prediction, in particular to a short-term power load parallel prediction method and system applied to a power quality comprehensive treatment scene.
Background
The Energy Internet (EI) is a new energy utilization system that emerged against the background of the third industrial revolution. It deeply combines new energy technology with information technology to address the gradual exhaustion of fossil fuels and the environmental pollution associated with them. Compared with the smart grid, the energy Internet depends even more heavily on modern Internet technology.
Load prediction has always been a key link in the operation and control of electric power systems. It affects the smooth implementation of analysis and decision functions such as economic dispatch, automatic generation control, security assessment, maintenance planning, and electricity market operation. By time span, load prediction can be roughly divided into medium- and long-term load prediction and short-term load prediction. Short-term load prediction forecasts the load changes of the system from a few hours to a few days ahead, mainly provides data support for the operation and control of the power system, and is the focus of the invention.
From a methodological perspective, the methods used for short-term load prediction can be broadly divided into time series prediction, intelligent prediction, and combined prediction. Time series methods construct a prediction model from the correlation among consecutive elements of a time series and predict the future load trend by extrapolation; specific methods include regression analysis, the load derivation method, exponential smoothing, and Kalman filtering. Intelligent prediction methods construct a learning structure suited to modeling highly nonlinear relationships, obtain the nonlinear mapping between input and output through supervised learning on historical input and output data, and then predict future load changes; specific methods include expert systems, artificial neural networks, and support vector machines. Combined prediction methods combine two or more methods: because it is difficult to find a single best method for all prediction objects, and the results of each method have a certain credibility, combined prediction improves the accuracy of load prediction by integrating multiple prediction models and methods. These methods form a solid theoretical basis for load prediction, but they do not consider the mass data that the current power grid faces, and their applicability to load prediction in a big data environment needs further discussion.
Cloud computing is a method and process in which a group of servers on a network provides resources such as computation, storage, and data to requesters in the form of services in order to complete information processing tasks; it represents a large-scale distributed computing model based on the Internet. Cloud computing uses the Internet to integrate wide-area heterogeneous computing resources into an abstract, virtual, and dynamically expandable resource pool, and then provides computing capacity, storage capacity, software platforms, and application software to users on demand through the Internet. The document "New energy application of a cloud computing data center" establishes a cloud-computing-based power system computing platform and discusses its implementation in detail in terms of physical composition, system architecture, and software technology; the document "Cloud computing: constructing a core computing platform of the future power system" analyzes the sources and characteristics of big data in power generation, power transmission and transformation, and power utilization, and analyzes in detail the advantages of existing big data processing technology for smart grid construction and big data processing; the document "The current situation and challenges of smart grid big data processing technology" combines cloud computing technology to provide a big data analysis and processing platform on the power consumer side and proposes a parallel load prediction method based on a random forest algorithm, which shortens load prediction time and improves the prediction algorithm's capacity to process big data; the document "Power consumer side big data analysis and parallel load prediction", based on locally weighted linear regression and a cloud computing platform, establishes a parallel locally weighted linear regression model for load prediction, which reduces the time consumed by load prediction and improves prediction precision; the document "Short-term prediction of power load under mass data" proposes an online sequential extreme learning machine model for short-term load prediction and improves the algorithm's capacity to process high-dimensional data through MapReduce parallel programming.
The above documents introduce the idea of parallel computing on the basis of traditional algorithms and remarkably improve the capability of prediction algorithms to process power load big data.
Load changes are closely related to factors such as climate and date type, and similar external conditions correspond to similar load change patterns. The document "Distributed power load prediction algorithm based on cloud computing and extreme learning machine" uses a clustering method to group historical data with similar power load characteristics into the same load scene, and analyzes the capability of algorithms such as the K-nearest neighbor method (KNN) to predict and label the class of unknown data. However, in this method, every time a new set of load data is added to the load scene training set, the model must be retrained on all data, including the newly acquired data, which consumes a lot of computing resources and training time.
In summary, in the prior art, an effective solution is still lacking for the problems of insufficient single-computer computing resources and large time consumption caused by massive and high-dimensional data of the energy internet.
Disclosure of Invention
In view of the above, the invention aims to provide a short-term power load parallel prediction method and system applied to a power quality comprehensive management scene. Based on the characteristics of power load data, a K-means clustering algorithm is used to divide the power load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load under test; a BP neural network algorithm is used to train a load prediction model and predict scene by scene on massive historical data; and the proposed algorithm model is parallelized with the Apache Spark programming framework for cloud computing, which improves its capability to process massive high-dimensional data.
The invention is realized by adopting the following technical scheme: a short-term power load parallel prediction method applied to an electric energy quality comprehensive treatment scene comprises the following steps:
step S1: clustering historical data of the power load to obtain a clustered load scene;
step S2: carrying out balance analysis on the clustered load scenes by adopting a balanced KNN algorithm to obtain balanced load scenes, and carrying out load scene decision;
step S3: the load scenario obtained in step S2 is trained, and load prediction is performed.
In step S1, a K-means clustering algorithm is used to cluster the power load historical data. In step S3, the load scenario obtained in step S2 is trained using a neural network.
Further, the step S1 specifically includes the following steps:
step S11: storing mass power load historical data on a distributed file system (HDFS) of a Hadoop platform in a distributed manner;
step S12: reading historical data of a power load in the HDFS, carrying out blocking processing on the historical data, and selecting K clustering centers;
step S13: sending the current clustering center data to all nodes, searching the clustering center closest to the data in the nodes, calculating the corresponding distance between the data in the nodes and the closest clustering center, and adding class labels to the training data;
step S14: calculating the sum of the category data of each clustering center and the mean value of each clustering center, and updating the clustering centers according to the mean values;
step S15: judging whether the clustering result meets the convergence condition; if not, returning to step S13; otherwise, ending the iteration and outputting the clustering result, wherein the clustering result is the load scene.
Further, step S2 specifically includes the following steps:
step S211: storing the test data and the load scene data generated in the step S1 on a distributed file system (HDFS) of a Hadoop platform in a distributed manner, forming a test data set and a load scene data set, and taking the load scene data set as a training data set;
step S212: converting the training data set and the testing data set into a load scene RDD data set and a testing RDD data set by using a Spark platform;
step S213: determining mark samples, threshold values, sample mark degrees and class mark degrees of various types in the load scene RDD data set;
step S214: calculating the Euclidean distance between each sample in the candidate class of the load scene RDD data set considering the sample mark degree and the test sample in the test RDD data set, and selecting h adjacent samples with the minimum Euclidean distance considering the sample mark degree;
step S215: calculating the weights, taking the class mark degree into account, of the classes to which the h adjacent samples selected in step S214 belong, and assigning the test sample to the class with the maximum class weight, to obtain the balanced load scene data.
Further, step S213 is specifically:
calculating the Euclidean distance sum of each sample data and the rest sample data in various types of load scene RDD data sets, and selecting the sample with the minimum distance sum as the mark sample of the type;
calculating various thresholds in the RDD data set of the load scene: calculating the Euclidean distance sum of each sample and the mark sample in each class of the load scene RDD data set, and selecting the maximum value of the Euclidean distance sum of each sample and the mark sample in each class as the threshold value of the class;
respectively calculating Euclidean distances between class samples and mark samples and between mark samples and class samples of the load scene RDD data set, and calculating the ratio of the two, namely the sample mark degree;
and calculating Euclidean distance sums of each class mark sample and each sample of the load scene RDD data set, solving a difference value between the maximum value of the Euclidean distance sums and the Euclidean distance sums, and a difference value between the maximum value of the Euclidean distance sums and the minimum value of the Euclidean distance sums, and then calculating a ratio of the two difference values, namely the class mark degree.
Further, after the threshold value of each class is determined, the distance sum of the test sample and the sample in the class in the test RDD data set is calculated, and the class with the distance sum larger than the threshold value of the class is excluded.
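The class mark degree described above can be sketched as follows. This is one reading of the text, not the patent's own formula (the formula image is not available in the source): for each class, the distance sum D_i between its mark sample and its samples is computed, and the class mark degree is taken as (D_max - D_i) / (D_max - D_min). The function names and the handling of the case D_max = D_min are assumptions made for illustration.

```python
import numpy as np

def distance_sum_to_mark(class_x):
    """Sum of Euclidean distances from the class mark sample to the class samples."""
    class_x = np.asarray(class_x, dtype=float)
    # Pairwise Euclidean distances between all samples of the class.
    pair = np.linalg.norm(class_x[:, None, :] - class_x[None, :, :], axis=2)
    # The mark sample is the sample with the smallest distance sum, so the
    # smallest row sum is the distance sum belonging to the mark sample.
    return float(pair.sum(axis=1).min())

def class_mark_degrees(classes):
    """classes: dict mapping a class label to an array of samples (rows)."""
    sums = {label: distance_sum_to_mark(xs) for label, xs in classes.items()}
    d_max, d_min = max(sums.values()), min(sums.values())
    if d_max == d_min:
        # Assumption: all classes are equally compact, so weight them equally.
        return {label: 1.0 for label in sums}
    return {label: (d_max - s) / (d_max - d_min) for label, s in sums.items()}
```

Under this reading, the most compact class receives a class mark degree of 1 and the most spread-out class receives 0.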
Further, in step S3, the neural network adopts a parallelized BP neural network, and the parallelized adjustment method for the BP neural network includes the following steps:
step S331: setting initial parameters including the number of input layer nodes, the number of output layer nodes, the number of hidden layers, the number of hidden layer nodes and an error range;
step S332: preprocessing the balanced load scene data and normalizing the preprocessed balanced load scene data;
step S333: starting Map tasks at slave nodes, and obtaining a training set by each Mapper end; correcting the weight of the neural network according to the obtained training set, and finally sending the corrected weight to a Reduce end;
step S334: starting a Reduce task at the Master node, and calculating the average of the corrected weights sent by the Slave nodes as the output;
step S335: updating the configuration file;
step S336: judging whether the difference between the forward-propagation output of the neural network and the expected value reaches the preset precision, or whether the number of learning iterations exceeds the set maximum; if so, the parallelized BP neural network is obtained; otherwise, returning to step S333.
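Steps S331 to S336 can be illustrated with a minimal single-machine sketch in which data partitions stand in for the Mapper ends and a plain average stands in for the Reduce task. It is not the patented Spark implementation: the network has one hidden layer without separate bias terms, and the names `local_update` and `train_parallel_bp`, the learning rate, and the layer sizes are assumptions chosen for illustration. The inputs are expected to be normalized beforehand, as in step S332.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, x):
    w1, w2 = weights
    return sigmoid(x @ w1) @ w2            # sigmoid hidden layer, linear output

def local_update(weights, x, y, lr=0.1):
    """Map side: one backpropagation step on a local partition (step S333)."""
    w1, w2 = weights
    h = sigmoid(x @ w1)
    err = h @ w2 - y
    grad_w2 = h.T @ err / len(x)
    grad_h = (err @ w2.T) * h * (1.0 - h)  # error propagated through the sigmoid
    grad_w1 = x.T @ grad_h / len(x)
    return w1 - lr * grad_w1, w2 - lr * grad_w2

def train_parallel_bp(x, y, n_hidden=4, n_parts=4, tol=1e-3, max_epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    # Step S331: initial parameters (sizes and scale are illustrative).
    weights = (rng.normal(0.0, 0.5, (x.shape[1], n_hidden)),
               rng.normal(0.0, 0.5, (n_hidden, y.shape[1])))
    parts = list(zip(np.array_split(x, n_parts), np.array_split(y, n_parts)))
    mse = float(np.mean((forward(weights, x) - y) ** 2))
    for _ in range(max_epochs):
        if mse < tol:                      # step S336: precision reached
            break
        # Map: every partition corrects the shared weights independently.
        local = [local_update(weights, px, py) for px, py in parts]
        # Reduce (step S334): average the corrected weights from all partitions.
        weights = tuple(np.mean([w[i] for w in local], axis=0) for i in range(2))
        mse = float(np.mean((forward(weights, x) - y) ** 2))
    return weights, mse
```

With equally sized partitions and one step per round, averaging the corrected weights is equivalent to a full-batch gradient step; the benefit of the scheme comes from computing the partition updates on separate nodes.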
Further, the step S3 specifically includes the following steps:
step S31: training the parallelized adjusted BP neural network model by utilizing a load scene data set based on a Spark platform to obtain a prediction model;
step S32: the test data set is predicted by the prediction model, and the prediction performance of the model is evaluated using the mean absolute percentage error (MAPE) and the speedup ratio.
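The two evaluation measures named in step S32 can be computed as in the short sketch below. The definitions used are the standard ones (MAPE as the mean of |actual - predicted| / |actual| in percent, speedup as serial time divided by parallel time); the source does not spell them out, so they are assumptions, as are the function names.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-machine running time to parallel running time."""
    return serial_seconds / parallel_seconds
```

A lower MAPE indicates a more accurate forecast, and a speedup above 1 indicates that the parallel version ran faster than the single-machine version.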
The invention also provides a system based on the short-term power load parallel prediction method based on the balanced KNN algorithm, which comprises a storage device and a processing device, wherein the storage device is used for storing the instructions of the method, and the processing device loads and executes the instructions stored in the storage device.
Compared with the prior art, the invention has the following beneficial effects: based on the characteristics of power load data, a K-means clustering algorithm is used to divide the power load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load under test; a BP neural network algorithm is used to train a load prediction model and predict scene by scene on massive historical data; and the proposed algorithm model is parallelized with the Apache Spark programming framework for cloud computing, which improves its capability to process massive high-dimensional data.
Drawings
FIG. 1 is a flow chart of a short-term power load parallel prediction method based on a balanced KNN algorithm according to an embodiment of the present invention;
FIG. 2 is a K-means algorithm clustering flow chart according to an embodiment of the present invention;
FIG. 3 is a flow chart of an embodiment of the balanced KNN algorithm of the present invention;
FIG. 4 is a diagram of a balanced KNN classification algorithm calculation structure according to an embodiment of the present invention;
FIG. 5 is a flowchart of BP neural network algorithm clustering according to an embodiment of the present invention;
FIG. 6 is a graph of historical load data for an embodiment of the present invention;
FIG. 7 is a clustered load scenario according to an embodiment of the present invention;
FIG. 8 is a flowchart of a load scenario decision in accordance with an embodiment of the present invention;
FIG. 9 is a graph comparing time required for a conventional algorithm and a parallel algorithm according to an embodiment of the present invention;
FIG. 10 is a comparison of predicted results for embodiments of the present invention;
FIG. 11 is a graph of the root mean square error of the prediction results according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As introduced in the background art, in order to improve power load prediction accuracy and solve the problem of insufficient single-machine computing resources caused by the massive, high-dimensional data of the energy Internet, this embodiment provides a balanced-KNN-based short-term power load parallel prediction method applied to a power quality comprehensive management scene. Based on the characteristics of power load data, a K-means clustering algorithm is used to divide the power load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load under test; and a BP neural network algorithm is used to train a load prediction model and predict scene by scene on massive historical data. The proposed algorithm model is parallelized with the Apache Spark computing framework for cloud computing. This improves the capability to process massive high-dimensional data, addresses the long model training time under massive data, and remarkably shortens the model training and prediction time for load prediction based on massive data.
As shown in fig. 1, the embodiment provides a short-term power load parallel prediction method applied to a power quality comprehensive treatment scenario, and the method includes the following steps:
step S1: clustering the historical data of the power load by using a K-means clustering algorithm to obtain a clustered load scene;
step S2: carrying out balance analysis on the clustered load scenes by adopting a balanced KNN algorithm to obtain balanced load scenes, and carrying out load scene decision;
step S3: the load scenario obtained in step S2 is subjected to neural network training, and load prediction is performed.
In this embodiment, scene analysis is performed on the historical load curves because the shape and trend of a load curve are closely related to factors such as climate and date type; if user loads under massive data are not studied in a targeted manner, considerable resources are wasted relative to the power consumption scale of each user. Therefore, a suitable data mining technique is used to group loads with similar electricity consumption patterns into the same scene. As shown in fig. 2, the load scenes are divided by the K-means clustering method, and the Apache Spark computing framework is used to speed up the algorithm. The specific steps are as follows:
step S11: storing mass power load historical data on a distributed file system (HDFS) of a Hadoop platform in a distributed manner;
step S12: reading the power load historical data in the HDFS, splitting it into 64 MB blocks, and selecting K clustering centers, wherein K is the number of clusters;
step S13: sending the current clustering center data to all nodes, searching the clustering center closest to the data in the nodes, calculating the corresponding distance between the data in the nodes and the closest clustering center, and adding class labels to the training data; the calculation formula of the distance c between the data in the node and the nearest cluster center is as follows:
$$c = \arg\min \lVert x_i - u_j \rVert$$
In the formula, $x_i$ is the cluster center data of the i-th class, and $u_j$ is the j-th training data in class i.
Step S14: calculating the sum of the category data of each clustering center and the mean value of each clustering center, and updating the clustering centers according to the mean values; the calculation formula of the sum of the category data of each cluster center and the mean value of each cluster center is as follows:
$$u_j = \frac{1}{s_j} \sum_{x_i \in \text{class } j} x_i$$
where $u_j$ is the mean of the training data in class j, which becomes the updated cluster center; $x_i$ is the i-th training data; and $s_j$ is the number of data in class j;
step S15: judging whether the clustering result meets the convergence condition (the distance difference between two successive iterations is less than 0.1, or the number of iterations exceeds 10); if not, returning to step S13; otherwise, ending the iteration and outputting the clustering result, wherein the clustering result is the load scene.
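Steps S12 to S15 can be sketched on a single machine with NumPy. This is a minimal illustration rather than the distributed Spark implementation of the embodiment: the random choice of initial centers, the use of the largest center shift as the "distance difference", and the function name `kmeans` are assumptions. The tolerance 0.1 and the limit of 10 iterations follow the convergence condition stated in step S15.

```python
import numpy as np

def kmeans(data, k, tol=0.1, max_iter=10, seed=0):
    """Cluster the rows of `data` into k load scenes; returns (centers, labels)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step S12: select K initial clustering centers (random picks are an assumption).
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_iter):
        # Step S13: find the nearest center for every sample and label the sample.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step S14: move each center to the mean of the samples assigned to it.
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        # Step S15: stop once the centers move less than the tolerance.
        if shift < tol:
            break
    return centers, labels
```

In a distributed setting, the labeling in step S13 would run on each data block and the sums in step S14 would be combined across nodes; the arithmetic stays the same.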
In this embodiment, the final load scene decision is not accurate due to the unbalanced load scene data generated in the first step. The invention adopts a balanced KNN algorithm to carry out balanced analysis on the load scene and carries out decision making on the load scene.
First, the present embodiment describes a conventional KNN algorithm module.
KNN is a non-parametric classification algorithm that classifies a sample by finding the class nearest to the test sample. The classification step of KNN can be roughly expressed as follows: given a sample data set T in advance, find the k samples nearest to the sample to be tested; if most of these k samples belong to one class, the sample to be tested is also assigned to that class.
The KNN algorithm generally adopts Euclidean distance to characterize the distance d between a test sample and a known class sample:
$$d(T_i, d_j) = \sqrt{\sum_{t=1}^{n} (\omega_{it} - \omega_{jt})^2}$$
In the formula, $T_i$ represents a training sample; $d_j$ represents the test sample; n represents the number of features of a sample; and $\omega_{it}$ and $\omega_{jt}$ represent the values of the t-th feature in the sample vectors $T_i$ and $d_j$, respectively.
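The traditional KNN classification just described can be sketched as follows: compute the Euclidean distance from the test sample to every training sample, take the k nearest, and return the majority class. The function name `knn_classify` and the default k are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, test_x, k=3):
    """Return the majority class among the k training samples nearest to test_x."""
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    # Euclidean distance between the test sample and every training sample.
    dists = np.sqrt(((train_x - test_x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because every neighbor casts an equal vote, a class with many more training samples tends to dominate the k nearest neighbors, which is the imbalance problem that the balanced variant targets.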
The classification idea and process of the traditional KNN algorithm are simple, and no parameters need to be estimated. However, with a massive data set, the amount of computation in the classification process is considerable. In addition, the classification performance of the traditional KNN algorithm is easily affected by the training samples: when the sample data set is unbalanced, performance degrades and can become extremely poor.
The embodiment adopts an improved balanced KNN algorithm model, which is specifically as follows:
the balanced KNN algorithm greatly improves the accuracy of processing unbalanced data by the original KNN algorithm. The specific idea of the balanced KNN algorithm is shown in fig. 3, and the specific steps of the improved balanced KNN algorithm are as follows:
(1) Input the training data set and the feature data set to be tested, and normalize the sample data:
$$X^{*} = \frac{X - \min(X)}{\max(X) - \min(X)}$$
where X is the power load feature set; $X^{*}$ is the normalized power load feature set; min(X) is the minimum value of the power load; and max(X) is the maximum value of the power load;
(2) Determine the mark sample. Calculate the sum of the Euclidean distances between each sample and the other samples in the class, and take the sample with the smallest distance sum as the mark sample of the class.
Figure BDA0001671537150000103
In the formula, $x_t$ and $x_i$ are the t-th and i-th samples in the class, respectively.
(3) Calculate the threshold of each class. Calculate the Euclidean distance sum of each sample and the mark sample in each class, and take the maximum value as the threshold $Iflag_i$ of the class.
Figure BDA0001671537150000104
In the formula, $M_i$ is the mark sample of class i, $x_t$ is the t-th sample of class i, and $Iflag_i$ is the threshold of class i.
After the threshold value of each class is determined, the distance sum of the test sample and the samples in the class is calculated, and the class with the distance sum larger than the threshold value is excluded, so that the accuracy and the speed of classification are improved.
(4) Calculate the sample mark degree and the class mark degree. Calculate the Euclidean distance between the class sample and the mark sample and the distance between the mark sample and the class samples, respectively, and take the ratio of the two as the sample mark degree $R_{ci}$.
Figure BDA0001671537150000105
In the formula, $M_i$ is the mark sample of class i; $x_t$ and $x_i$ are the t-th and i-th samples in class i, respectively; and $n_i$ is the number of samples in class i.
Calculating Euclidean distance sum of each class mark sample and each sample, solving the difference value between the Euclidean distance sum maximum value and the Euclidean distance sum of each class mark sample and each sample, and the difference value between the maximum value of the Euclidean distance sum and the minimum value of the Euclidean distance sum, and then calculating the ratio of the two difference values, namely the class mark degree Pi
Figure BDA0001671537150000111
In the formula, M_i is the mark sample of class i, M_j is the mark sample of class j, n_i and n_j are the numbers of samples in classes i and j, respectively, and x_t and x_i are the t-th and i-th samples in class i, respectively.
(5) Calculate the distance taking the sample mark degree into account. Calculate the Euclidean distance d(x_i, x_j) between the test sample x_j and each sample x_i in the candidate classes, and find the h neighbor samples with the smallest distance.
Figure BDA0001671537150000112
In the formula, R_ci is the sample mark degree of the i-th sample of class c, and ω_it and ω_jt are the t-th feature values of the sample vectors x_i and x_j, respectively.
(6) Calculate, taking the class mark degree into account, the weight of each class to which the selected h neighbor samples belong. The weight W_jt with which x_j belongs to the t-th class c_t is calculated as:
Figure BDA0001671537150000113
In the formula, P_i is the class mark degree of class i, and v(x_i, x_j) is the voting weight function, for which the invention takes d(x_i, x_j),
Figure BDA0001671537150000114
(7) Classify x_i into the class with the largest class weight.
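The mark sample, class threshold and class mark degree described in steps (2) to (4) can be sketched as follows. This is a minimal illustration built only from the textual descriptions above, since the original formulas are given as figures; the function names are hypothetical, the threshold is assumed to be the largest sample-to-mark-sample distance in the class, and the value returned when all classes have the same distance sum is an assumption. The sample mark degree and the weighted voting of steps (5) to (7) are not reproduced because their exact formulas cannot be recovered from the text.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mark_sample_index(X):
    """Index of the sample whose distance sum to the other samples is smallest."""
    return int(np.argmin(pairwise_dist(X).sum(axis=1)))


def class_threshold(X, mark):
    """Assumed threshold: largest distance from any class sample to the mark sample."""
    return float(np.linalg.norm(X - mark, axis=1).max())


def class_mark_degrees(classes):
    """P_i = (S_max - S_i) / (S_max - S_min), where S_i is the distance sum
    between the mark sample of class i and the samples of class i."""
    sums = {}
    for label, X in classes.items():
        mark = X[mark_sample_index(X)]
        sums[label] = float(np.linalg.norm(X - mark, axis=1).sum())
    s_max, s_min = max(sums.values()), min(sums.values())
    span = s_max - s_min
    # Assumption: if every class has the same distance sum, use 1.0 for all.
    return {label: (s_max - s) / span if span > 0 else 1.0
            for label, s in sums.items()}
```

Under this reading, a compact class (small distance sum) receives a class mark degree close to 1, and the most dispersed class receives 0.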
As shown in fig. 4, step S2 of this embodiment specifically includes the following steps:
step S211: storing the test data and the load scene data generated in the step S1 on a distributed file system (HDFS) of a Hadoop platform in a distributed manner, forming a test data set and a load scene data set, and taking the load scene data set as a training data set;
step S212: converting the training data set and the testing data set into a load scene RDD data set and a testing RDD data set by using a Spark platform;
step S213: determining mark samples, threshold values, sample mark degrees and class mark degrees of various types in the load scene RDD data set;
step S214: calculating the Euclidean distance between each sample in the candidate class of the load scene RDD data set considering the sample mark degree and the test sample in the test RDD data set, and selecting h adjacent samples with the minimum Euclidean distance considering the sample mark degree;
step S215: calculating, taking the class mark degree into account, the weight of each class to which the k adjacent samples selected in step S214 belong, and attributing the k adjacent samples to the class with the maximum class weight, to obtain the balanced load scene data.
In this embodiment, the step S213 specifically includes:
calculating the Euclidean distance sum of each sample data and the rest sample data in various types of load scene RDD data sets, and selecting the sample with the minimum distance sum as the mark sample of the type;
calculating various thresholds in the RDD data set of the load scene: calculating the Euclidean distance sum of each sample and the mark sample in each class of the load scene RDD data set, and selecting the maximum value of the Euclidean distance sum of each sample and the mark sample in each class as the threshold value of the class;
respectively calculating Euclidean distances between class samples and mark samples and between mark samples and class samples of the load scene RDD data set, and calculating the ratio of the two, namely the sample mark degree;
and calculating Euclidean distance sums of each class mark sample and each sample of the load scene RDD data set, solving a difference value between the maximum value of the Euclidean distance sums and the Euclidean distance sums, and a difference value between the maximum value of the Euclidean distance sums and the minimum value of the Euclidean distance sums, and then calculating a ratio of the two difference values, namely the class mark degree.
In this embodiment, after the threshold value of each class is determined, the sum of the distances between the test sample and the samples in the class in the test RDD dataset is calculated, and the class with the sum of the distances being greater than the threshold value of the class is excluded.
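The threshold-based exclusion of classes can be illustrated with the short sketch below. It is not the patented procedure itself: the text speaks of a "distance sum" between the test sample and the samples in the class, and this sketch assumes that the quantity compared with the threshold is the distance between the test sample and the mark sample of the class. The function name `candidate_classes` and its dictionary arguments are hypothetical.

```python
import numpy as np


def candidate_classes(test_sample, marks, thresholds):
    """Return the labels of classes that are kept as candidates.

    marks:      dict label -> mark sample (1-D array)
    thresholds: dict label -> class threshold (float)
    Assumption: a class is excluded when the distance from the test sample
    to its mark sample is larger than the class threshold.
    """
    test_sample = np.asarray(test_sample, dtype=float)
    kept = []
    for label, mark in marks.items():
        dist = np.linalg.norm(test_sample - np.asarray(mark, dtype=float))
        if dist <= thresholds[label]:
            kept.append(label)
    return kept
```

If every class is excluded for some test sample, a caller would need a fallback, for example searching all classes; the text does not specify this case.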
Preferably, in the present embodiment, the load scenario of the test data has already been determined in step S2, and step S3 is to perform training and result prediction of the load scenario by using a neural network on the basis of the above results.
The BP neural network is a multi-layer feedforward network trained by the error back-propagation algorithm, and is one of the most widely used neural network models. It consists of an input layer, an output layer, and several hidden layers. The BP learning algorithm consists of two processes: forward propagation of the network information flow, and backward propagation of the error together with updating of the neuron connection weights.
To reduce the running time of the algorithm through parallel computation, the BP learning algorithm is decomposed with MapReduce. The main idea is to calculate, for each connection weight of the network, the local gradient change of the weight produced by back propagation; after all samples have been processed, the weights are updated once in batch using the average of these changes.
In this embodiment, as shown in fig. 5, the parallelization adjustment method for the BP neural network includes the following steps:
step S331: setting initial parameters including the number of input layer nodes, the number of output layer nodes, the number of hidden layers, the number of hidden layer nodes and an error range;
step S332: preprocessing the balanced load scene data and normalizing the preprocessed balanced load scene data;
step S333: starting Map tasks at slave nodes, and obtaining a training set by each Mapper end; correcting the weight of the neural network according to the obtained training set, and finally sending the corrected weight to a Reduce end;
step S334: starting a Reduce task at the Master node, and calculating the average of the corrected values sent by each Slave as output;
step S335: updating the configuration file;
step S336: judging whether the difference between the forward-propagation output of the neural network and the expected value reaches the preset precision, or whether the number of learning iterations exceeds the set maximum; if so, the parallelization-adjusted BP neural network is obtained; otherwise, returning to step S333.
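The Map/Reduce decomposition of steps S331 to S336 can be imitated on a single machine as in the sketch below. It is an illustration only and does not use Hadoop or Spark: the "Mappers" are simulated by a loop over data shards, the network has one tanh hidden layer with a linear output, and all names (`init_params`, `map_gradients`, `reduce_average`, `train`) as well as the default hyperparameters are assumptions, not values from the patent.

```python
import numpy as np


def init_params(n_in, n_hidden, n_out, rng):
    """S331: initial parameters of a network with one hidden layer."""
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)), "b2": np.zeros(n_out)}


def forward(params, X):
    """Forward propagation: tanh hidden layer, linear output layer."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H, H @ params["W2"] + params["b2"]


def map_gradients(params, X, Y):
    """S333 (Mapper): back-propagate on one shard and return its mean gradients."""
    H, out = forward(params, X)
    err = (out - Y) / len(X)                      # gradient of 0.5 * mean squared error
    dH = (err @ params["W2"].T) * (1.0 - H ** 2)  # back-propagate through tanh
    return {"W1": X.T @ dH, "b1": dH.sum(axis=0),
            "W2": H.T @ err, "b2": err.sum(axis=0)}


def reduce_average(grads):
    """S334 (Reducer): average the gradients sent by all Mappers."""
    return {k: np.mean([g[k] for g in grads], axis=0) for k in grads[0]}


def train(X, Y, n_shards=4, n_hidden=8, lr=0.1, max_epochs=2000, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], n_hidden, Y.shape[1], rng)
    shards = list(zip(np.array_split(X, n_shards), np.array_split(Y, n_shards)))
    mse = float("inf")
    for _ in range(max_epochs):                   # S336: maximum number of iterations
        avg = reduce_average([map_gradients(params, xs, ys) for xs, ys in shards])
        for k in params:                          # S335: one batch update per pass
            params[k] -= lr * avg[k]
        mse = float(np.mean((forward(params, X)[1] - Y) ** 2))
        if mse < tol:                             # S336: preset precision reached
            break
    return params, mse
```

With shards of equal size, averaging the shard gradients gives the same update as a single full-batch gradient step, which is why one weight update per pass over the data is sufficient in this scheme.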
In this embodiment, the step S3 specifically includes the following steps:
step S31: training the parallelized adjusted BP neural network model by utilizing a load scene data set based on a Spark platform to obtain a prediction model;
step S32: predicting the test data set with the prediction model, and evaluating the prediction performance of the model using the mean absolute percentage error and the speedup ratio.
The embodiment also provides a system based on the short-term electric load parallel prediction method based on the balanced KNN algorithm, which includes a storage device and a processing device, wherein the storage device is used for storing the instructions of the method, and the processing device loads and executes the instructions stored in the storage device.
In particular, since the power load prediction of this embodiment predicts the future power load from historical data, the predicted value differs from the actual value, which produces a power load prediction error. The prediction errors of the algorithm in this embodiment have several causes, mainly: 1) the historical power load data is incomplete; 2) the power load scenes are not divided distinctly enough; 3) the initial parameters of the training model are poorly chosen. The evaluation indexes adopted in this embodiment are as follows:
Let y(i) and
Figure BDA0001671537150000141
respectively denote the actual load value and the predicted load value at time i; then:
absolute error:
Figure BDA0001671537150000142
relative error:
Figure BDA0001671537150000143
where e_1 is the daily mean error. Because prediction errors can be positive or negative, the absolute value of each error is taken when averaging, so that positive and negative errors do not cancel.
Figure BDA0001671537150000144
where e_2 is the root mean square error. The root mean square error gives more weight to large errors, which makes the index more sensitive.
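The evaluation indexes can be sketched numerically as follows. Because the original formulas are given only as figures, the definitions used here are conventional ones and are assumptions: the absolute error is taken as actual minus predicted, the relative error as the absolute error divided by the actual value, e_1 as the mean of the absolute relative errors, and e_2 as the root mean square of the relative errors. The function name `load_errors` is hypothetical.

```python
import numpy as np


def load_errors(y_true, y_pred):
    """Return (absolute errors, relative errors, e1, e2) under the
    conventional definitions assumed in the accompanying text."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = y_true - y_pred                    # signed absolute error per time point
    rel_err = abs_err / y_true                   # signed relative error per time point
    e1 = float(np.mean(np.abs(rel_err)))         # mean of absolute relative errors
    e2 = float(np.sqrt(np.mean(rel_err ** 2)))   # root mean square relative error
    return abs_err, rel_err, e1, e2
```

Multiplying e_1 by 100 gives a percentage figure of the kind usually reported as the mean absolute percentage error.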
The data source of this embodiment is load data collected by the smart meters of a certain power grid; the historical load data is shown in fig. 6. In fig. 6, the load data includes features such as temperature, humidity, precipitation, wind speed, day type, and season, and the load data at each full hour is used for training and prediction.
The four subgraphs in fig. 7 correspond to the four types of load curve. It can be seen that the K-means clustering algorithm clusters the data well and clearly separates load curves with different characteristics, i.e., different load scenes.
Imbalance analysis is performed on the clustered data, load scene decision is performed after the balanced data is obtained, and a load value is determined, and the process is shown in fig. 8.
The results in fig. 9 show that when the data sample is small, the prediction times of the two algorithms differ little, and the single-machine algorithm is even slightly faster than the parallel algorithm. The reason is that, on a small sample set, the parallel algorithm still divides the data into several sub-sample sets, and the communication cost among the data subsets increases and slows the prediction. As the sample set grows, however, the iteration times of the two algorithms diverge markedly, and the parallel algorithm requires far less time than the single-machine method.
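The effect of communication cost on the speedup ratio can be illustrated with a simple timing model. This model is an assumption made for illustration and is not taken from the patent: the parallel time is taken as the single-machine time divided by the number of workers plus a fixed overhead, and the function names and any timings passed to them are hypothetical, not measured results.

```python
def speedup(t_single, t_parallel):
    """Speedup ratio: single-machine time divided by parallel time."""
    return t_single / t_parallel


def modeled_parallel_time(t_single, n_workers, overhead):
    """Illustrative model: perfectly divided compute time plus a fixed
    communication and scheduling overhead (same time unit as t_single)."""
    return t_single / n_workers + overhead
```

With a fixed overhead of 3 time units and 4 workers, a job that takes 2 units on a single machine has a speedup below 1, while a job that takes 200 units has a speedup well above 1, which matches the qualitative behavior described for fig. 9.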
The comparison result of the load prediction value and the actual load value obtained by the load prediction algorithm based on cloud computing in this embodiment is shown in table 1.
Table 1 Comparison of load prediction results
Figure BDA0001671537150000151
As shown in fig. 10 and 11, the predicted value curve and the actual value curve follow similar trends, and the mean root mean square error is 2.13%. The error of the prediction result meets the error standard for load prediction, which shows that the cloud-computing-based load prediction algorithm is feasible.
In this embodiment, the performance of the system is rigorously tested using data from a certain power grid, covering both the big-data processing performance and the power load prediction accuracy of the system. The results show that the method remedies the main defect of the traditional KNN algorithm and improves the accuracy of load scene decisions. After the data is preprocessed with the K-means model, Spark, with its parallel programming model and computing framework, is combined with the load prediction algorithm, and a balanced KNN algorithm and a parallelized neural network programming model are provided. This addresses the computational burden of massive data and greatly shortens the prediction time, while keeping the prediction accuracy within load prediction requirements.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A short-term power load parallel prediction method applied to a power quality comprehensive treatment scene, characterized in that the method comprises the following steps:
step S1: clustering historical data of the power load to obtain a clustered load scene;
step S2: carrying out balance analysis on the clustered load scenes by adopting a balance KNN algorithm, and carrying out load scene decision to obtain a balanced load scene;
step S3: training the load scene obtained in the step S2, and predicting the load;
step S2 specifically includes the following steps:
step S211: storing the test data and the load scene data generated in the step S1 on a distributed file system (HDFS) of a Hadoop platform in a distributed manner, forming a test data set and a load scene data set, and taking the load scene data set as a training data set;
step S212: converting the training data set and the testing data set into a load scene RDD data set and a testing RDD data set by using a Spark platform;
step S213: determining mark samples, threshold values, sample mark degrees and class mark degrees of various types in the load scene RDD data set;
step S214: calculating the Euclidean distance between each sample in the candidate class of the load scene RDD data set considering the sample mark degree and the test sample in the test RDD data set, and selecting h adjacent samples with the minimum Euclidean distance considering the sample mark degree;
step S215: calculating, taking the class mark degree into account, the weight of each class to which the k adjacent samples selected in step S214 belong, and attributing the k adjacent samples to the class with the maximum class weight, to obtain the balanced load scene data.
2. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 1, characterized in that: in step S1, a K-means clustering algorithm is used to cluster the power load historical data.
3. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 1, characterized in that: in step S3, the load scenario obtained in step S2 is trained using a neural network.
4. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 2, characterized in that: the step S1 specifically includes the following steps:
step S11: storing mass power load historical data on a distributed file system (HDFS) of a Hadoop platform in a distributed manner;
step S12: reading historical data of a power load in the HDFS, carrying out blocking processing on the historical data, and selecting K clustering centers;
step S13: sending the current clustering center data to all nodes, searching the clustering center closest to the data in the nodes, calculating the corresponding distance between the data in the nodes and the closest clustering center, and adding class labels to the training data;
step S14: calculating the sum of the category data of each clustering center and the mean value of each clustering center, and updating the clustering centers according to the mean values;
step S15: judging whether the clustering result satisfies the convergence condition; if not, returning to step S13; otherwise, ending the iteration and outputting the clustering result, wherein the clustering result is the load scene.
5. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 1, characterized in that: the step S213 specifically includes:
calculating the Euclidean distance sum of each sample data and the rest sample data in various types of load scene RDD data sets, and selecting the sample with the minimum distance sum as the mark sample of the type;
calculating the Euclidean distance sum of each sample and the mark sample in each class of the load scene RDD data set, and selecting the maximum value of the Euclidean distance sum of each sample and the mark sample in each class as the threshold value of the class;
respectively calculating Euclidean distances between class samples and mark samples and between mark samples and class samples of the load scene RDD data set, and calculating the ratio of the two, namely the sample mark degree;
and calculating Euclidean distance sums of each class mark sample and each sample of the load scene RDD data set, solving a difference value between the maximum value of the Euclidean distance sums and the Euclidean distance sums, and a difference value between the maximum value of the Euclidean distance sums and the minimum value of the Euclidean distance sums, and then calculating a ratio of the two difference values, namely the class mark degree.
6. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 5, characterized in that: after the threshold value of each class is determined, the distance sum of the test sample and the samples in the class in the test RDD data set is calculated, and the class with the distance sum larger than the threshold value of the class is excluded.
7. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 3, characterized in that: in step S3, the neural network is a BP neural network subjected to parallelization adjustment, wherein the parallelization adjustment method for the BP neural network includes the following steps:
step S331: setting initial parameters including the number of input layer nodes, the number of output layer nodes, the number of hidden layers, the number of hidden layer nodes and an error range;
step S332: preprocessing the balanced load scene data and normalizing the preprocessed balanced load scene data;
step S333: starting Map tasks at slave nodes, and obtaining a training set by each Mapper end; correcting the weight of the neural network according to the obtained training set, and finally sending the corrected weight to a Reduce end;
step S334: starting a Reduce task at the Master node, and calculating the average of the corrected values sent by each Slave as output;
step S335: updating the configuration file;
step S336: judging whether the difference between the forward-propagation output of the neural network and the expected value reaches the preset precision, or whether the number of learning iterations exceeds the set maximum; if so, the parallelization-adjusted BP neural network is obtained; otherwise, returning to step S333.
8. The short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to claim 3, characterized in that: the step S3 specifically includes the following steps:
step S31: training the parallelized adjusted BP neural network model by utilizing a load scene data set based on a Spark platform to obtain a prediction model;
step S32: predicting the test data set with the prediction model, and evaluating the prediction performance of the model using the mean absolute percentage error and the speedup ratio.
9. A system based on the short-term power load parallel prediction method applied to the power quality comprehensive treatment scene according to any one of claims 1 to 8, characterized in that: the system comprises a storage device and a processing device, wherein the storage device stores instructions of the method according to any one of claims 1 to 8, and the processing device loads and executes the instructions stored in the storage device.
CN201810506954.8A 2018-05-24 2018-05-24 Short-term power load parallel prediction method and system applied to power quality comprehensive management scene Active CN108734355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810506954.8A CN108734355B (en) 2018-05-24 2018-05-24 Short-term power load parallel prediction method and system applied to power quality comprehensive management scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810506954.8A CN108734355B (en) 2018-05-24 2018-05-24 Short-term power load parallel prediction method and system applied to power quality comprehensive management scene

Publications (2)

Publication Number Publication Date
CN108734355A CN108734355A (en) 2018-11-02
CN108734355B true CN108734355B (en) 2022-03-08

Family

ID=63936354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810506954.8A Active CN108734355B (en) 2018-05-24 2018-05-24 Short-term power load parallel prediction method and system applied to power quality comprehensive management scene

Country Status (1)

Country Link
CN (1) CN108734355B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107807961A (en) * 2017-10-10 2018-03-16 国网浙江省电力公司丽水供电公司 A kind of power distribution network big data multidomain treat-ment method based on Spark computing engines
CN107832876A (en) * 2017-10-27 2018-03-23 国网江苏省电力公司南通供电公司 Subregion peak load Forecasting Methodology based on MapReduce frameworks

Also Published As

Publication number Publication date
CN108734355A (en) 2018-11-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant