CN108734355B

CN108734355B - Short-term power load parallel prediction method and system applied to power quality comprehensive management scene

Info

Publication number: CN108734355B
Application number: CN201810506954.8A
Authority: CN
Inventors: 郭敬东; 张健; 黄道姗; 张慧瑜; 林芳; 林焱; 张伟骏; 陈伯建; 项胤兴; 黄霆; 徐振华; 吴丹岳
Original assignee: Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd; State Grid Fujian Electric Power Co Ltd
Current assignee: Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd; State Grid Fujian Electric Power Co Ltd
Priority date: 2018-05-24
Filing date: 2018-05-24
Publication date: 2022-03-08
Anticipated expiration: 2038-05-24
Also published as: CN108734355A

Abstract

The invention relates to a short-term power load parallel prediction method and system applied to a power quality comprehensive management scene. In view of the characteristics of power load data, a K-means clustering algorithm is used to divide the power load into load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load to be predicted; a BP neural network algorithm is used to train and predict a load prediction model scene by scene on the massive historical data; and the proposed algorithm model is parallelized with the Apache Spark programming framework for cloud computing, which improves its capability to process massive high-dimensional data.

Description

Short-term power load parallel prediction method and system applied to power quality comprehensive management scene

Technical Field

The invention relates to the technical field of power load prediction, and in particular to a short-term power load parallel prediction method and system applied to a power quality comprehensive management scene.

Background

The Energy Internet (EI) is a new energy utilization system that, against the background of the third industrial revolution, deeply combines new energy technology with information technology in order to address the gradual exhaustion of fossil fuels and the environmental pollution caused by their use. Compared with the smart grid, the Energy Internet depends more heavily on modern Internet technology.

Load prediction has always been a key link in the operation and regulation of the electric power system, and it affects the smooth implementation of the system's analysis and decision functions, such as economic dispatch, automatic generation control, security assessment, maintenance planning and electricity market operation. In terms of time span, load prediction can be roughly divided into medium- and long-term load prediction and short-term load prediction. Short-term load prediction forecasts the load change of the system from a few hours to a few days ahead, mainly provides data support for the operation and regulation of the power system, and is the focus of the invention.

From a methodological perspective, the methods used for short-term load prediction can be broadly divided into time series prediction, intelligent prediction and combined prediction. Time series methods construct a prediction model from the correlation among consecutive elements of the time series and predict the future trend of the load by extrapolation; specific methods include regression analysis, the load derivation method, exponential smoothing and Kalman filtering. Intelligent prediction methods construct a learning structure suited to modeling highly nonlinear relationships, obtain the nonlinear input-output mapping by supervised learning on historical input and output data, and use it to predict future load changes; specific methods include expert systems, artificial neural networks and support vector machines. Combined prediction methods combine two or more methods: because it is difficult to find a single method that is best for all prediction objects, and the results of the various methods each have some credibility, combined prediction improves the accuracy of load prediction by integrating several prediction models and methods. These methods form a solid theoretical basis for load prediction, but they do not take into account the massive data that the power grid now faces, and their applicability to load prediction in a big data environment needs further discussion.

Cloud computing is a method and process in which a group of servers located on a network provides resources such as computation, storage and data to requesters in the form of services in order to complete information processing tasks; it represents a large-scale distributed computing model based on the Internet. Cloud computing uses the Internet to integrate wide-area heterogeneous computing resources into an abstract, virtual and dynamically expandable pool of computing resources, and then provides services such as computing capacity, storage capacity, software platforms and application software to users on demand over the Internet. The document "new energy application of a cloud computing data center" establishes a cloud-computing-based power system computing platform and discusses the implementation of the platform in detail in terms of physical composition, system architecture and software technology; the document "cloud computing: constructing a core computing platform of a future power system" analyzes the sources and characteristics of big data in the generation, transmission, transformation and consumption of power, and analyzes in detail the advantages of existing big data processing technology for smart grid construction and big data processing; the document "the current situation and the challenge of the smart grid big data processing technology" combines cloud computing technology to propose a big data analysis and processing platform on the power consumer side, together with a parallel load prediction method based on the random forest algorithm, which shortens the load prediction time and improves the capacity of the prediction algorithm to process big data; the document "power consumer side big data analysis and parallel load prediction", based on locally weighted linear regression and a cloud computing platform, establishes a parallel locally weighted linear regression model for load prediction, which reduces the time consumed by load prediction and improves the prediction precision; and the document "short-term prediction of power load under mass data" proposes an online sequential extreme learning machine model for short-term load prediction and uses MapReduce parallel programming to improve the capacity of the algorithm to process high-dimensional data.

The above documents introduce parallel computing into traditional algorithms and markedly improve the capability of prediction algorithms to process big power load data.

Load changes are closely related to factors such as climate and date type, and similar external conditions correspond to similar load change patterns. The document "distributed power load prediction algorithm based on cloud computing and an extreme learning machine" uses a clustering method to group historical data with similar power load characteristics into the same load scene, and analyzes the ability of algorithms such as the K-nearest neighbor method (KNN) to predict and label the class of unknown data. However, in this method, every time a new set of load data is added to the load scene training set, the model must be retrained on all of the data, including the newly acquired data, which consumes considerable computing resources and training time.

In summary, in the prior art, an effective solution is still lacking for the problems of insufficient single-computer computing resources and large time consumption caused by massive and high-dimensional data of the energy internet.

Disclosure of Invention

In view of the above, the invention aims to provide a short-term power load parallel prediction method and system applied to a power quality comprehensive management scene. In view of the characteristics of power load data, a K-means clustering algorithm is used to divide the power load into load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load to be predicted; a BP neural network algorithm is used to train and predict a load prediction model scene by scene on the massive historical data; and the proposed algorithm model is parallelized with the Apache Spark programming framework for cloud computing, which improves its capability to process massive high-dimensional data.

The invention is realized by adopting the following technical scheme: a short-term power load parallel prediction method applied to a power quality comprehensive management scene comprises the following steps:

step S1: clustering historical data of the power load to obtain a clustered load scene;

step S2: carrying out balance analysis on the clustered load scenes by adopting a balance KNN algorithm to obtain balanced load scenes, and carrying out load scene decision;

step S3: the load scenario obtained in step S2 is trained, and load prediction is performed.

In step S1, a K-means clustering algorithm is used to cluster the power load historical data. In step S3, the load scenario obtained in step S2 is trained using a neural network.

Further, the step S1 specifically includes the following steps:

step S11: storing mass power load historical data on a distributed file system (HDFS) of a Hadoop platform in a distributed manner;

step S12: reading historical data of a power load in the HDFS, carrying out blocking processing on the historical data, and selecting K clustering centers;

step S13: sending the current clustering center data to all nodes, searching the clustering center closest to the data in the nodes, calculating the corresponding distance between the data in the nodes and the closest clustering center, and adding class labels to the training data;

step S14: calculating the sum of the category data of each clustering center and the mean value of each clustering center, and updating the clustering centers according to the mean values;

step S15: and judging whether the clustering result converges to a convergence condition, if not, returning to the step S13, otherwise, ending iteration and outputting the clustering result, wherein the clustering result is a load scene.

Further, step S2 specifically includes the following steps:

step S211: storing the test data and the load scene data generated in the step S1 on a distributed file system (HDFS) of a Hadoop platform in a distributed manner, forming a test data set and a load scene data set, and taking the load scene data set as a training data set;

step S212: converting the training data set and the testing data set into a load scene RDD data set and a testing RDD data set by using a Spark platform;

step S213: determining mark samples, threshold values, sample mark degrees and class mark degrees of various types in the load scene RDD data set;

step S214: calculating the Euclidean distance between each sample in the candidate class of the load scene RDD data set considering the sample mark degree and the test sample in the test RDD data set, and selecting h adjacent samples with the minimum Euclidean distance considering the sample mark degree;

step S215: calculating, taking the class mark degree into account, the weight of each class to which the h adjacent samples selected in step S214 belong, and assigning the test sample to the class with the maximum class weight, thereby obtaining the balanced load scene data.

Further, step S213 is specifically:

calculating the Euclidean distance sum of each sample data and the rest sample data in various types of load scene RDD data sets, and selecting the sample with the minimum distance sum as the mark sample of the type;

calculating various thresholds in the RDD data set of the load scene: calculating the Euclidean distance sum of each sample and the mark sample in each class of the load scene RDD data set, and selecting the maximum value of the Euclidean distance sum of each sample and the mark sample in each class as the threshold value of the class;

respectively calculating Euclidean distances between class samples and mark samples and between mark samples and class samples of the load scene RDD data set, and calculating the ratio of the two, namely the sample mark degree;

and calculating Euclidean distance sums of each class mark sample and each sample of the load scene RDD data set, solving a difference value between the maximum value of the Euclidean distance sums and the Euclidean distance sums, and a difference value between the maximum value of the Euclidean distance sums and the minimum value of the Euclidean distance sums, and then calculating a ratio of the two difference values, namely the class mark degree.

Further, after the threshold value of each class is determined, the distance sum of the test sample and the sample in the class in the test RDD data set is calculated, and the class with the distance sum larger than the threshold value of the class is excluded.

Further, in step S3, the neural network adopts a parallelized BP neural network, and the parallelized adjustment method for the BP neural network includes the following steps:

step S331: setting initial parameters including the number of input layer nodes, the number of output layer nodes, the number of hidden layers, the number of hidden layer nodes and an error range;

step S332: preprocessing the balanced load scene data and normalizing the preprocessed balanced load scene data;

step S333: starting Map tasks at slave nodes, and obtaining a training set by each Mapper end; correcting the weight of the neural network according to the obtained training set, and finally sending the corrected weight to a Reduce end;

step S334: starting a reduce task at a Master node, and calculating a corrected average value sent by each Slave as output;

step S335: updating the configuration file;

step S336: and judging whether the difference value between the forward propagation processing value and the expected value of the neural network reaches the preset precision or whether the learning frequency is greater than the set maximum frequency, if so, obtaining the parallelization adjusted BP neural network, and otherwise, returning to the step S333.

Further, the step S3 specifically includes the following steps:

step S31: training the parallelized adjusted BP neural network model by utilizing a load scene data set based on a Spark platform to obtain a prediction model;

step S32: the test data set is predicted by a prediction model, and the prediction effect of the model is evaluated by using the average absolute percentage error and the acceleration ratio.

The invention also provides a system based on the short-term power load parallel prediction method based on the balanced KNN algorithm, which comprises a storage device and a processing device, wherein the storage device is used for storing the instructions of the method, and the processing device loads and executes the instructions stored in the storage device.

Compared with the prior art, the invention has the following beneficial effects: aiming at the characteristics of the power load data, a K-means clustering algorithm is adopted to divide the power load scene; aiming at unbalanced load scenes, a balanced KNN algorithm is provided to accurately classify the scene of the load to be measured; performing scene-by-scene training and prediction of a load prediction model on the massive historical data by adopting a bp neural network algorithm; the proposed algorithm model is improved in a parallelization mode by adopting an Apache spark programming framework of cloud computing, and the capability of processing massive high-dimensional data is improved.

Drawings

FIG. 1 is a flow chart of a short-term power load parallel prediction method based on a balanced KNN algorithm according to an embodiment of the present invention;

FIG. 2 is a K-means algorithm clustering flow chart according to an embodiment of the present invention;

FIG. 3 is a flow chart of an embodiment of the balanced KNN algorithm of the present invention;

FIG. 4 is a diagram of a balanced KNN classification algorithm calculation structure according to an embodiment of the present invention;

FIG. 5 is a flowchart of the parallelization of the BP neural network algorithm according to an embodiment of the present invention;

FIG. 6 is a graph of historical load data for an embodiment of the present invention;

FIG. 7 is a clustered load scenario according to an embodiment of the present invention;

FIG. 8 is a flowchart of a load scenario decision in accordance with an embodiment of the present invention;

FIG. 9 is a graph comparing time required for a conventional algorithm and a parallel algorithm according to an embodiment of the present invention;

FIG. 10 is a comparison of predicted results for embodiments of the present invention;

FIG. 11 is a graph of the root mean square error of the prediction results according to an embodiment of the present invention.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments of the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.

As introduced in the background, in order to improve power load prediction accuracy and to solve the problem of insufficient single-machine computing resources caused by the massive, high-dimensional data of the Energy Internet, this embodiment provides a balanced-KNN-based short-term power load parallel prediction method applied to a power quality comprehensive management scene. In view of the characteristics of power load data, a K-means clustering algorithm is used to divide the power load into load scenes; for unbalanced load scenes, a balanced KNN algorithm is proposed to accurately classify the scene of the load to be predicted; a BP neural network algorithm is used to train and predict a load prediction model scene by scene on the massive historical data; and the proposed algorithm model is parallelized with the Apache Spark computing framework for cloud computing. This improves the capability to process massive high-dimensional data, addresses the long model training time under massive data, and markedly shortens the training and prediction time of load prediction based on massive data.

As shown in fig. 1, the embodiment provides a short-term power load parallel prediction method applied to a power quality comprehensive management scene, and the method includes the following steps:

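step S1: clustering the historical data of the power load by using a K-means clustering algorithm to obtain a clustered load scene;

step S2: carrying out balance analysis on the clustered load scenes by adopting a balanced KNN algorithm to obtain balanced load scenes, and carrying out load scene decision;

step S3: the load scenario obtained in step S2 is subjected to neural network training, and load prediction is performed.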

In this embodiment, the main reason for performing scene analysis on the historical load curve in load prediction is that the shape and trend of the load curve are closely related to factors such as climate and date type; if user loads under massive data are not studied in a targeted way, considerable resources are wasted given the scale of each user's power consumption. Therefore, appropriate data mining technology is used to group loads with similar electricity consumption patterns into the same scene. As shown in fig. 2, the load scenes are divided with the K-means clustering method, and the Apache Spark computing framework is used to speed up the algorithm. The specific steps are as follows:

step S11: storing the massive power load historical data in a distributed manner on the distributed file system (HDFS) of a Hadoop platform;

step S12: reading the power load historical data from the HDFS, splitting it into 64 MB blocks, and selecting K clustering centers, wherein K is the number of clusters;

step S13: sending the current clustering center data to all nodes; on each node, finding the clustering center closest to each data sample, calculating the corresponding distance, and adding a class label to the training data. The class label c of a sample, i.e. the index of its nearest clustering center, is calculated as:

$$c = \arg\min_{j} \lVert x_i - u_j \rVert$$

in the formula, $x_i$ is the i-th training sample and $u_j$ is the j-th clustering center.

step S14: calculating, for each clustering center, the sum of the data in its class and their mean value, and updating the clustering center to that mean. The mean is calculated as:

$$u_j = \frac{1}{s_j} \sum_{x_i \in \text{class } j} x_i$$

wherein $u_j$ is the mean of the training data in class j, which becomes the updated clustering center; $x_i$ is the i-th training sample; and $s_j$ is the number of samples in class j;

step S15: judging whether the clustering result satisfies the convergence condition (the change in distance between two successive iterations is less than 0.1, or the number of iterations exceeds 10); if not, returning to step S13; otherwise, ending the iteration and outputting the clustering result, wherein the clustering result is the load scene.
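
Steps S12 to S15 can be illustrated with a minimal single-machine sketch. It is not the Spark implementation of the embodiment: the function names `nearest_center` and `kmeans` are hypothetical, the initial centers are drawn at random (the text does not say how they are selected), and the convergence test is read as "no center moved by 0.1 or more, or 10 iterations reached".

```python
import math
import random


def nearest_center(point, centers):
    """Step S13: index of the clustering center closest to point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(data, k, tol=0.1, max_iter=10, seed=0):
    """Cluster data into k load scenes; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step S12: select K initial centers (random choice is an assumption).
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Step S13: label every sample with its nearest center.
        labels = [nearest_center(p, centers) for p in data]
        # Step S14: the mean of each class becomes the new center.
        new_centers = []
        for j in range(k):
            members = [p for p, c in zip(data, labels) if c == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty class
        # Step S15: stop once no center moved by tol or more.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels
```

In a distributed setting, the labelling of step S13 and the per-class sums of step S14 would run on the worker nodes, and only the sums and counts would be sent back to update the centers.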

In this embodiment, the final load scene decision is not accurate due to the unbalanced load scene data generated in the first step. The invention adopts a balanced KNN algorithm to carry out balanced analysis on the load scene and carries out decision making on the load scene.

First, the present embodiment describes a conventional KNN algorithm module.

KNN is a non-parametric classification algorithm that classifies a sample by finding the classes of the samples nearest to it. The classification step of KNN can be roughly expressed as follows: given a sample data set T in advance, find the k samples nearest to the sample to be tested; if most of these k samples belong to a given class, the sample to be tested is also assigned to that class.

The KNN algorithm generally adopts the Euclidean distance to characterize the distance d between a test sample and a sample of known class:

$$d(T_i, D_j) = \sqrt{\sum_{t=1}^{n} (\omega_{it} - \omega_{jt})^2}$$

in the formula, $T_i$ represents a training sample; $D_j$ represents the test sample; n represents the number of features of a sample; and $\omega_{it}$ and $\omega_{jt}$ represent the values of the t-th feature in the sample vectors $T_i$ and $D_j$, respectively.

The classification idea and process of the traditional KNN algorithm are simple, and no parameters need to be estimated, but on massive data sets the amount of computation in the classification process is considerable. In addition, the classification performance of the traditional KNN algorithm is easily influenced by the training samples: when the sample data set is unbalanced, performance suffers and may become very poor.
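
The traditional KNN rule described above (Euclidean distance followed by a majority vote among the k nearest samples) can be sketched as follows. The function name `knn_classify` and the default k = 3 are illustrative choices, not values taken from the embodiment.

```python
import math
from collections import Counter


def knn_classify(train, labels, test_point, k=3):
    """Assign test_point to the majority class among its k nearest training samples."""
    # Rank training samples by Euclidean distance to the test sample.
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], test_point))
    # Majority vote among the k nearest samples.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Because every vote counts equally, a class with many more samples tends to dominate the neighborhood, which is the weakness on unbalanced data noted above.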

The embodiment adopts an improved balanced KNN algorithm model, which is specifically as follows:

The balanced KNN algorithm greatly improves the accuracy with which the original KNN algorithm processes unbalanced data. The general idea of the balanced KNN algorithm is shown in fig. 3, and the specific steps of the improved balanced KNN algorithm are as follows:

(1) Input the training data set and the feature data set to be tested, and normalize the sample data:

$$X^{*} = \frac{X - \min(X)}{\max(X) - \min(X)}$$

wherein X is the set of power load features; $X^{*}$ is the normalized power load feature set; min(X) is the minimum value of the power load; and max(X) is the maximum value of the power load;
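
A minimal sketch of the normalization in step (1), assuming min-max scaling of a single feature column. The function name `min_max_normalize` is hypothetical, and mapping a constant column to zeros is a choice made here to avoid division by zero, not something the text specifies.

```python
def min_max_normalize(values):
    """Scale a feature column to [0, 1] using its minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale (assumed handling).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```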

(2) Determine the mark sample. For each sample in a class, calculate the sum of its Euclidean distances to the other samples in the class, and take the sample with the smallest distance sum as the mark sample M of the class:

$$M = \arg\min_{x_t} \sum_{i \neq t} d(x_t, x_i)$$

In the formula, $x_t$ and $x_i$ are the t-th and i-th samples in the class, respectively.
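
Step (2) amounts to picking the most central sample of a class. A minimal sketch follows; the function name `mark_sample` is hypothetical, and ties are broken by taking the first sample encountered, which the text does not specify.

```python
import math


def mark_sample(samples):
    """Return the sample whose summed Euclidean distance to the other samples is smallest."""
    # The distance of a sample to itself is 0, so including it does not change the sum.
    return min(samples, key=lambda s: sum(math.dist(s, other) for other in samples))
```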

(3) Calculate the threshold of each class. Calculate the Euclidean distance between each sample in a class and the mark sample of the class, and take the maximum value as the threshold $Iflag_i$ of the class:

$$Iflag_i = \max_{t} d(M_i, x_t)$$

In the formula, $M_i$ is the mark sample of class i, $x_t$ is the t-th sample of class i, and $Iflag_i$ is the threshold of class i.

After the threshold value of each class is determined, the distance sum of the test sample and the samples in the class is calculated, and the class with the distance sum larger than the threshold value is excluded, so that the accuracy and the speed of classification are improved.
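
The threshold and the class exclusion can be sketched as below under one reading of the text, stated here as an assumption: the threshold of a class is the largest distance from its mark sample to any of its samples, and a class is excluded when the test sample lies farther from the mark sample than that threshold. The names `class_threshold` and `candidate_classes` are hypothetical.

```python
import math


def class_threshold(samples, mark):
    """Largest Euclidean distance from the mark sample to any sample of the class."""
    return max(math.dist(sample, mark) for sample in samples)


def candidate_classes(test_point, classes):
    """Keep the classes whose threshold is not exceeded by the test sample.

    classes maps a class label to a (mark_sample, threshold) pair.
    Assumption: the test sample is compared with the mark sample of each class.
    """
    return [
        label
        for label, (mark, threshold) in classes.items()
        if math.dist(test_point, mark) <= threshold
    ]
```

Pruning classes this way means the later neighbor search only has to scan samples from the remaining candidate classes.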

(4) Calculate the sample mark degree and the class mark degree. Calculate the Euclidean distance between each sample in a class and the mark sample of the class, and the distance between the mark sample and the samples of the class; the ratio of the two is the sample mark degree $R_{ci}$.

In the formula, $M_i$ is the mark sample of class i; $x_t$ and $x_i$ are the t-th and i-th samples in class i, respectively; and $n_i$ is the number of samples in class i.

Calculate, for each class, the sum of the Euclidean distances between its mark sample and its samples. Take the difference between the maximum of these distance sums and the distance sum of the class, and the difference between the maximum and the minimum of the distance sums; the ratio of the two differences is the class mark degree $P_i$.

In the formula, $M_i$ is the mark sample of class i, $M_j$ is the mark sample of class j, $n_i$ and $n_j$ are the numbers of samples in classes i and j, respectively, and $x_t$ and $x_i$ are the t-th and i-th samples in class i, respectively.

(5) Calculate the distance taking the sample mark degree into account. Calculate the Euclidean distance $d(x_i, x_j)$ between the test sample $x_j$ and each sample $x_i$ in the candidate classes, and find the h neighbor samples with the smallest distance.

In the formula, $R_{ci}$ is the sample mark degree of the i-th sample of class c, and $\omega_{it}$ and $\omega_{jt}$ are the t-th feature values of the sample vectors $x_i$ and $x_j$, respectively.

(6) Calculate, taking the class mark degree into account, the weight of each class to which the selected h neighbor samples belong. $W_{jt}$, the weight of belonging to the t-th class $c_t$, is calculated as:

in the formula, $P_i$ is the class mark degree of class i, and $v(x_i, x_j)$ is the weight function of the vote, for which the invention uses $d(x_i, x_j)$.

(7) Classify the test sample $x_j$ into the class with the largest class weight.
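
Steps (5) to (7) can be sketched as follows. Since the exact formulas are not reproduced in this text, the sketch rests on two labeled assumptions: the sample mark degree multiplies the Euclidean distance, and each of the h neighbors votes for its class with weight equal to the class mark degree divided by that adjusted distance. The function name `balanced_knn_classify` and the data layout are hypothetical.

```python
import math
from collections import defaultdict


def balanced_knn_classify(train, test_point, class_degree, h=3):
    """Weighted vote among the h nearest samples.

    train: list of (features, class_label, sample_mark_degree) tuples.
    class_degree: maps each class label to its class mark degree.
    """
    # Step (5): distance adjusted by the sample mark degree (assumed: multiplication).
    scored = sorted(
        ((degree * math.dist(features, test_point), label)
         for features, label, degree in train),
        key=lambda pair: pair[0],
    )
    # Step (6): accumulate class weights over the h nearest neighbors.
    weights = defaultdict(float)
    for distance, label in scored[:h]:
        # Assumed vote: class mark degree times inverse distance.
        weights[label] += class_degree[label] / (distance + 1e-12)
    # Step (7): the class with the largest weight wins.
    return max(weights, key=weights.get)
```

With equal degrees this reduces to distance-weighted KNN; a larger class mark degree for a small class lets one close neighbor outweigh several neighbors from a large class.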

As shown in fig. 4, step S2 of this embodiment specifically comprises steps S211 to S215 set out above.

In this embodiment, step S213 is carried out as set out above.

In this embodiment, after the threshold value of each class is determined, the sum of the distances between the test sample and the samples in the class in the test RDD dataset is calculated, and the class with the sum of the distances being greater than the threshold value of the class is excluded.

Preferably, in the present embodiment, the load scenario of the test data has already been determined in step S2, and step S3 is to perform training and result prediction of the load scenario by using a neural network on the basis of the above results.

The BP neural network is a multi-layer feedforward network trained with the error back propagation algorithm, and is one of the most widely applied neural network models at present. It consists of an input layer, an output layer and a number of hidden layers. The BP learning algorithm consists of two processes: forward propagation of the network information flow, and backward propagation of the error together with updating of the neuron connection weights.

To reduce the running time of the algorithm through parallel computation, the BP learning algorithm is decomposed with MapReduce. The main idea of the decomposition is to calculate, for each connection weight of the network, the local gradient change produced by back propagation, and, after all samples have been processed, to update the weights once in batch using the average of these changes.

In this embodiment, as shown in fig. 5, the parallelization adjustment method for the BP neural network comprises steps S331 to S336 set out above.
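
The map/reduce decomposition described above (each mapper corrects the weights on its own training subset, and the reducer averages the corrected weights, as in steps S333 to S336) can be sketched in plain Python. To keep the sketch short, a single linear neuron stands in for the multi-layer BP network, and the "configuration file" of step S335 is simply the shared weight list; the names `map_task`, `reduce_task` and `train_parallel`, the learning rate and the stopping tolerance are all assumptions.

```python
def predict(weights, x):
    """Forward pass of the stand-in model: a single linear neuron."""
    return sum(w * xi for w, xi in zip(weights, x))


def map_task(weights, shard, lr=0.1):
    """Mapper (step S333): correct a local copy of the weights on one data shard."""
    w = list(weights)
    for x, y in shard:
        err = predict(w, x) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def reduce_task(weight_lists):
    """Reducer (step S334): average the corrected weights sent by all mappers."""
    return [sum(ws) / len(weight_lists) for ws in zip(*weight_lists)]


def train_parallel(shards, n_features, tol=1e-6, max_epochs=500):
    """Repeat map and reduce until the error is small or the epoch limit is hit (step S336)."""
    weights = [0.0] * n_features
    samples = [s for shard in shards for s in shard]
    for _ in range(max_epochs):
        # The mappers are run one after another here; on a cluster they run in parallel.
        corrected = [map_task(weights, shard) for shard in shards]
        weights = reduce_task(corrected)  # step S335: publish the shared weights
        mse = sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)
        if mse < tol:
            break
    return weights
```

Averaging once per pass keeps communication low, at the cost of each mapper working from slightly stale weights within a pass.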

In this embodiment, step S3 specifically comprises steps S31 and S32 set out above.

The embodiment also provides a system based on the short-term electric load parallel prediction method based on the balanced KNN algorithm, which includes a storage device and a processing device, wherein the storage device is used for storing the instructions of the method, and the processing device loads and executes the instructions stored in the storage device.

In particular, since the power load prediction of this embodiment predicts the future power load from historical data, the predicted value differs from the actual value, which gives rise to a power load prediction error. Prediction errors have many causes, mainly: 1) the historical power load data is incomplete; 2) the power load scenes are not divided distinctly enough; 3) the initial parameters of the training model are poorly chosen. The evaluation indexes adopted in this embodiment are as follows:

let y (i) and

respectively representing the actual load value and the predicted load value at the time i, the following are provided:

absolute pair error:

relative error:

wherein e is₁Mean error per day. Because the prediction error has positive and negative values, the absolute value of the error is taken when the average value is calculated in order to avoid the cancellation of the positive and negative values.

where e₂ is the root mean square error. The root mean square error index amplifies the effect of large errors and thereby improves the sensitivity of the index.
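For reference, the conventional forms of these four indexes are sketched below. They are an assumption rather than the formulas of this embodiment: ŷ(i) is assumed to denote the predicted load value at time i, n the number of sampling points in one day, and the daily mean error and the root mean square error are assumed to be taken over the relative errors.

```latex
\begin{aligned}
E_a(i) &= y(i) - \hat{y}(i) \\
E_r(i) &= \frac{y(i) - \hat{y}(i)}{y(i)} \times 100\% \\
e_1 &= \frac{1}{n} \sum_{i=1}^{n} \left| E_r(i) \right| \\
e_2 &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} E_r(i)^2}
\end{aligned}
```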

The data source of this embodiment is load data collected by the smart meters of a certain power grid; the historical load data is shown in fig. 6. In fig. 6, the load data includes features such as temperature, humidity, precipitation, wind speed, day type, and season, and the load data at each full hour is used for training and prediction.

The four subgraphs in fig. 7 each correspond to one type of load curve. It can be seen that the K-means clustering algorithm clusters the data well and clearly separates load curves with different characteristics, i.e., different load scenes.
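The division of load curves into load scenes by K-means can be illustrated by the following minimal single-machine sketch. It is given under stated assumptions and is not the Spark implementation of this embodiment: each load curve is a plain list of load values, and the initial centers are chosen by a deterministic farthest-point rule, which this document does not specify. The names `euclidean`, `init_centers`, and `kmeans` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two load curves of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def init_centers(curves, k):
    """Assumed initialization: start from the first curve, then repeatedly
    add the curve that lies farthest from the centers chosen so far."""
    centers = [curves[0]]
    while len(centers) < k:
        centers.append(
            max(curves, key=lambda x: min(euclidean(x, c) for c in centers))
        )
    return centers


def kmeans(curves, k, iters=50):
    """Cluster load curves into k load scenes; returns (labels, centers)."""
    centers = init_centers(curves, k)
    labels = [0] * len(curves)
    for _ in range(iters):
        # Assignment step: each curve joins the scene with the nearest center.
        labels = [
            min(range(k), key=lambda c: euclidean(x, centers[c])) for x in curves
        ]
        # Update step: each center becomes the point-wise mean of its members.
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(curves, labels) if lab == c]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty scene
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return labels, centers
```

With four load scenes, as in fig. 7, the function would be called with `k=4` on the list of daily load curves.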

Imbalance analysis is performed on the clustered data; after the balanced data is obtained, the load scene decision is made and the load value is determined. The process is shown in fig. 8.
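The class threshold rule of the balanced KNN algorithm (the threshold of a class is the maximum sum of Euclidean distances within the class, and a class is excluded when the distance sum of the test sample exceeds that threshold) can be illustrated by the following minimal sketch. It assumes that the marked samples of a class are simply all labeled samples of that class, and it works on plain Python lists rather than on RDD data sets. The names `distance_sum`, `class_thresholds`, and `candidate_classes` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_sum(x, samples):
    """Sum of the Euclidean distances from x to every sample in the list."""
    return sum(euclidean(x, s) for s in samples)


def class_thresholds(classes):
    """Threshold of each class: the largest distance sum of any member of
    the class to the (assumed) marked samples, i.e. all members of the class."""
    return {
        label: max(distance_sum(x, members) for x in members)
        for label, members in classes.items()
    }


def candidate_classes(test_sample, classes, thresholds):
    """Keep only the classes whose threshold is not exceeded by the
    distance sum of the test sample; all other classes are excluded."""
    return [
        label
        for label, members in classes.items()
        if distance_sum(test_sample, members) <= thresholds[label]
    ]
```

A test sample would then be classified only among the classes returned by `candidate_classes`, which keeps a distant class with many samples from dominating the decision.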

From the results shown in fig. 9, it can be seen that when the data sample is small, the prediction times of the two algorithms differ little, and the single-machine algorithm is in fact slightly faster than the parallel algorithm. The reason is that, for a small sample set, the parallel algorithm still divides the data into several sub-sample sets, and the communication cost among the different data subsets slows down prediction. As the sample set grows, however, the iteration times required by the two algorithms diverge markedly, and the time required by the parallel algorithm becomes far less than that of the single-machine method.

The comparison result of the load prediction value and the actual load value obtained by the load prediction algorithm based on cloud computing in this embodiment is shown in table 1.

Table 1 Comparison of load prediction results

As shown in figs. 10 and 11, the predicted value curve and the actual value curve follow similar trends, and the average root mean square error is 2.13%. The error of the prediction result meets the error standard for load prediction, which shows that the cloud-computing-based load prediction algorithm is feasible.
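An average root mean square error of this kind could be computed as in the following minimal sketch. It assumes that the error is expressed relative to the actual load value and that the average is taken over the per-day root mean square errors; neither assumption is confirmed by this document. The names `relative_errors`, `daily_rmse`, and `mean_daily_rmse` are hypothetical.

```python
import math


def relative_errors(actual, predicted):
    """Relative error of each prediction with respect to the actual load."""
    return [(y - p) / y for y, p in zip(actual, predicted)]


def daily_rmse(actual, predicted):
    """Root mean square of the relative errors over one day."""
    errs = relative_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def mean_daily_rmse(days):
    """Average of the daily root mean square errors.

    days is a list of (actual, predicted) pairs, one pair per day.
    """
    return sum(daily_rmse(a, p) for a, p in days) / len(days)
```

Multiplying the returned value by 100 gives the error as a percentage.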

In this embodiment, the performance of the system is rigorously tested using data from a certain power grid as an example, covering both the big data processing performance and the power load prediction accuracy of the system. The results show that the method remedies a serious defect of the traditional KNN algorithm and improves the accuracy of load scene classification. After the data is preprocessed with the K-means model, Spark, which provides a parallel programming model and computing framework, is combined with the load prediction algorithm, and a balanced KNN algorithm and a parallelized neural network programming model are provided. This addresses the computational burden of massive data and greatly shortens the prediction time, while keeping the prediction accuracy within the load prediction requirement.

The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims

1. A short-term power load parallel prediction method applied to a power quality comprehensive treatment scene, characterized by comprising the following steps:

step S2: carrying out balance analysis on the clustered load scenes by adopting a balance KNN algorithm, and carrying out load scene decision to obtain a balanced load scene;

step S3: training the load scene obtained in the step S2, and predicting the load;

step S2 specifically includes the following steps:

2. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 1, characterized in that: in step S1, a K-means clustering algorithm is used to cluster the power load historical data.

3. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 1, characterized in that: in step S3, the load scene obtained in step S2 is trained using a neural network.

4. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 2, characterized in that: the step S1 specifically includes the following steps:

5. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 1, characterized in that: the step S213 specifically includes:

calculating, for each class of the load scene RDD data set, the sum of Euclidean distances between each sample and the marked samples, and selecting the maximum of these sums within the class as the threshold of the class;

6. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 5, characterized in that: after the threshold of each class is determined, the sum of distances between the test sample in the test RDD data set and the samples in the class is calculated, and any class whose distance sum is larger than its threshold is excluded.

7. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 3, characterized in that: in step S3, the neural network is a BP neural network subjected to parallelization adjustment, wherein the parallelization adjustment method for the BP neural network includes the following steps:

step S335: updating the configuration file;

8. The short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to claim 3, characterized in that: the step S3 specifically includes the following steps:

9. A system based on the short-term power load parallel prediction method applied to a power quality comprehensive treatment scene according to any one of claims 1 to 8, characterized by comprising a storage device and a processing device, wherein the storage device stores instructions of the method of any one of claims 1 to 8, and the processing device loads and executes the instructions stored in the storage device.