CN104881706B - Power system short-term load forecasting method based on big data technology - Google Patents



Publication number: CN104881706B
Authority: CN (China)
Application number: CN201410851910.0A
Other versions: CN104881706A
Other languages: Chinese (zh)
Inventor: 张沛 (Zhang Pei)
Current assignee: Tianjin Hongyuan Huineng Technology Co Ltd
Original assignee: Tianjin Hongyuan Huineng Technology Co Ltd
Legal status: Active (application granted)


Abstract

The present invention provides a power system short-term load forecasting method based on big data technology. Data mining is used to forecast load at the user level, and the user forecasts are summed to form the system load. The method comprises the following steps: load curve cluster analysis, in which load curves with similar shape characteristics are grouped into one class; establishing key influencing factors, which reduces the classification rules and simplifies the prediction model; establishing classification rules, in which the CART decision tree algorithm learns rules that reproduce the agglomerative hierarchical clustering result; classifying the day to be predicted; and training the prediction models and predicting, in which the support vector machine model corresponding to the class of the day to be predicted is selected to complete the forecast. The system load calculation step is completed on a Hadoop big data computing platform. The invention studies a user-level load forecasting framework and uses data mining to discover users' electricity consumption behavior patterns, thereby improving load forecasting accuracy.

Description

Power system short-term load prediction method based on big data technology
Technical Field
The invention relates to the technical field of power system engineering, in particular to a power system short-term load prediction method based on big data technology.
Background
The short-term load forecast of a power system underpins the scheduling, operation and production planning of the system; an accurate forecast helps improve system security and stability and reduce generation cost. With the large-scale integration of distributed energy resources (solar, wind, energy storage and the like), load patterns become harder to capture, and this uncertainty makes load forecasting more difficult. A forecasting method that better captures how the load changes is therefore needed.
Users are the most basic components of the power grid and the source of grid load fluctuation. However, current load forecasting methods target system-level load and, at the finest granularity, bus-level load. It is therefore necessary to study a user-level load forecasting framework and to use data mining to discover users' electricity consumption behavior patterns, so as to improve forecasting accuracy.
Disclosure of Invention
The invention provides a power system short-term load prediction method based on big data technology, which effectively addresses the low prediction accuracy caused by users' complex electricity consumption patterns.
To achieve this purpose, the invention adopts the following technical scheme: a power system short-term load prediction method based on big data technology, comprising the following steps:
(1) Load curve cluster analysis: perform agglomerative hierarchical clustering, day by day, on the historical load data of the year preceding the day to be predicted, grouping load curves with similar shape characteristics into one class;
(2) Establishing key influencing factors: compute grey correlation analysis results from the historical load and weather data, and rank them to obtain the key factors influencing the load;
(3) Establishing classification rules: taking the hierarchical clustering result and the key influencing factors as input, build a decision tree with the CART algorithm whose classification reproduces the agglomerative hierarchical clustering result;
(4) Classifying the day to be predicted: input the key-factor feature vector of the day to be predicted into the decision tree to obtain its class;
(5) Training the prediction models and predicting: train a support vector machine model with the historical load data of each class, and select the support vector machine model corresponding to the class of the day to be predicted obtained in step (4) to complete the prediction;
(6) Calculating the system load: repeat the above steps for all users in the target power grid, then sum all user loads and superimpose the network loss load to obtain the system-level load of the whole grid.
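As a minimal sketch of step (6), the aggregation can be written as a map/reduce-style fold over per-user 96-point forecasts. It assumes each forecast is a plain list of 96 values and that the network loss load is already available as a 96-point series; how the loss load is estimated is not specified here. The function names are hypothetical, and the code is ordinary Python rather than a Hadoop job.

```python
from functools import reduce
from typing import Iterable, List

POINTS_PER_DAY = 96  # 96 equally spaced samples per day, as in the embodiment


def add_series(a: List[float], b: List[float]) -> List[float]:
    """Point-wise sum of two daily load series (the 'reduce' step).

    zip stops at the shorter series, so all inputs should have 96 points.
    """
    return [x + y for x, y in zip(a, b)]


def system_load(user_forecasts: Iterable[List[float]],
                loss_load: List[float]) -> List[float]:
    """Sum all user-level forecasts, then superimpose the network loss load."""
    total = reduce(add_series, user_forecasts, [0.0] * POINTS_PER_DAY)
    return add_series(total, loss_load)
```

On a cluster, the per-user forecasts would be produced in parallel and only the point-wise sum would need to be reduced, which is why the summation maps naturally onto MapReduce.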
Further, the step (1) specifically comprises the following steps:
the adopted clustering algorithm is an improved agglomerative hierarchical clustering algorithm, in which the difference in each dimension of the Euclidean distance is normalized by that dimension's maximum value, as shown in the following formula:

$$d_{12}=\sqrt{\sum_{k=1}^{n}\left(\frac{x_{1k}-x_{2k}}{x_{\max}}\right)^{2}}$$

where each day is one load sequence and n is the dimension of the load sequence vector, usually 96; $d_{12}$ is the distance between load sequence 1 and load sequence 2; $x_{1k}$ and $x_{2k}$ are the k-th dimension data of the first and second load sequences; and $x_{\max}$ is the maximum value in the k-th dimension over all load sequences.
The historical load data of the year preceding the day to be predicted serve as the historical data set. Using the hierarchical clustering algorithm with the improved (normalized) Euclidean distance, the n daily load points of each day form a vector, and the normalized Euclidean distances between vectors are computed, so that the initially scattered individual samples are progressively merged into several classes with similar trends.
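The improved distance and the bottom-up merging can be sketched as follows. This is an illustrative, naive O(n³) implementation, not the patented code: the linkage criterion used to compare clusters is not stated above, so average linkage is assumed, and loads are assumed positive so that every $x_{\max}$ is non-zero. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def column_max(curves: Sequence[Sequence[float]]) -> List[float]:
    """x_max for each dimension k, taken over all daily load sequences."""
    return [max(col) for col in zip(*curves)]


def normalized_distance(a: Sequence[float], b: Sequence[float],
                        xmax: Sequence[float]) -> float:
    """d_12: Euclidean distance with each per-dimension difference divided by x_max."""
    # Assumes every x_max is non-zero (loads are positive).
    return math.sqrt(sum(((x1 - x2) / m) ** 2 for x1, x2, m in zip(a, b, xmax)))


def agglomerate(curves: Sequence[Sequence[float]], n_clusters: int) -> List[int]:
    """Bottom-up clustering of daily curves; returns one class label per day."""
    xmax = column_max(curves)
    n = len(curves)
    dist = [[normalized_distance(curves[i], curves[j], xmax) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # every day starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage (an assumption): mean pairwise distance.
                pairs = [dist[a][b] for a in clusters[i] for b in clusters[j]]
                link = sum(pairs) / len(pairs)
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    labels = [0] * n
    for label, members in enumerate(clusters):
        for day in members:
            labels[day] = label
    return labels
```

In the embodiment, `curves` would hold the 365 daily 96-point vectors of one user and `n_clusters` would be 6.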
Further, the step (2) specifically comprises the following steps:
the grey correlation degree of each factor is calculated with a grey correlation analysis algorithm. The historical load data, meteorological data and day-type data of the year preceding the forecast day are taken as the analysis sample; the load values form the mother sequence, and the weather factors and day type form several subsequences. The correlation between each subsequence and the mother sequence is analyzed with the grey correlation analysis algorithm, and the daily grey correlation degrees over the year are averaged to obtain the grey correlation degree of each influencing factor. The degrees are then ranked, and the top 4 are selected as the key factors influencing the load. The specific steps are as follows:
(a) Determining a normalized attribute matrix;
the historical load values form the mother sequence $Y=\{y_1,y_2,\dots,y_p\}^T$, and each influencing factor forms a subsequence $X_i=\{x_{1i},x_{2i},\dots,x_{pi}\}^T$, which gives the matrix

$$A=\left(X_1,X_2,\dots,X_q\right)=\begin{pmatrix}x_{11}&x_{12}&\cdots&x_{1q}\\x_{21}&x_{22}&\cdots&x_{2q}\\\vdots&\vdots&\ddots&\vdots\\x_{p1}&x_{p2}&\cdots&x_{pq}\end{pmatrix}$$

where p is the number of samples, q is the number of influencing factors to be analyzed, x denotes a factor sequence and y denotes the load sequence.
(b) The matrix A is then normalized with the averaging (mean-value) operator:

$$x_i(t)\,d_i=\frac{x_i(t)}{\bar{x}_i},\qquad \bar{x}_i=\frac{1}{p}\sum_{t=1}^{p}x_i(t)$$

where $x_i(t)$ is the value of the i-th factor at index t, $d_i$ is the averaging operator of each sequence, and $\bar{x}_i$ is the mean of each column. The mother sequence Y is normalized in the same way, with its averaging operator denoted D;
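(c) The normalized matrix is obtained as follows:

$$E=\bigl(e_i(t)\bigr)_{p\times q},\qquad e_i(t)=x_i(t)\,d_i,\qquad m_k(t)=y_t\,D$$

where $e_i(t)$ is the normalized value of the i-th factor and $m_k(t)$ is the normalized load value at index t.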
(d) Calculating the correlation coefficient
The correlation coefficient $\xi_i(t)$ between factor $X_i$ and load sequence Y at the t-th index measures, geometrically, the relative difference between curve $X_i$ and curve Y at t, and is calculated as:

$$\xi_i(t)=\frac{\Delta_{\min}+\rho\,\Delta_{\max}}{\left|m_k(t)-e_i(t)\right|+\rho\,\Delta_{\max}}$$

where $\Delta_{\max}$ and $\Delta_{\min}$ are the maximum and minimum of $|m_k(t)-e_i(t)|$ over all i and t; $|m_k(t)-e_i(t)|$ is the absolute difference at index t; and ρ is the resolution coefficient, which sharpens the differences between correlation coefficients. ρ is chosen between 0 and 1, usually ρ = 0.5;
(e) Ranking the key influencing factors
From the correlation coefficients, the correlation degree between factor $X_i$ and load Y is:

$$r_i=\frac{1}{p}\sum_{t=1}^{p}\xi_i(t)$$

In general, the grey correlation degree $r_i$ lies between 0 and 1. The closer it is to 1, the stronger the linear correlation between X and Y; the closer $|r_i|$ is to 0, the weaker the linear correlation; $0<r_i<1$ indicates that X and Y are related, but nonlinearly. $|r_i|\ge 0.6$ is regarded as highly correlated, $0.2\le|r_i|<0.6$ as moderately correlated, and $|r_i|<0.2$ as extremely weakly correlated and negligible.
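A compact sketch of steps (b) to (e) for one load sequence and several factor sequences is shown below. It assumes both the load and the factor sequences are normalized with the averaging operator and that $\Delta_{\max}$ and $\Delta_{\min}$ are taken over all factors and indices; averaging the daily degrees over a year, as described above, is left to the caller. The function names are hypothetical.

```python
from typing import List, Sequence


def mean_normalize(seq: Sequence[float]) -> List[float]:
    """Averaging operator: divide each value by the mean of its sequence."""
    mean = sum(seq) / len(seq)
    return [v / mean for v in seq]


def grey_correlation_degrees(load: Sequence[float],
                             factors: Sequence[Sequence[float]],
                             rho: float = 0.5) -> List[float]:
    """Grey correlation degree r_i of each factor sequence X_i w.r.t. the load Y."""
    m = mean_normalize(load)                      # normalized mother sequence
    e = [mean_normalize(x) for x in factors]      # normalized subsequences e_i(t)
    diffs = [[abs(mt - et) for mt, et in zip(m, ei)] for ei in e]
    d_min = min(min(row) for row in diffs)        # Delta_min over all i and t
    d_max = max(max(row) for row in diffs)        # Delta_max over all i and t
    # If d_max == 0 (every factor identical to the load) the formula is undefined.
    degrees = []
    for row in diffs:
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(xi) / len(xi))         # r_i = mean of xi_i(t)
    return degrees
```

A factor whose normalized shape matches the load exactly gets $r_i=1$, which is a quick sanity check when wiring this to real weather data.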
Further, the step (3) specifically includes the following steps:
the CART decision tree algorithm is adopted. At each node other than a leaf, the key influencing factor (and split value) with the smallest Gini index is selected, and the historical load data set of the current node is divided into two subsets, until the final classification result matches the clustering result of step (1). This process learns the coupling between the historical load and key influencing factor data and the clustering result, and represents the classification rules clearly and completely.
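The tree-growing loop can be illustrated with a small from-scratch CART sketch. Assumptions: every key factor, including day type, is treated as an ordered numeric attribute split by a threshold; a node becomes a leaf once it holds a single cluster label, or when no threshold can separate its samples (majority label). The function names are hypothetical; a production system would more likely use a library implementation.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini(D) = 1 - sum_j p_j^2 over the cluster labels in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[int]) -> Optional[Tuple[int, float]]:
    """Return (factor index, threshold) with the smallest weighted Gini, or None."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or g < best[0]:
                best = (g, f, thr)
    return None if best is None else (best[1], best[2])


def grow(X: List[List[float]], y: List[int]):
    """Recursively split until each leaf holds a single cluster label."""
    split = best_split(X, y) if len(set(y)) > 1 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority cluster label
    f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow([X[i] for i in left], [y[i] for i in left]),
            grow([X[i] for i in right], [y[i] for i in right]))


def classify(tree, x: Sequence[float]) -> int:
    """Walk the stored rules down to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree
```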
Further, the step (5) specifically includes the following steps:
for the classification result of step (1), training samples are built from the load data of each class and the corresponding key factor data, and multiple support vector machine models are trained. The RBF kernel is chosen as the kernel function of the support vector machine, and grid search, i.e., exhaustive search, is used for parameter optimization;
according to the classification result of the day to be predicted obtained in step (4), the corresponding support vector machine model is selected to complete the prediction.
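The parameter search can be sketched as an exhaustive loop over a grid of $(\delta^2, \varepsilon, c)$. Training the support vector machine itself is delegated to a caller-supplied `score` function (hypothetical) that fits a model with the given parameters and returns a validation error; with scikit-learn, for instance, `sklearn.svm.SVR(kernel="rbf", C=c, epsilon=eps, gamma=1 / (2 * delta2))` would be one way to fit it under the kernel form assumed here. The kernel is written as $\exp(-\lVert u-v\rVert^2 / 2\delta^2)$, which is one common convention and an assumption.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple


def rbf_kernel(u: Sequence[float], v: Sequence[float], delta2: float) -> float:
    """RBF kernel exp(-||u - v||^2 / (2 * delta^2)); this form is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * delta2))


def grid_search(score: Callable[[float, float, float], float],
                delta2_grid: Iterable[float],
                eps_grid: Iterable[float],
                c_grid: Iterable[float]) -> Tuple[Optional[Tuple[float, float, float]], float]:
    """Exhaustively try every (delta^2, epsilon, c) and keep the lowest error."""
    best_params, best_err = None, math.inf
    for params in itertools.product(delta2_grid, eps_grid, c_grid):
        # score is hypothetical: it should train an SVR with these parameters
        # and return a validation error (e.g. mean relative error).
        err = score(*params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

The cost grows with the product of the grid sizes and is repeated for every class and every sampling point, which is one reason the embodiment runs on a cluster.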
Further, the step (6) is completed on a Hadoop big data computing platform.
Advantages and positive effects of the invention: the invention provides a power system short-term load forecasting method based on big data technology that mines users' electricity consumption behavior patterns with data mining methods, thereby improving load forecasting accuracy, improving the security and stability of the power system, and reducing generation cost.
Drawings
FIG. 1 is a schematic structural framework of the present invention;
FIG. 2 is a flow chart of the algorithm of the present invention;
FIG. 3 is the hierarchical clustering tree of user #1;
FIG. 4 shows the load curves of the 6 classes of user #1;
FIG. 5 is a big data computing platform framework diagram;
FIG. 6 is a diagram illustrating the effect of a method for predicting the short-term load of an electric power system based on big data technology.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 2, a method for predicting short-term load of an electric power system based on big data technology includes the following steps:
S1, input historical load data;
S2, cluster the input historical load data with the improved hierarchical clustering algorithm;
The trend of a load curve is closely related to the day type, weather factors and the like. Cluster analysis of the curves groups load curves with similar shape characteristics into one class.
The cluster analysis algorithm adopted by the invention is an improved agglomerative hierarchical clustering algorithm, in which the difference in each dimension of the Euclidean distance is normalized by that dimension's maximum value, as shown in the following formula:

$$d_{12}=\sqrt{\sum_{k=1}^{n}\left(\frac{x_{1k}-x_{2k}}{x_{\max}}\right)^{2}}$$

where each day is one load sequence and n is the dimension of the load sequence vector (usually 96); $d_{12}$ is the distance between load sequence 1 and load sequence 2; $x_{1k}$ and $x_{2k}$ are the k-th dimension data of the first and second load sequences; and $x_{\max}$ is the maximum value in the k-th dimension over all load sequences.
The historical load data of the year preceding the day to be predicted are taken as the historical data set. Using the hierarchical clustering algorithm improved with the normalized Euclidean distance, the n daily load points form a vector, and the normalized Euclidean distances between vectors are computed, so that the initially scattered individual samples are progressively merged into several classes with similar trends.
S3, in parallel, use the input historical load data, weather information and historical day-type data to find the key influencing factors with the grey correlation analysis algorithm;
The invention uses a grey correlation analysis algorithm to calculate the grey correlation degree of each factor (such as daily maximum air temperature, daily mean air temperature, average humidity and day type (day of the week)). The historical load data, meteorological data and day-type data of the year preceding the forecast day are taken as the analysis sample; the mother sequence is the load value, and the weather factors and day type are the subsequences. The correlation between each subsequence and the mother sequence is analyzed with the grey correlation analysis algorithm, and the daily grey correlation degrees over one year are averaged to obtain the grey correlation degree of each influencing factor. The degrees are ranked, and the top 4 are taken as the key factors influencing the load.
S4, generate a decision tree with the CART algorithm based on S2 and S3;
At each node (except leaf nodes), the key influencing factor with the smallest Gini index is selected and the historical load data set of the current node is divided into two subsets, until the final classification result is consistent with the clustering result of S2. This process learns the coupling between the historical load and key influencing factor data and the clustering result, and represents the classification rules clearly and completely.
S5, form N historical data sample sets according to the N classes obtained in S2;
S6, store the classification rules represented by the decision tree generated in S4;
S7, train the corresponding N × 96 support vector machine models on the N historical data sample sets (96 is the number of load sampling points per day, so each sampling point has its own support vector machine model);
For the clustering result of step (1), training samples are built from the load data of each class and the corresponding key factor data, and multiple support vector machine models are trained. The RBF kernel is selected as the kernel function of the support vector machine, so the parameters to be determined are the kernel parameter $\delta^2$, the insensitivity coefficient ε and the penalty parameter c. These parameters are optimized by grid search, i.e., exhaustive search.
S8, form the key influencing factor vector of the day to be predicted from its key influencing factor information;
S9, input the vector from S8 into the classification rules stored in S6 to obtain the class of the day to be predicted;
S10, select the corresponding model from S7 according to the class obtained in S9 and predict;
S11, repeat the above operations for all users in the target power grid;
S12, sum the prediction results of all users;
S13, output the summed prediction result, i.e., the system load.
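Steps S8 to S10 amount to a lookup: classify the day with the stored rules, then run that class's 96 per-point models. The sketch below assumes each trained model is a callable that takes the day's key-factor vector and returns one load value; which features the per-point support vector machines actually consume is not spelled out above, so this signature is hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: a trained per-point model maps the day's key-factor
# vector to the load at one sampling point.
PointModel = Callable[[Sequence[float]], float]


def predict_day(day_factors: Sequence[float],
                rule: Callable[[Sequence[float]], int],
                models: Dict[int, List[PointModel]]) -> List[float]:
    """S9: classify the day with the stored rules; S10: run that class's models."""
    day_class = rule(day_factors)
    return [model(day_factors) for model in models[day_class]]
```

With N classes, `models` holds N lists of 96 models, matching the N × 96 models trained in S7.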
Take the city-level system load prediction of a city in Zhejiang Province as an example. The city has 13 substations at the 220 kV level and about 1.2 million users. The city also has an electricity consumption information acquisition system, from which equally spaced load samples at 96 points per day can be obtained for each user.
Step 1 (corresponding to S1 and S2): select user #1; this user's load data for the year preceding the day to be predicted have the following format:
Each row of the table is a 96-dimensional data sample, and the normalized Euclidean distance between every two vectors is calculated with the following formula:

$$d_{12}=\sqrt{\sum_{k=1}^{n}\left(\frac{x_{1k}-x_{2k}}{x_{\max}}\right)^{2}}$$
Samples with the shortest distance are merged according to the calculated distances. As shown in FIG. 3, the bottom layer of the tree consists of the 365 samples of historical load data, which are merged progressively from bottom to top and finally grouped into 6 classes, represented by six colors. FIG. 4 shows the load curves of the six classes.
Step 2 (corresponding to S3): the grey correlation analysis algorithm is applied to the historical load data, historical weather factor data and day-type data to find the key factors influencing load changes. The data format is as follows:
The first 7 columns of the data table are taken as subsequences and the 8th column as the mother sequence, and the grey correlation analysis algorithm is applied to determine the key factors influencing load changes through the following steps:
(a) Determining a normalized attribute matrix;
the historical load values form the mother sequence $Y=\{y_1,y_2,\dots,y_p\}^T$, and each influencing factor forms a subsequence $X_i=\{x_{1i},x_{2i},\dots,x_{pi}\}^T$, which gives the matrix

$$A=\left(X_1,X_2,\dots,X_q\right)=\begin{pmatrix}x_{11}&x_{12}&\cdots&x_{1q}\\x_{21}&x_{22}&\cdots&x_{2q}\\\vdots&\vdots&\ddots&\vdots\\x_{p1}&x_{p2}&\cdots&x_{pq}\end{pmatrix}$$

where p is the number of samples, q is the number of influencing factors to be analyzed, x denotes a factor sequence and y denotes the load sequence.
(b) The matrix A is then normalized with the averaging (mean-value) operator:

$$x_i(t)\,d_i=\frac{x_i(t)}{\bar{x}_i},\qquad \bar{x}_i=\frac{1}{p}\sum_{t=1}^{p}x_i(t)$$

where $x_i(t)$ is the value of the i-th factor at index t, $d_i$ is the averaging operator of each sequence, and $\bar{x}_i$ is the mean of each column. The mother sequence Y is normalized in the same way, with its averaging operator denoted D.
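(c) The normalized matrix is obtained as follows:

$$E=\bigl(e_i(t)\bigr)_{p\times q},\qquad e_i(t)=x_i(t)\,d_i,\qquad m_k(t)=y_t\,D$$

where $e_i(t)$ is the normalized value of the i-th factor and $m_k(t)$ is the normalized load value at index t.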
(d) Calculating the correlation coefficient
The correlation coefficient $\xi_i(t)$ between factor $X_i$ and load sequence Y at the t-th index measures, geometrically, the relative difference between curve $X_i$ and curve Y at t, and is calculated as:

$$\xi_i(t)=\frac{\Delta_{\min}+\rho\,\Delta_{\max}}{\left|m_k(t)-e_i(t)\right|+\rho\,\Delta_{\max}}$$

where $\Delta_{\max}$ and $\Delta_{\min}$ are the maximum and minimum of $|m_k(t)-e_i(t)|$ over all i and t; $|m_k(t)-e_i(t)|$ is the absolute difference at index t; and ρ is the resolution coefficient, which sharpens the differences between correlation coefficients. ρ is chosen between 0 and 1, usually ρ = 0.5.
(e) Ranking the key influencing factors
From the correlation coefficients, the correlation degree between factor $X_i$ and load Y is:

$$r_i=\frac{1}{p}\sum_{t=1}^{p}\xi_i(t)$$

In general, the grey correlation degree $r_i$ lies between 0 and 1. The closer it is to 1, the stronger the linear correlation between X and Y; the closer $|r_i|$ is to 0, the weaker the linear correlation; $0<r_i<1$ indicates that X and Y are related, but nonlinearly. $|r_i|\ge 0.6$ is regarded as highly correlated, $0.2\le|r_i|<0.6$ as moderately correlated, and $|r_i|<0.2$ as extremely weakly correlated and negligible.
Through the above calculation, the grey correlation degrees of the 6 influencing factors of user #1 are obtained as follows:
Average precipitation and average wind speed are judged to be extremely weakly correlated factors, while the remaining four, maximum air temperature, mean temperature, average humidity and day type, are the key factors influencing the load trend.
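The selection in this example can be expressed with the thresholds from step (e): drop factors whose degree is below 0.2, then keep the four largest. The thresholds and the top-4 cut-off come from the text; the function names and any factor values used with them are hypothetical.

```python
from typing import Dict, List


def strength(r: float) -> str:
    """Bucket a grey correlation degree using the thresholds given in the text."""
    a = abs(r)
    if a >= 0.6:
        return "high"
    if a >= 0.2:
        return "moderate"
    return "negligible"


def key_factors(degrees: Dict[str, float], top_k: int = 4) -> List[str]:
    """Drop negligible factors, then keep the top_k by descending degree."""
    kept = [name for name, r in degrees.items() if strength(r) != "negligible"]
    return sorted(kept, key=lambda name: degrees[name], reverse=True)[:top_k]
```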
Step 3 (corresponding to S4 and S6): generate a decision tree with the CART algorithm from the clustering result of user #1's 2012 load data and the key influencing factors. The data format is as follows:
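| User #1 (date) | Maximum air temperature (°C) | Mean temperature (°C) | Average humidity (%) | Day type | Clustering result |
|---|---|---|---|---|---|
| 2012/01/01 | 7.7 | 3.65 | 75.2 | 7 | 1 |
| 2012/01/02 | 7.9 | 2.89 | 68.6 | 1 | 4 |
| ... | ... | ... | ... | ... | ... |
| 2012/12/31 | 6.6 | 0.09 | 61.3 | 1 | 5 |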
Description of the data format: the four columns after the date are the 4 key influencing factors of user #1's load trend found in Step 2, and the clustering result indicates into which of the 6 classes that day's historical load curve was clustered in Step 1.
The last column of the table gives the leaf-node classes of the decision tree, and the four key influencing factors form the candidate set for splitting the decision tree nodes. At each node, the algorithm determines which value of which attribute is the optimal splitting attribute, as follows:
The CART decision tree is a binary recursive partitioning technique that divides the current sample set into two subsets at each node (except leaf nodes). Unlike information-gain methods based on information theory, the CART algorithm uses the Gini index as its attribute selection measure. The Gini index measures the impurity of the training sample set D; if D contains m classes, the Gini index is:

$$\mathrm{Gini}(D)=1-\sum_{j=1}^{m}p_j^{2}$$

where $p_j$ is the relative frequency of class-j elements in D. The Gini index considers a binary partition of each attribute. If a binary partition on attribute A divides D into $D_1$ and $D_2$, the Gini index of D under this split is:

$$\mathrm{Gini}_A(D)=\frac{|D_1|}{|D|}\mathrm{Gini}(D_1)+\frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$

For each attribute, every possible binary partition is considered, and the partition yielding the smallest Gini index is selected as that attribute's split. The smaller $\mathrm{Gini}_A(D)$ is, the better the partition on attribute A. Under this rule the tree is split recursively from top to bottom until the whole decision tree has grown, and the decision tree is stored as the classification rules.
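A small worked example may help with the two Gini formulas. The counts are illustrative only: suppose a node holds 10 days whose cluster labels are split 5/3/2 across three classes, and a candidate threshold on attribute A sends the five class-1 days to $D_1$ and the other five days (3 and 2 of the remaining classes) to $D_2$:

$$
\begin{aligned}
\mathrm{Gini}(D) &= 1-\left(0.5^2+0.3^2+0.2^2\right)=1-0.38=0.62\\
\mathrm{Gini}(D_1) &= 1-1^2=0\\
\mathrm{Gini}(D_2) &= 1-\left(0.6^2+0.4^2\right)=1-0.52=0.48\\
\mathrm{Gini}_A(D) &= \tfrac{5}{10}\cdot 0+\tfrac{5}{10}\cdot 0.48=0.24
\end{aligned}
$$

Because 0.24 is well below 0.62, this partition reduces impurity substantially; CART would compare it against every other candidate partition and keep the one with the smallest $\mathrm{Gini}_A(D)$.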
Step 4 (corresponding to S5, S7):
Step 4 can be carried out in parallel with Step 2; it mainly trains the prediction models. According to the clustering result of Step 1, the historical data of each class are organized into a sample set with the following format:
Description of the data format: the table shows the data of the first class. Because the method uses one support vector machine model per data dimension (sampling point), 96 support vector machine models must be trained and stored for this class.
Step 5 (corresponding to S8, S9 and S10): the 96-point load of 29 April 2013 is selected as the prediction target. The weather forecast and day-type information for that day are as follows:
| User #1 (date) | Maximum air temperature (°C) | Mean temperature (°C) | Average humidity (%) | Day type |
|---|---|---|---|---|
| 2013/04/29 | 21 | 12.75 | 54.3 | 6 (Saturday) |
Inputting this vector into the classification rules established in Step 3 yields class 2. The support vector machine models of class 2 established in Step 4 are then used for the 96-point load prediction, and the results are output and stored.
Step 6 (corresponding to S11, S12 and S13) is performed:
This step is carried out on a Hadoop big data computing platform; Hadoop is an open-source data platform. The core components of the Hadoop framework are HDFS and MapReduce: HDFS stores massive data, and MapReduce performs parallel computation on the data. FIG. 5 is the framework diagram of the big data platform. The platform used here contains 4 servers, each configured with two E5-2630V2 CPUs and 500 GB of storage. Steps 1 to 5 are performed for the 1.2 million ordinary users of the Zhejiang city, and the computation times are shown in the following table:
The load prediction results of the 1.2 million users are summed to obtain the final system load. The prediction results are shown in FIG. 6.
For the traditional method, the maximum relative error is 3.36%, the minimum relative error is 0.51% and the average relative error is 1.68%; for the proposed method, the maximum relative error is 1.35%, the minimum relative error is 0.07% and the average relative error is 1.68%.
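Error figures of this kind can be computed with a few lines once the actual and forecast 96-point series are available. The definition of relative error is not stated in the text, so per-point $|\hat{y}-y|/y$ expressed in percent is assumed, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def relative_errors(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """Max, min and mean of |forecast - actual| / actual, in percent (assumed definition)."""
    errs = [abs(f - a) / a * 100.0 for a, f in zip(actual, forecast)]
    return {"max": max(errs), "min": min(errs), "mean": sum(errs) / len(errs)}
```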
The embodiments of the present invention have been described in detail, but the description is only for the preferred embodiments of the present invention and should not be construed as limiting the scope of the present invention. All equivalent changes and modifications made within the scope of the present invention should be covered by the present patent.

Claims (6)

1. A power system short-term load prediction method based on big data technology, comprising the following steps:
(1) Load curve cluster analysis: performing agglomerative hierarchical clustering, day by day, on the historical load data of the year preceding the day to be predicted, and grouping load curves with similar shape characteristics into one class;
(2) Establishing key influencing factors: computing grey correlation analysis results from the historical load and weather data, and ranking them to obtain the key factors influencing the load;
(3) Establishing classification rules: taking the hierarchical clustering result and the key influencing factors as input, and building a decision tree with the CART algorithm whose classification reproduces the agglomerative hierarchical clustering result;
(4) Classifying the day to be predicted: inputting the key-factor feature vector of the day to be predicted into the decision tree to obtain its class;
(5) Training the prediction models and predicting: training a support vector machine model with the historical load data of each class, and selecting the support vector machine model corresponding to the class of the day to be predicted obtained in step (4) to complete the prediction;
(6) Calculating the system load: repeating the above steps for all users in the target power grid, summing all user loads and superimposing the network loss load to obtain the system-level load of the whole grid.
2. The method for predicting the short-term load of the power system based on the big data technology as claimed in claim 1, wherein: the step (1) specifically comprises the following steps:
the adopted clustering algorithm is an improved agglomerative hierarchical clustering algorithm, in which the difference in each dimension of the Euclidean distance is normalized by that dimension's maximum value, as shown in the following formula:

$$d_{12}=\sqrt{\sum_{k=1}^{n}\left(\frac{x_{1k}-x_{2k}}{x_{\max}}\right)^{2}}$$

wherein each day is one load sequence and n is the dimension of the load sequence vector, usually 96; $d_{12}$ is the distance between load sequence 1 and load sequence 2; $x_{1k}$ and $x_{2k}$ are the k-th dimension data of the first and second load sequences; $x_{\max}$ is the maximum value in the k-th dimension over all load sequences;
the historical load data of the year preceding the day to be predicted are used as the historical data set; using the hierarchical clustering algorithm with the improved Euclidean distance, the n daily load points form a vector, and the normalized Euclidean distances between vectors are calculated, so that the initially scattered individual samples are progressively merged into several classes with similar trends.
3. The method for predicting the short-term load of the power system based on the big data technology as claimed in claim 1, wherein: the step (2) specifically comprises the following steps:
calculating the grey correlation degree of each factor with a grey correlation analysis algorithm, taking the historical load data, meteorological data and day-type data of the year preceding the forecast day as the analysis sample, with the load values as the mother sequence and the weather factors and day type as several subsequences; analyzing the correlation between each subsequence and the mother sequence with the grey correlation analysis algorithm, and averaging the daily grey correlation degrees over one year to obtain the grey correlation degree of each influencing factor; ranking the grey correlation degrees and selecting the top 4 as the key factors influencing the load, the specific steps being as follows:
(a) Determining a normalized attribute matrix;
the historical load values form the mother sequence $Y=\{y_1,y_2,\dots,y_p\}^T$, and each influencing factor forms a subsequence $X_i=\{x_{1i},x_{2i},\dots,x_{pi}\}^T$, which gives the matrix

$$A=\left(X_1,X_2,\dots,X_q\right)=\begin{pmatrix}x_{11}&x_{12}&\cdots&x_{1q}\\x_{21}&x_{22}&\cdots&x_{2q}\\\vdots&\vdots&\ddots&\vdots\\x_{p1}&x_{p2}&\cdots&x_{pq}\end{pmatrix}$$

wherein p is the number of samples, q is the number of influencing factors to be analyzed, x denotes a factor sequence and y denotes the load sequence;
(b) Matrix A is then normalized by the mean value as follows:
where $x_i(t)$ is the value of the i-th factor at time t, $d_i$ is the averaging operator of each sequence, and $\bar{x}_i$ is the mean of the elements in column i; the mother sequence Y is normalized in the same way, with its averaging operator denoted D;
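Step (b) amounts to dividing every column by its mean. A minimal Python sketch, assuming the averaging operator $d_i$ maps $x_i(t)$ to $x_i(t)/\bar{x}_i$ (the exact formula is not reproduced in this text); the function names are hypothetical.

```python
def mean_normalize(matrix):
    """Mean-value normalization of a p x q factor matrix.

    Rows are samples t, columns are factors i.
    ASSUMPTION: the averaging operator d_i maps x_i(t) to x_i(t) / mean(x_i).
    """
    p = len(matrix)
    col_means = [sum(col) / p for col in zip(*matrix)]
    return [[v / m for v, m in zip(row, col_means)] for row in matrix]

def mean_normalize_sequence(seq):
    """The same operator (denoted D in the text) applied to the mother sequence Y."""
    mean = sum(seq) / len(seq)
    return [v / mean for v in seq]
```

After normalization each column has mean 1, so factors measured in different units (temperature, humidity, day-type codes) become comparable with the load.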
(c) Matrix A is normalized as follows:
(d) Calculating the correlation coefficient
The correlation coefficient $\xi_i(t)$ between factor $X_i$ and the load sequence Y at index t geometrically represents the relative difference between curve $X_i$ and curve Y at time t, and is calculated as follows:
where $\Delta_{\max}$ is the maximum of $|m_k(t) - e_i(t)|$, $\Delta_{\min}$ is its minimum, and $|m_k(t) - e_i(t)|$ is the absolute difference at time t; ρ is the resolution coefficient, which increases the contrast between the correlation coefficients; it is generally chosen between 0 and 1, usually ρ = 0.5;
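Because the coefficient formula itself is not reproduced in this text, the sketch below assumes the standard form of Deng's grey relational coefficient, $\xi_i(t) = (\Delta_{\min} + \rho\Delta_{\max}) / (|m(t) - e_i(t)| + \rho\Delta_{\max})$, with the minimum and maximum taken over all factors and all time points. It is an illustration under that assumption; the function name is hypothetical.

```python
def grey_coefficients(m, E, rho=0.5):
    """Grey relational coefficients xi_i(t) of each factor sequence vs. the load sequence.

    m   : normalized mother (load) sequence of length p
    E   : list of q normalized factor sequences e_i, each of length p
    rho : resolution coefficient in (0, 1); 0.5 as suggested in the text
    ASSUMPTION: Deng's form, with Delta_min / Delta_max over all factors and times.
    """
    diffs = [[abs(mt - et) for mt, et in zip(m, e)] for e in E]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0:
        # Every factor curve coincides with the load curve.
        return [[1.0] * len(m) for _ in E]
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row] for row in diffs]
```

A factor curve identical to the load curve gets coefficient 1 at every time point; larger deviations push the coefficient toward $\rho / (1 + \rho)$ when $\Delta_{\min} = 0$.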
(e) Ranking the key influencing factors
Based on the correlation coefficients above, the degree of association between factor $X_i$ and the load Y is calculated as:
In general, the grey correlation value $r_i$ lies between 0 and 1. The closer $r_i$ is to 1, the stronger the linear correlation between X and Y; the closer $|r_i|$ is to 0, the weaker the linear correlation between X and Y. $0 < r_i < 1$ indicates that X and Y are related, but not linearly. $|r_i| \ge 0.6$ is regarded as high correlation; $0.2 \le |r_i| < 0.6$ as moderate correlation; and $|r_i| < 0.2$ as extremely weak correlation that can be ignored.
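Step (e) and the ranking rule of claim 3 can be sketched as follows. The sketch assumes the usual definition of the grade as the mean of $\xi_i(t)$ over the p time points. The 0.6 and 0.2 thresholds, the averaging of daily grades over one year, and the choice of the top four factors come from the text. The function names, and any factor names used with them, are illustrative.

```python
def relational_grade(coefficients):
    """Grey correlation degree r_i: mean of xi_i(t) over the p time points (assumed form)."""
    return sum(coefficients) / len(coefficients)

def correlation_strength(r):
    """Qualitative strength using the thresholds stated in the text."""
    if abs(r) >= 0.6:
        return "high"
    if abs(r) >= 0.2:
        return "moderate"
    return "negligible"

def key_factors(daily_grades, k=4):
    """Average each factor's daily grades over the year and keep the k largest.

    daily_grades: dict mapping a factor name to its list of per-day grades r_i.
    """
    annual = {name: sum(g) / len(g) for name, g in daily_grades.items()}
    return sorted(annual, key=annual.get, reverse=True)[:k]
```

With five candidate factors, `key_factors` drops the one whose yearly average grade is smallest. A factor whose average falls below 0.2 would also be classed as `"negligible"` by `correlation_strength`.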
4. The method for predicting the short-term load of a power system based on big data technology according to claim 1, wherein step (3) specifically comprises the following steps:
A CART decision tree algorithm is adopted. At each non-leaf node, the key influence factor with the minimum Gini index is selected to split the historical load data set of that node into two subsets, and splitting continues until the final classification result matches the clustering result of step (1). This process learns the coupling relation between the historical load and key-influence-factor data on the one hand and the clustering result on the other, so that the classification rules are represented clearly and completely.
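A minimal pure-Python sketch of the CART step follows. It assumes numeric key-factor values as features and the cluster labels from step (1) as targets. At each node it picks the factor and threshold with the smallest weighted Gini index, then recurses until each leaf holds a single cluster. Production code would more likely use a library tree such as scikit-learn's `DecisionTreeClassifier` with `criterion="gini"`. The function names here are hypothetical.

```python
def gini(labels):
    """Gini index 1 - sum_c p_c^2 of a list of cluster labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def best_split(rows, labels):
    """Return (weighted_gini, factor_index, threshold) of the best binary split, or None.

    The left branch holds rows with value <= threshold.
    """
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [labels[i] for i in range(n) if rows[i][f] <= t]
            right = [labels[i] for i in range(n) if rows[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels):
    """Split until each leaf holds one cluster; a leaf stores the majority cluster id."""
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        return max(set(labels), key=labels.count)
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left]),
            build_tree([rows[i] for i in right], [labels[i] for i in right]))

def classify(tree, x):
    """Walk the tree with the key-factor vector of a day to be predicted (step 4)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree
```

The learned tree doubles as the classification rule set: each root-to-leaf path is a conjunction of factor thresholds that maps a day to one load class.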
5. The method for predicting the short-term load of a power system based on big data technology according to claim 1, wherein step (5) specifically comprises the following steps:
For each class obtained in step (1), training samples are constructed from the load data of that class and the corresponding key-factor data, and a plurality of support vector machine models are trained; the RBF kernel is chosen as the kernel function of the support vector machine, and its parameters are optimized by grid search, i.e., exhaustive search;
according to the classification of the day to be predicted obtained in step (4), the corresponding support vector machine model is selected to complete the prediction.
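The per-class training and model selection of step (5) can be sketched as below. Actual RBF-kernel SVM training is delegated to caller-supplied hooks (`fit` and `evaluate_for`, both hypothetical), because the text does not name a library. The sketch covers only the exhaustive grid search over $(C, \gamma)$ and the dispatch of each day to the model of its class.

```python
import itertools

def grid_search(evaluate, c_grid, gamma_grid):
    """Exhaustive search: try every (C, gamma) pair, keep the one with the lowest error.

    evaluate(C, gamma) is a caller-supplied (hypothetical) hook that would train an
    SVM with RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) and return a validation error.
    """
    return min(itertools.product(c_grid, gamma_grid), key=lambda cg: evaluate(*cg))

def train_per_class(samples_by_class, fit, evaluate_for, c_grid, gamma_grid):
    """Train one model per load class found in step (1).

    fit(samples, C, gamma) -> callable model; evaluate_for(samples) -> evaluate hook.
    Both are hypothetical stand-ins for a real SVM library.
    """
    models = {}
    for cls, samples in samples_by_class.items():
        best_c, best_gamma = grid_search(evaluate_for(samples), c_grid, gamma_grid)
        models[cls] = fit(samples, best_c, best_gamma)
    return models

def forecast(day_features, day_class, models):
    """Use the model of the class assigned to the day to be predicted in step (4)."""
    return models[day_class](day_features)
```

Grid search is simple and easy to parallelize, but its cost grows with the product of the grid sizes. Each class therefore usually gets a coarse logarithmic grid for C and γ.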
6. The method for predicting the short-term load of a power system based on big data technology according to claim 1, wherein step (6), computing the system load, is completed on a Hadoop big data computing platform.
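Summing user-level forecasts into the system load maps naturally onto MapReduce. The sketch below simulates the map, shuffle and reduce phases in a single process. On Hadoop, the mapper and reducer would instead run as distributed tasks, for example through Hadoop Streaming. The 96-point per-user forecast layout and all function names are assumptions.

```python
from collections import defaultdict

def map_user_forecast(user_id, forecast):
    """Mapper: emit (time-slot index, load) pairs for one user's forecast curve."""
    for slot, load in enumerate(forecast):
        yield slot, load

def reduce_slot(slot, loads):
    """Reducer: the system load of a time slot is the sum of all user loads."""
    return slot, sum(loads)

def system_load(user_forecasts):
    """Local, single-process stand-in for the map -> shuffle -> reduce flow."""
    groups = defaultdict(list)
    for user_id, forecast in user_forecasts.items():
        for slot, load in map_user_forecast(user_id, forecast):
            groups[slot].append(load)  # shuffle: group values by time slot
    return [reduce_slot(s, groups[s])[1] for s in sorted(groups)]
```

Keying by time slot lets each reducer sum one point of the system load curve independently, which is what makes the aggregation scale with the number of users.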

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410851910.0A CN104881706B (en) 2014-12-31 2014-12-31 A kind of power-system short-term load forecasting method based on big data technology

Publications (2)

Publication Number Publication Date
CN104881706A (en) 2015-09-02
CN104881706B (en) 2018-05-25

Family

ID=53949193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410851910.0A Active CN104881706B (en) 2014-12-31 2014-12-31 A kind of power-system short-term load forecasting method based on big data technology

Country Status (1)

Country Link
CN (1) CN104881706B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583503A (en) * 2018-12-03 2019-04-05 国网江苏省电力有限公司扬州供电分公司 A kind of interruptible load prediction technique

Families Citing this family (48)

Publication number Priority date Publication date Assignee Title
CN105184424B (en) * 2015-10-19 2017-07-07 国网山东省电力公司菏泽供电公司 Realize that the multi-kernel function of multi-source heterogeneous data fusion learns the Mapreduceization short-term load forecasting method of SVM
CN105389625B (en) * 2015-10-27 2021-05-28 福建奥通迈胜电力科技有限公司 Active power distribution network ultra-short term load prediction method
CN105678404B (en) * 2015-12-30 2019-07-23 东北大学 Based on online shopping electricity and dynamically associate the micro-grid load forecasting system and method for the factor
CN105678415A (en) * 2016-01-05 2016-06-15 湖南大学 Method for predicting net load of distributed power supply power distribution network
CN105825298B (en) * 2016-03-14 2020-05-01 梁海东 Power grid metering early warning system and method based on load characteristic estimation
CN106126515A (en) * 2016-05-12 2016-11-16 广东电网有限责任公司信息中心 A kind of automatic Model Selection method of big data system component
CN105844371A (en) * 2016-05-19 2016-08-10 北京中电普华信息技术有限公司 Electricity customer short-term load demand forecasting method and device
CN106096766A (en) * 2016-06-06 2016-11-09 国网江苏省电力公司 A kind of short-term load forecasting method based on big data thinking pattern
CN106485262B (en) * 2016-09-09 2020-02-07 国网山西省电力公司晋城供电公司 Bus load prediction method
CN106548270B (en) * 2016-09-30 2020-08-14 许昌许继软件技术有限公司 Photovoltaic power station power abnormity data identification method and device
CN106505593B (en) * 2016-10-14 2017-11-10 国网信通亿力科技有限责任公司 A kind of analysis of distribution transforming three-phase imbalance and the method for load adjustment based on big data
CN106570250A (en) * 2016-11-02 2017-04-19 华北电力大学(保定) Power big data oriented microgrid short-period load prediction method
CN106779147B (en) * 2016-11-18 2020-09-29 重庆邮电大学 Power load prediction method based on self-adaptive hierarchical time sequence clustering
CN106651005B (en) * 2016-11-18 2020-08-28 云南电网有限责任公司电力科学研究院 Baseline load prediction method and device
CN106682999A (en) * 2016-11-18 2017-05-17 云南电网有限责任公司电力科学研究院 Electric power user baseline load calculating method and apparatus thereof
CN106849358A (en) * 2017-02-24 2017-06-13 威凡智能电气高科技有限公司 A kind of gridding is coupled intelligent distribution network system
CN106933173A (en) * 2017-03-06 2017-07-07 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of equipment for dyeing and finishing monitoring system based on CORBA
CN107229602B (en) * 2017-05-22 2020-09-11 湘潭大学 Method for identifying electricity consumption behavior of intelligent building microgrid
CN107368853A (en) * 2017-07-14 2017-11-21 上海博辕信息技术服务有限公司 Power network classification of the items based on machine learning determines method and device
CN107561997B (en) * 2017-08-22 2019-09-24 电子科技大学 A kind of power equipment state monitoring method based on big data decision tree
CN108122173A (en) * 2017-12-20 2018-06-05 国家电网公司 A kind of conglomerate load forecasting method based on depth belief network
CN108416366B (en) * 2018-02-06 2021-09-21 武汉大学 Power system short-term load prediction method based on meteorological index weighted LS-SVM
CN108446795B (en) * 2018-02-28 2020-11-17 广东电网有限责任公司电力调度控制中心 Power system load fluctuation analysis method and device and readable storage medium
CN108596362B (en) * 2018-03-22 2021-12-28 国网四川省电力公司经济技术研究院 Power load curve form clustering method based on adaptive piecewise aggregation approximation
CN108596227B (en) * 2018-04-12 2023-08-08 广东电网有限责任公司 Mining method for dominant influence factors of electricity consumption behaviors of users
CN108304978A (en) * 2018-05-08 2018-07-20 国网江西省电力有限公司经济技术研究院 A kind of mid-term Electric Power Load Forecast method based on data clusters theory
CN108964019B (en) * 2018-06-14 2022-05-27 沈阳工业大学 Power grid multi-element regulation and control method
CN109242174A (en) * 2018-08-27 2019-01-18 广东工业大学 A kind of adaptive division methods of seaonal load based on decision tree
CN109034504B (en) * 2018-09-14 2021-06-25 云南电网有限责任公司 Method and device for establishing short-term load prediction model
CN109067778B (en) * 2018-09-18 2020-07-24 东北大学 Industrial control scanner fingerprint identification method based on honeynet data
CN109472407A (en) * 2018-11-02 2019-03-15 国网河北省电力有限公司雄安新区供电公司 The dispatching method and terminal device of energy device
CN111191811A (en) * 2018-11-14 2020-05-22 中兴通讯股份有限公司 Cluster load prediction method and device and storage medium
CN111768020A (en) * 2019-04-02 2020-10-13 卜晓阳 Customer electricity demand identification method based on SVM algorithm
CN110689195A (en) * 2019-09-26 2020-01-14 云南电网有限责任公司电力科学研究院 Power daily load prediction method
CN111028100A (en) * 2019-11-29 2020-04-17 南方电网能源发展研究院有限责任公司 Refined short-term load prediction method, device and medium considering meteorological factors
CN111080011A (en) * 2019-12-16 2020-04-28 清华四川能源互联网研究院 Load electric quantity deviation prediction method and device
CN111105098B (en) * 2019-12-25 2023-11-03 国能信控互联技术有限公司 Load prediction method and system for self-matching of single user algorithm
CN111815054A (en) * 2020-03-31 2020-10-23 浙江大学 Industrial steam heat supply network short-term load prediction method based on big data
CN111582911B (en) * 2020-04-14 2023-06-30 广东卓维网络有限公司 Friendly interactive power utilization system for multiple users and power grid
CN111784204A (en) * 2020-07-28 2020-10-16 南方电网能源发展研究院有限责任公司 High-quality user mining method and system based on user power consumption behavior portrait
CN112734135B (en) * 2021-01-26 2022-07-15 吉林大学 Power load prediction method, intelligent terminal and computer readable storage medium
CN113222216A (en) * 2021-04-14 2021-08-06 国网江苏省电力有限公司营销服务中心 Method, device and system for predicting cooling, heating and power loads
CN113377841A (en) * 2021-06-21 2021-09-10 国网宁夏电力有限公司电力科学研究院 Big data-based energy load prediction system
CN114066076B (en) * 2021-11-22 2023-03-28 北京白龙马云行科技有限公司 Network taxi appointment prediction method and device based on multiple tenants
CN114912720A (en) * 2022-07-15 2022-08-16 石家庄科林电气股份有限公司 Memory network-based power load prediction method, device, terminal and storage medium
CN115085196B (en) * 2022-08-19 2022-12-23 国网信息通信产业集团有限公司 Power load predicted value determination method, device, equipment and computer readable medium
CN115630772B (en) * 2022-12-19 2023-05-09 国网浙江省电力有限公司宁波供电公司 Comprehensive energy detection and distribution method, system, equipment and storage medium
CN116502768A (en) * 2023-05-23 2023-07-28 中国南方航空股份有限公司 Civil aviation information post load early warning method, system and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101414366A (en) * 2008-10-22 2009-04-22 西安交通大学 Method for forecasting electric power system short-term load based on method for improving uttermost learning machine

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7756591B2 (en) * 2006-04-25 2010-07-13 Pegasus Technologies, Inc. System for optimizing oxygen in a boiler

Non-Patent Citations (2)

Title
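张红旭, "Research on Load Forecasting Methods Considering Meteorological Factors and Their System Implementation", China Master's Theses Full-text Database, Engineering Science and Technology II, 2009-01-15, full text *
栗然, "Research on Distributed Data Warehouse and Data Mining for Power Load Analysis and Forecasting", China Doctoral Dissertations Full-text Database, Information Science and Technology, 2011 *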

Also Published As

Publication number Publication date
CN104881706A (en) 2015-09-02

Similar Documents

Publication Publication Date Title
CN104881706B (en) A kind of power-system short-term load forecasting method based on big data technology
Putatunda et al. A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost
US20220076150A1 (en) Method, apparatus and system for estimating causality among observed variables
CN112382352B (en) Method for quickly evaluating structural characteristics of metal organic framework material based on machine learning
CN111199016A (en) DTW-based improved K-means daily load curve clustering method
Kuismin et al. Estimation of covariance and precision matrix, network structure, and a view toward systems biology
CN108053077A (en) A kind of short-term wind speed forecasting method and system based on two type T-S fuzzy models of section
CN103279556A (en) Iteration text clustering method based on self-adaptation subspace study
CN114169434A (en) Load prediction method
CN113255900A (en) Impulse load prediction method considering improved spectral clustering and Bi-LSTM neural network
CN107480441B (en) Modeling method and system for children septic shock prognosis prediction
CN115759415A (en) Power consumption demand prediction method based on LSTM-SVR
Xu et al. Ontology integration to identify protein complex in protein interaction networks
Vilaysouk et al. Semisupervised machine learning classification framework for material intensity parameters of residential buildings
CN111639712A (en) Positioning method and system based on density peak clustering and gradient lifting algorithm
Li et al. Intelligent product-gene acquisition method based on K-means clustering and mutual information-based feature selection algorithm
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
Szymański et al. LNEMLC: Label network embeddings for multi-label classification
Zhou et al. An intelligent model validation method based on ECOC SVM
Ogawa et al. PV output forecasting by deep Boltzmann machines with SS‐PPBSO
CN111127184B (en) Distributed combined credit evaluation method
CN111160715A (en) BP neural network based new and old kinetic energy conversion performance evaluation method and device
Kawamura et al. A new filter evaluation function for feature subset selection with evolutionary computation
CN117217392B (en) Method and device for determining general equipment guarantee requirement
CN117391258B (en) Method, device, equipment and storage medium for predicting negative carbon emission

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant