CN109886465B

CN109886465B - Power distribution network load prediction method based on intelligent electric meter user cluster analysis

Info

Publication number: CN109886465B
Application number: CN201910050608.8A
Authority: CN
Inventors: 黄南天; 王文婷; 蔡国伟; 杨冬锋; 黄大为; 杨德友; 孔令国; 王燕涛; 杨学航; 包佳瑞琦; 吴银银; 张祎祺; 李宏伟; 陈庆珠; 刘宇航; 张良; 刘博…
Original assignee: Northeast Dianli University
Current assignee: Jilin Taisite Technology Development Co ltd; Northeast Electric Power University; Economic and Technological Research Institute of State Grid Jilin Electric Power Co Ltd
Priority date: 2019-01-20
Filing date: 2019-01-20
Publication date: 2022-03-18
Anticipated expiration: 2039-01-20
Also published as: CN109886465A

Abstract

A power distribution network load prediction method based on smart meter user cluster analysis comprises the following steps: analyzing the load fluctuation of smart meter users and dividing the 24 hours of a day into 3 periods according to their degree of fluctuation; determining the input feature set of a predictor and analyzing the feature importance of each user under that feature set; describing the differences between users by their feature importance sets, clustering the users with SDKM so that users that respond similarly to the input features fall into the same class, and determining, through statistical experiments, the optimal clustering result for each intraday period of distribution network total load fluctuation; and selecting an ensemble-learning random forest predictor and constructing a rolling prediction model for the optimal clustering result of each fluctuation period. The method addresses the problem that the initial cluster centers are chosen randomly and the clustering easily falls into a local optimum, reduces the error of the rolling prediction model, and improves the accuracy of smart-meter-based distribution network load prediction.

Description

Power distribution network load prediction method based on intelligent electric meter user cluster analysis

Technical Field

The invention relates to the technical field of electric power, and in particular to a power distribution network load prediction method based on smart meter user cluster analysis.

Background

Load prediction plays a crucial role in economically optimal scheduling, safe operation, and the accommodation of distributed renewable clean energy. The accuracy of day-ahead distribution network load prediction directly affects the economic and safe operation of the distribution system. Compared with traditional load prediction, day-ahead distribution network load prediction is more difficult: the load is highly volatile, the user composition in the area to be predicted is complex, and the electricity consumption behavior of different users differs widely. The electricity consumption data collected by smart meters (SM) provide a data basis for analyzing the load characteristics of large numbers of users. Building on big-data mining, these users can be clustered and a dedicated prediction model established for each cluster; reducing the differences among users within a class further improves prediction accuracy.

At present, smart meter users are mainly clustered according to their load curves or statistical load characteristics. Such clustering is influenced by each user's electricity quantity, so it is difficult to ensure that users in the same class respond similarly to the predictor's input features, and difficult to analyze how non-load features affect the future electric load. Moreover, the influence of the total-load fluctuation in different periods on the optimal number of user clusters in each period has not been analyzed. Finally, the initial cluster centers are chosen randomly, so the clustering easily falls into a local optimum.

Disclosure of Invention

The invention aims to overcome the limitations and defects of the prior art and to provide a power distribution network load prediction method based on smart meter user cluster analysis that is scientific, reasonable, widely applicable and effective.

The object of the invention is achieved by the following technical scheme: a power distribution network load prediction method based on smart meter user cluster analysis, characterized by comprising the following steps:

1) analyzing the load fluctuation of smart meter users and dividing the 24 hours of a day into 3 periods according to the degree of fluctuation;

The standard deviation σ is used to characterize the fluctuation of user electricity consumption:

$$\sigma(t)=\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(L_n(t)-\bar{L}(t)\right)^2},\qquad \bar{L}(t)=\frac{1}{N}\sum_{n=1}^{N}L_n(t)$$

where σ(t) is the standard deviation at time t; N is the number of SMs, n = 1, 2, …, N; t indexes the time slots, t = 1, 2, …, 48; and L_n(t) is the load value of the n-th SM at time t;

2) determining the input feature set of the predictor and analyzing the feature importance of each user under that feature set;

The RReliefF feature importance analysis is modeled as:

$$N_{dL}=N_{dL}+\mathrm{diff}(L,R_i,I_j)\cdot d(i,j)$$

$$N_{dF}(F)=N_{dF}(F)+\mathrm{diff}(F,R_i,I_j)\cdot d(i,j)$$

$$N_{dL\&dF}(F)=N_{dL\&dF}(F)+\mathrm{diff}(L,R_i,I_j)\cdot\mathrm{diff}(F,R_i,I_j)\cdot d(i,j)$$

where R_i (i = 1, …, m) is a randomly drawn sample and m is the user-specified number of times R_i is drawn; I_j (j = 1, …, k) are the k nearest neighbors of R_i; N_dL is the weight for the load values L of the samples being different; N_dF(F) is the weight for the values of feature F being different; N_dL&dF(F) is the weight for both the load value L and feature F being different; diff(L, R_i, I_j) and diff(F, R_i, I_j) are the differences in load value L and in feature F between samples R_i and I_j, respectively; and d(i, j) is calculated from the distance between R_i and I_j. After R_i has been drawn m times, the weight of each feature, i.e., its importance, is calculated from these quantities;

3) describing user differences by the feature importance sets, clustering the users with SDKM so that users with similar responses to the input features are grouped into one class, and determining, through statistical experiments, the optimal clustering result for each intraday period of distribution network total load fluctuation;

① A clustering algorithm extracts the similarities and differences among data by analyzing and mining the whole data set. The clustering model is:

$$J=\sum_{k=1}^{K}\sum_{x_q\in c_k}\lVert x_q-v_k\rVert^2,\qquad v_k=\frac{1}{\lvert c_k\rvert}\sum_{x_q\in c_k}x_q$$

where X = {x_q}, q = 1, 2, …, Q, is the data set and x_q are the Q objects to be clustered; c_k is the data of the k-th class, and there are K classes in total; v_k is the cluster center of c_k; and J is the sum of the squared errors of all classes;

② The K-Means algorithm initializes K cluster centers; it then calculates the Euclidean distance Euc(x_q, v_k) from each sample in the set to the K cluster centers and assigns the sample to the class with the smallest distance. The Euclidean distance is:

$$Euc(x_q,v_k)=\lVert x_q-v_k\rVert_2=\sqrt{\sum_{p}\left(x_{q,p}-v_{k,p}\right)^2}$$

where v_k is the cluster center of c_k, p runs over the components of the sample vectors, and Euc(x_q, v_k) is the Euclidean distance from sample x_q to cluster center v_k;

③ Mathematical model of S_Dbw:

$$S\_Dbw(k)=Scat(k)+Dens\_bw(k)$$

where Scat(k) is the average scattering (intra-cluster dispersion) of the k clusters and Dens_bw(k) is the inter-cluster density;

④ The initial cluster centers are optimized with the crow search algorithm. M crows move in the decision-variable space of the problem to search for better food positions; the dimension of the decision variable equals the dimension of the initial cluster centers, i.e., the number of clusters k. The positions and memories of the crows form the matrices LOC and MEM:

$$LOC=\begin{bmatrix}l^{1,m}_{1}&\cdots&l^{1,m}_{k}\\ \vdots&\ddots&\vdots\\ l^{M,m}_{1}&\cdots&l^{M,m}_{k}\end{bmatrix},\qquad MEM=\begin{bmatrix}me^{1,m}_{1}&\cdots&me^{1,m}_{k}\\ \vdots&\ddots&\vdots\\ me^{M,m}_{1}&\cdots&me^{M,m}_{k}\end{bmatrix}$$

where l^{i,m} denotes the position of the i-th crow (i = 1, 2, …, M) in the m-th iteration (m = 1, 2, …, MCN), and each crow stores the position of its hidden food in the memory vector me^{i,m};

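In the m-th iteration, crow j returns to its food location me^{j,m} while crow i follows crow j to find that location. Crow j notices that it is being followed and changes its food location with probability P, so the position of crow i is updated as:

$$l^{i,m+1}=\begin{cases}l^{i,m}+\lambda_i\cdot fl\cdot\left(me^{j,m}-l^{i,m}\right), & \lambda_j\ge P\\ \text{a random position}, & \text{otherwise}\end{cases}$$

$$me^{i,m+1}=\begin{cases}l^{i,m+1}, & Fitness\left(l^{i,m+1}\right)<Fitness\left(me^{i,m}\right)\\ me^{i,m}, & \text{otherwise}\end{cases}$$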

where Fitness(·) is the fitness function, λ_i and λ_j are random numbers uniformly distributed on [0, 1], and fl is the flight distance; the memorized position is replaced only if the fitness value of the new position is better than that of the memorized one, otherwise it is kept;

4) after clustering, determining the optimal clustering result for each intraday period of distribution network total load fluctuation through statistical experiments;

After the new clustering method has been confirmed to be feasible, the MAPE (mean absolute percentage error) of the predictor's final prediction results is used as the evaluation index for the optimal number of clusters:

$$MAPE=\frac{1}{N_t}\sum_{n_t=1}^{N_t}\left|\frac{L_r(n_t)-L_p(n_t)}{L_r(n_t)}\right|\times100\%$$

where n_t indexes the predicted values (n_t = 1, 2, …, N_t), N_t is the number of predicted values, and L_r(n_t) and L_p(n_t) are the actual and predicted load values;

5) selecting an ensemble-learning random forest predictor and constructing rolling prediction models for the optimal clustering results of the different fluctuation periods;

The random forest prediction model is as follows:

$$\{h(x,\Theta_d),\ d=1,2,\ldots,D\}$$

where h(x, Θ_d) is the d-th decision tree of the random forest, x is the input vector of the decision trees, and the Θ_d are independently distributed random vectors describing the random sampling of the training data for the d-th tree and the random growth of that tree;

When predicting, the final prediction result y_p is obtained from the outputs of all decision trees in the model:

$$y_p=\frac{1}{D}\sum_{d=1}^{D}y_{pd}$$

where D is the number of trees in the random forest (RF) and y_pd is the prediction of the d-th tree.
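The averaging rule above can be illustrated with a toy forest. The plain-Python sketch below is an assumption-laden stand-in, not the patent's predictor: each tree is a depth-1 regression tree (a stump) grown on a bootstrap sample with one randomly chosen feature, and `fit_stump`, `fit_forest` and `predict` are hypothetical names. A practical model would use full-depth trees, for example scikit-learn's `RandomForestRegressor`.

```python
import random


def fit_stump(X, y, features):
    """Fit a depth-1 regression tree (stump) by least squares over `features`."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no valid split: the tree predicts the sample mean
        mean = sum(y) / len(y)
        return lambda x: mean
    _, f, thr, lm, rm = best
    return lambda x: lm if x[f] <= thr else rm


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Grow D = n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), max_features)    # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict(trees, x):
    """y_p = (1/D) * sum_d y_pd: the average of the D tree outputs."""
    return sum(tree(x) for tree in trees) / len(trees)
```

Because every tree sees a different bootstrap sample and feature subset, the trees disagree with one another; averaging their outputs into y_p smooths out the individual trees' errors.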

In the power distribution network load prediction method based on smart meter user cluster analysis, the load fluctuation of smart meter users is analyzed and the 24 hours of a day are divided into 3 periods according to the degree of fluctuation; the input feature set of the predictor is determined and the feature importance of each user under that set is analyzed; user differences are described by the feature importance sets, the users are clustered with SDKM so that users with similar responses to the input features fall into one class, and the optimal clustering result for each intraday period of distribution network total load fluctuation is determined through statistical experiments; finally, an ensemble-learning random forest predictor is selected and a rolling prediction model is built for the optimal clustering result of each fluctuation period. On the one hand, this overcomes the difficulty of traditional clustering methods in analyzing the influence of non-load features on clustering, and optimizing the choice of initial cluster centers reduces the effect of random initialization on the clustering result. On the other hand, the differences in electricity consumption fluctuation across periods are taken into account: the 24 hours are divided into periods of different fluctuation that are clustered and modeled separately, which markedly improves overall prediction accuracy. The random forest predictor is largely unaffected by the curse of dimensionality, and expanding the dimension of the historical load features effectively reduces the error of the rolling prediction model. The method is scientific and reasonable, widely applicable and effective.

Drawings

FIG. 1 is a flowchart of the power distribution network load prediction method based on smart meter user cluster analysis according to the embodiment;

FIG. 2 shows the fluctuation and a box plot of the total load of smart meter users in the power distribution network load prediction method based on smart meter user cluster analysis according to the embodiment;

FIG. 3 is a statistical chart of the optimal number of clusters in each period in the power distribution network load prediction method based on smart meter user cluster analysis according to the embodiment;

FIG. 4 shows box plots of the load prediction errors on working days and non-working days in the power distribution network load prediction method based on smart meter user cluster analysis according to the embodiment.

Detailed Description

The invention is described more fully hereinafter with reference to the accompanying drawings and examples.

Referring to FIG. 1, the power distribution network load prediction method based on smart meter user cluster analysis disclosed by the invention comprises the following steps:

Step S101: analyzing the load fluctuation of smart meter users and dividing the 24 hours of a day into 3 periods according to the degree of fluctuation;

Step S102: determining the predictor input feature set and analyzing the feature importance of each user under that feature set;

Step S103: describing user differences by the feature importance sets, clustering the users with SDKM so that users with similar responses to the input features fall into one class, and determining, through statistical experiments, the optimal clustering result for each intraday period of distribution network total load fluctuation;

Step S104: selecting an ensemble-learning random forest predictor and constructing rolling prediction models for the optimal clustering results of the different fluctuation periods.

In the power distribution network load prediction method based on smart meter user cluster analysis of the exemplary embodiment of the invention, the load fluctuation of smart meter users is analyzed and the 24 hours of a day are divided into 3 periods according to the degree of fluctuation; the input feature set of the predictor is determined and the feature importance of each user under that set is analyzed; user differences are described by the feature importance sets, the users are clustered with SDKM so that users with similar responses to the input features fall into one class, and the optimal clustering result for each intraday period of distribution network total load fluctuation is determined through statistical experiments; finally, an ensemble-learning random forest predictor is selected and a rolling prediction model is built for the optimal clustering result of each fluctuation period. On the one hand, this overcomes the difficulty of traditional clustering methods in analyzing the influence of non-load features on clustering, and optimizing the choice of initial cluster centers reduces the effect of random initialization on the clustering result. On the other hand, the differences in electricity consumption fluctuation across periods are taken into account: the 24 hours are divided into periods of different fluctuation that are clustered and modeled separately, which markedly improves overall prediction accuracy. The random forest predictor is largely unaffected by the curse of dimensionality, and expanding the dimension of the historical load features effectively reduces the error of the rolling prediction model.

In step S101, the load fluctuation of smart meter users is analyzed and the 24 hours of a day are divided into 3 periods according to the degree of fluctuation.

The overall load fluctuation of the distribution network differs between periods, and under different degrees of volatility the optimal number of smart meter user clusters must be analyzed separately. This section analyzes the load fluctuation in different periods for a residential smart meter data set. As shown in FIG. 2, the total electricity consumption of the users in the area to be predicted over the 365 days of a year can be divided into 3 periods according to its volatility. The fluctuation is measured by the standard deviation:

$$\sigma(t)=\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(L_n(t)-\bar{L}(t)\right)^2},\qquad \bar{L}(t)=\frac{1}{N}\sum_{n=1}^{N}L_n(t)$$

where N is the number of SMs (n = 1, 2, …, N); t indexes the time slots (t = 1, 2, …, 48); and L_n(t) is the load value of the n-th SM at time t.
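A minimal sketch of σ(t), assuming the loads are held as one list of 48 half-hourly values per smart meter and that σ(t) is the population standard deviation across meters. The function names are hypothetical, and the tertile rule in `label_periods` is only an illustrative assumption; the patent reads its three periods off the total-load curve in FIG. 2.

```python
import statistics


def fluctuation_profile(loads):
    """sigma(t) for every time slot t.

    loads[n][t] is the load L_n(t) of smart meter n in slot t (48 half-hour
    slots per day in the text). sigma(t) is the population standard deviation
    of L_1(t), ..., L_N(t) across the N meters.
    """
    return [statistics.pstdev([meter[t] for meter in loads]) for t in range(len(loads[0]))]


def label_periods(sigma):
    """Hypothetical rule: tag each slot 0/1/2 (low/medium/high fluctuation)
    by the tertiles of sigma(t)."""
    ranked = sorted(sigma)
    low_cut, high_cut = ranked[len(ranked) // 3], ranked[2 * len(ranked) // 3]
    return [0 if s < low_cut else 1 if s < high_cut else 2 for s in sigma]
```

Contiguous periods, as in the patent, would need an extra step that merges neighbouring slots; the tertile labels alone may jump back and forth during the day.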

In step S102, the predictor input feature set is determined and the feature importance of each user under this feature set is analyzed:

Feature importance reflects the degree of correlation between a feature and the prediction target. Since the feature importance sets of different smart meter users differ, they show how differently the loads of different users respond to the predictor's input features. Users can therefore be clustered by their feature importance sets, so that users that respond similarly to the features fall into one class and can be modeled in a targeted way. Moreover, clustering on feature importance sets is not limited by the feature dimension and allows the relationship between features of multiple data types and the prediction target to be analyzed.

The RReliefF feature importance analysis is modeled as:

$$N_{dL}=N_{dL}+\mathrm{diff}(L,R_i,I_j)\cdot d(i,j)$$

$$N_{dF}(F)=N_{dF}(F)+\mathrm{diff}(F,R_i,I_j)\cdot d(i,j)$$

$$N_{dL\&dF}(F)=N_{dL\&dF}(F)+\mathrm{diff}(L,R_i,I_j)\cdot\mathrm{diff}(F,R_i,I_j)\cdot d(i,j)$$

where R_i (i = 1, …, m) is a randomly drawn sample and m is the user-specified number of times R_i is drawn; I_j (j = 1, …, k) are the k nearest neighbors of R_i; N_dL is the weight for the load values L of the samples being different; N_dF(F) is the weight for the values of feature F being different; N_dL&dF(F) is the weight for both the load value L and feature F being different; diff(L, R_i, I_j) and diff(F, R_i, I_j) are the differences in load value L and in feature F between samples R_i and I_j, respectively; and d(i, j) is calculated from the distance between R_i and I_j. After R_i has been drawn m times, the weight of each feature, i.e., its importance, can be calculated.
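The accumulation above can be sketched in Python. The final step uses the standard RReliefF weight, which the text does not spell out:

$$W(F)=\frac{N_{dL\&dF}(F)}{N_{dL}}-\frac{N_{dF}(F)-N_{dL\&dF}(F)}{m-N_{dL}}$$

Assumptions in this sketch: numeric features, diff() scaled by each feature's value range, neighbors found by Manhattan distance on the scaled features, and equal neighbor influence d(i, j) = 1/k instead of the rank-based weighting of the original algorithm; `rrelieff` is a hypothetical name.

```python
import random


def rrelieff(X, y, m=100, k=5, seed=0):
    """Minimal RReliefF sketch: one importance weight per feature F.

    X: samples (lists of numeric features); y: the load value L of each sample.
    Simplifying assumption: every one of the k neighbours gets the same
    influence d(i, j) = 1/k (the original RReliefF uses rank-based weights).
    Assumes y is not constant (otherwise N_dL = 0).
    """
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Value ranges used to scale diff() into [0, 1]; constant features get range 1.
    f_rng = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(n_feat)]
    y_rng = (max(y) - min(y)) or 1.0

    def diff_f(f, a, b):  # diff(F, R_i, I_j)
        return abs(X[a][f] - X[b][f]) / f_rng[f]

    def diff_l(a, b):  # diff(L, R_i, I_j)
        return abs(y[a] - y[b]) / y_rng

    n_dl, n_df, n_dl_df = 0.0, [0.0] * n_feat, [0.0] * n_feat
    for _ in range(m):
        i = rng.randrange(len(X))  # randomly drawn sample R_i
        # k nearest neighbours I_j of R_i (Manhattan distance on scaled features)
        near = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: sum(diff_f(f, i, j) for f in range(n_feat)))[:k]
        for j in near:
            d_ij = 1.0 / k
            dl = diff_l(i, j)
            n_dl += dl * d_ij
            for f in range(n_feat):
                dfv = diff_f(f, i, j)
                n_df[f] += dfv * d_ij
                n_dl_df[f] += dl * dfv * d_ij
    # Standard RReliefF weight W(F) from the three accumulated quantities.
    return [n_dl_df[f] / n_dl - (n_df[f] - n_dl_df[f]) / (m - n_dl) for f in range(n_feat)]
```

For one smart meter user, X would hold the candidate predictor inputs per time step and y that user's load; the resulting weight vector is the user's feature importance set used for clustering.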

In step S103, user differences are described by the feature importance sets, the users are clustered with SDCKM so that users with similar responses to the input features fall into one class, and the optimal clustering result for each intraday period of distribution network total load fluctuation is determined through statistical experiments:

A clustering algorithm extracts the similarities and differences among data by analyzing and mining the whole data set. The K-means algorithm partitions the data set X so as to minimize the sum of the squared errors J of all classes:

$$J=\sum_{k=1}^{K}\sum_{x_q\in c_k}\lVert x_q-v_k\rVert^2,\qquad v_k=\frac{1}{\lvert c_k\rvert}\sum_{x_q\in c_k}x_q$$

where c_k is the data of the k-th class and v_k is its cluster center; the data set is X = {x_q}, q = 1, 2, …, Q, and x_q are the Q objects to be clustered.

The K-Means algorithm first initializes K cluster centers; it then calculates the Euclidean distance from each sample in the set to the K cluster centers and assigns each sample to the class with the smallest distance; next, the mean of each class is recalculated and used as the new cluster center. These steps are repeated until the maximum number of iterations is reached or J converges. The Euclidean distance is:

$$Euc(x_q,v_k)=\lVert x_q-v_k\rVert_2=\sqrt{\sum_{p}\left(x_{q,p}-v_{k,p}\right)^2}$$

where v_k is the cluster center of c_k and p runs over the components of the sample vectors.
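The iteration just described, started from externally supplied initial centers as SDCKM requires, might look like the following minimal sketch. It is plain Python; `kmeans` is a hypothetical name, and letting an empty class keep its previous center is one common convention, not one stated in the text.

```python
import math


def kmeans(data, centers, max_iter=100):
    """K-means (Lloyd iterations) started from the given initial centers.

    data: list of points (equal-length lists of floats); centers: k initial centers.
    Returns (labels, centers, J) with J the sum of squared errors of all classes.
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment: each sample joins the class whose center is nearest (Euclidean).
        new_labels = [min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
                      for x in data]
        if new_labels == labels:
            break  # assignments no longer change, so J has converged
        labels = new_labels
        # Update: each center becomes the mean of its class (kept if the class is empty).
        for c in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    J = sum(math.dist(x, centers[lab]) ** 2 for x, lab in zip(data, labels))
    return labels, centers, J
```

Because the result depends on `centers`, two different initial seedings of the same data can end in different local optima of J, which is the weakness the crow search step targets.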

In the exemplary embodiment of the invention, the clustering results produced while optimizing the initial cluster centers are evaluated with the S_Dbw index. After clustering is completed, results are usually still evaluated with the Euclidean distance as the index. A good clustering result should keep intra-cluster distances as small as possible and inter-cluster distances as large as possible. However, the Euclidean distance only measures intra-cluster similarity and ignores the separation between clusters. To remedy this, the S_Dbw index is introduced as the clustering evaluation criterion. S_Dbw is obtained by summing the average scattering of the clusters and the inter-cluster density:

$$S\_Dbw(k)=Scat(k)+Dens\_bw(k)$$

where Scat(k) is the average scattering:

$$Scat(k)=\frac{1}{k}\sum_{i=1}^{k}\frac{\lVert\sigma(s_i)\rVert}{\lVert\sigma(s)\rVert}$$

s denotes the data set, and σ(s_i) and σ(s) are the standard deviations of the data in the i-th cluster and of the whole data set s, respectively; Dens_bw(k) is the inter-cluster density:

$$Dens\_bw(k)=\frac{1}{k(k-1)}\sum_{i=1}^{k}\sum_{\substack{j=1\\ j\ne i}}^{k}\frac{dens(u_{ij})}{\max\left\{dens(v_i),\,dens(v_j)\right\}}$$

where dens(·) is the average density function of the inter-cluster region, v_i and v_j are the center points of the i-th and j-th clusters, and u_ij is the midpoint of the line connecting the two center points. The smaller the S_Dbw value, the better the clustering. Among internal clustering validation indices, S_Dbw is the only one that performs well with respect to monotonicity, noise, density and inter-cluster distance.
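A minimal sketch of S_Dbw, assuming the conventions of the original S_Dbw definition: ||σ(·)|| is the Euclidean norm of the per-dimension variance vector, and dens(u) counts the points of the two clusters lying within the average cluster standard deviation (square root of the summed norms, divided by k) of u. The function names are hypothetical.

```python
import math


def var_norm(points):
    """||sigma(.)||: Euclidean norm of the per-dimension variance vector of `points`."""
    n = len(points)
    mean = [sum(col) / n for col in zip(*points)]
    return math.sqrt(sum((sum((p[d] - mean[d]) ** 2 for p in points) / n) ** 2
                         for d in range(len(mean))))


def s_dbw(data, labels, centers):
    """S_Dbw(k) = Scat(k) + Dens_bw(k); smaller is better.

    Assumes k >= 2, no empty cluster, and a data set that is not constant.
    """
    k = len(centers)
    clusters = [[x for x, lab in zip(data, labels) if lab == c] for c in range(k)]
    norms = [var_norm(cl) for cl in clusters]
    scat = sum(norms) / k / var_norm(data)          # average scattering Scat(k)
    stdev = math.sqrt(sum(norms)) / k               # neighbourhood radius for dens()

    def dens(u, pts):  # number of points within `stdev` of u
        return sum(1 for x in pts if math.dist(x, u) <= stdev)

    dens_bw = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            pts = clusters[i] + clusters[j]
            mid = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]  # u_ij
            denom = max(dens(centers[i], pts), dens(centers[j], pts))
            dens_bw += dens(mid, pts) / denom if denom else 0.0
    return scat + dens_bw / (k * (k - 1))
```

On two tight, well-separated groups, the natural split scores far lower than a split that mixes the groups, because mixing inflates every cluster's variance and therefore Scat(k).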

In the exemplary embodiment of the invention, the selection of the initial cluster centers is optimized with the crow search algorithm. To avoid the influence of randomly chosen initial cluster centers on the clustering result, a relatively new swarm intelligence algorithm, the crow search algorithm, is used to find the optimal initial cluster centers.

The initial cluster centers are optimized with the crow search algorithm. M crows move in the decision-variable space of the problem to search for better food positions; the dimension of the decision variable equals the dimension of the initial cluster centers, i.e., the number of clusters k. The positions and memories of the crows form the matrices LOC and MEM:

$$LOC=\begin{bmatrix}l^{1,m}_{1}&\cdots&l^{1,m}_{k}\\ \vdots&\ddots&\vdots\\ l^{M,m}_{1}&\cdots&l^{M,m}_{k}\end{bmatrix},\qquad MEM=\begin{bmatrix}me^{1,m}_{1}&\cdots&me^{1,m}_{k}\\ \vdots&\ddots&\vdots\\ me^{M,m}_{1}&\cdots&me^{M,m}_{k}\end{bmatrix}$$

where l^{i,m} denotes the position of the i-th crow (i = 1, 2, …, M) in the m-th iteration (m = 1, 2, …, MCN), and each crow stores the position of its hidden food in the memory vector me^{i,m}.

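In the m-th iteration, crow j returns to its food location me^{j,m} while crow i follows crow j to find that location. Crow j notices that it is being followed and changes its food location with probability P. The position of crow i is updated as follows:

$$l^{i,m+1}=\begin{cases}l^{i,m}+\lambda_i\cdot fl\cdot\left(me^{j,m}-l^{i,m}\right), & \lambda_j\ge P\\ \text{a random position}, & \text{otherwise}\end{cases}$$

$$me^{i,m+1}=\begin{cases}l^{i,m+1}, & Fitness\left(l^{i,m+1}\right)<Fitness\left(me^{i,m}\right)\\ me^{i,m}, & \text{otherwise}\end{cases}$$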

where Fitness(·) is the fitness function, λ_i and λ_j are random numbers uniformly distributed on [0, 1], and fl is the flight distance. If the new position is feasible, crow i moves there; its memory is updated only if the fitness value of the new position is better than that of the memorized position, otherwise the memory is kept.
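The update rule can be illustrated with a generic minimizer. This sketch optimizes an arbitrary `fitness` function over a box; clipping new positions to the box stands in for the feasibility check (an assumption), and redrawing a uniformly random position when crow j notices it is followed follows the usual crow search convention. All names are hypothetical.

```python
import random


def crow_search(fitness, dim, lower, upper, n_crows=20, max_iter=100,
                fl=2.0, p=0.1, seed=0):
    """Minimal crow search algorithm sketch that minimises `fitness`.

    LOC holds the current positions l^{i,m}; MEM holds the memorised (best)
    positions me^{i,m}. fl is the flight distance and p the awareness probability P.
    """
    rng = random.Random(seed)

    def random_position():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    loc = [random_position() for _ in range(n_crows)]   # LOC
    mem = [list(x) for x in loc]                         # MEM
    mem_fit = [fitness(x) for x in mem]
    for _ in range(max_iter):
        for i in range(n_crows):
            j = rng.randrange(n_crows)                   # crow i follows crow j
            if rng.random() >= p:                        # lambda_j >= P: j is unaware
                lam_i = rng.random()
                new = [li + lam_i * fl * (mj - li) for li, mj in zip(loc[i], mem[j])]
            else:                                        # j noticed i: i lands at random
                new = random_position()
            # Assumption: clipping to the search box plays the role of the feasibility check.
            loc[i] = [min(max(v, lower), upper) for v in new]
            f = fitness(loc[i])
            if f < mem_fit[i]:                           # memory update (smaller is better)
                mem[i], mem_fit[i] = list(loc[i]), f
    best = min(range(n_crows), key=mem_fit.__getitem__)
    return mem[best], mem_fit[best]
```

In SDCKM, each crow's position would encode the k initial cluster centers and `fitness` would be the S_Dbw value of the K-Means clustering started from them.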

In the exemplary embodiment of the invention, the global search capability of the crow search algorithm is combined with the local search capability of the K-Means algorithm. The positions and memories of the crows are updated, each clustering result is evaluated with the S_Dbw index as the fitness function, which considers both the average scattering within clusters and the density between clusters, and the optimal initial cluster centers are finally obtained. The SDCKM clustering procedure is as follows:

1) Initialize the parameters: crow population size M; crow positions LOC and memories MEM; the decision variable dimension, i.e., the k initial cluster centers; maximum number of iterations MCN; flight distance fl; awareness probability P.

2) Substitute the initial cluster centers represented by each crow's memory into the K-Means algorithm to obtain a clustering result based on these initial centers.

3) Calculate the fitness value of the clustering result obtained in step 2):

$$Fitness=S\_Dbw(k)$$

where S_Dbw(k) is the S_Dbw index of the clustering result with k clusters.

4) Update the crow positions.

5) Calculate the fitness of each crow's updated position, compare it with the fitness of the crow's memory, and keep the position vector with the smaller fitness value as the updated memory.

6) Repeat steps 2), 3) and 4) until the number of iterations reaches MCN, then select the memorized position with the smallest fitness value as the optimal initial cluster centers.

7) After the above steps are completed, use the initial cluster centers optimized in step 5) as the initial cluster centers of the K-Means algorithm to generate the final clustering scheme.
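Steps 1) to 7) can be combined into one self-contained sketch. Assumptions not fixed by the text: a crow's position is a list of k initial centers; a "random position" is k distinct data points; a seeding whose K-Means result leaves a cluster empty scores +inf; and compact K-Means and S_Dbw helpers are included so the block runs on its own. All function names are hypothetical.

```python
import math
import random


def kmeans(data, centers, max_iter=100):
    """Lloyd K-means from the given initial centers; returns (labels, centers)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centers)), key=lambda c: math.dist(x, centers[c])) for x in data]
        if new == labels:
            break
        labels = new
        for c in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty class keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def var_norm(points):
    """Euclidean norm of the per-dimension variance vector."""
    n = len(points)
    mean = [sum(col) / n for col in zip(*points)]
    return math.sqrt(sum((sum((p[d] - mean[d]) ** 2 for p in points) / n) ** 2
                         for d in range(len(mean))))


def s_dbw(data, labels, centers):
    """S_Dbw(k) = Scat(k) + Dens_bw(k); an empty cluster scores +inf."""
    k = len(centers)
    clusters = [[x for x, lab in zip(data, labels) if lab == c] for c in range(k)]
    if any(not cl for cl in clusters):
        return math.inf
    norms = [var_norm(cl) for cl in clusters]
    stdev = math.sqrt(sum(norms)) / k

    def dens(u, pts):  # points of the two clusters within `stdev` of u
        return sum(1 for x in pts if math.dist(x, u) <= stdev)

    dens_bw = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                pts = clusters[i] + clusters[j]
                mid = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
                denom = max(dens(centers[i], pts), dens(centers[j], pts))
                dens_bw += dens(mid, pts) / denom if denom else 0.0
    return sum(norms) / k / var_norm(data) + dens_bw / (k * (k - 1))


def sdckm(data, k, n_crows=10, mcn=30, fl=2.0, p=0.1, seed=0):
    """Crow search over K-Means initial centers with Fitness = S_Dbw (k >= 2)."""
    rng = random.Random(seed)

    def random_seeding():  # assumption: k distinct data points form a random position
        return [list(x) for x in rng.sample(data, k)]

    def fitness(seeding):  # steps 2) and 3): run K-Means, score the result with S_Dbw
        labels, centers = kmeans(data, seeding)
        return s_dbw(data, labels, centers)

    loc = [random_seeding() for _ in range(n_crows)]             # step 1): LOC
    mem = [[list(c) for c in s] for s in loc]                    # step 1): MEM
    mem_fit = [fitness(s) for s in mem]
    for _ in range(mcn):                                         # step 6): repeat MCN times
        for i in range(n_crows):
            j = rng.randrange(n_crows)
            if rng.random() >= p:                                # step 4): follow crow j
                lam = rng.random()
                loc[i] = [[a + lam * fl * (b - a) for a, b in zip(ci, cj)]
                          for ci, cj in zip(loc[i], mem[j])]
            else:                                                # crow j noticed: random move
                loc[i] = random_seeding()
            f = fitness(loc[i])                                  # step 5): memory update
            if f < mem_fit[i]:
                mem[i], mem_fit[i] = [list(c) for c in loc[i]], f
    best = min(range(n_crows), key=mem_fit.__getitem__)
    labels, centers = kmeans(data, mem[best])                    # step 7): final clustering
    return labels, centers, mem_fit[best]
```

In the patent's setting, each row of `data` would be one user's feature importance set, and k would be chosen separately for each fluctuation period.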

In the exemplary embodiment of the invention, once the new clustering method has been confirmed to be feasible, the MAPE of the predictor's final prediction results is used as the evaluation index to determine the optimal number of clusters. A random forest prediction model is introduced, and a separate model is built with the total load of each class of clustered users as the prediction target; the predictions for all classes are then summed to obtain the total load prediction for the residential distribution network. The smaller the MAPE, the better the prediction under that number of clusters. MAPE is:

$$MAPE=\frac{1}{N_t}\sum_{n_t=1}^{N_t}\left|\frac{L_r(n_t)-L_p(n_t)}{L_r(n_t)}\right|\times100\%$$

where n_t indexes the predicted values (n_t = 1, 2, …, N_t), N_t is the number of predicted values, and L_r(n_t) and L_p(n_t) are the actual and predicted load values; MAPE measures the overall accuracy of the predictor's load predictions. As shown in FIG. 3, based on the SDCKM clustering method and the intraday time-domain characteristics of load fluctuation, a statistical analysis is performed for each period to determine its optimal number of clusters. In the experiment, different numbers of clusters were tested, and the number giving the highest prediction accuracy of the random forest load predictor was taken as the optimal number of clusters for each period.
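The aggregation of per-class forecasts and the choice of cluster number can be sketched as follows, assuming the per-cluster forecasts for each candidate k are already available (for example from the random forest models). The function names are hypothetical.

```python
def total_forecast(cluster_forecasts):
    """Sum the per-cluster load forecasts slot by slot into a total-load forecast."""
    return [sum(slot) for slot in zip(*cluster_forecasts)]


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(r - p) / abs(r) for r, p in zip(actual, predicted)) / len(actual)


def best_cluster_number(actual_total, forecasts_by_k):
    """Return the candidate k whose summed forecast has the lowest MAPE, plus all scores.

    forecasts_by_k maps each candidate k to its list of k per-cluster forecasts.
    """
    scores = {k: mape(actual_total, total_forecast(f)) for k, f in forecasts_by_k.items()}
    return min(scores, key=scores.get), scores
```

Running this once per fluctuation period, with that period's actual total load, gives one optimal k per period, which is what the statistics in FIG. 3 summarize.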

In step S104, an ensemble-learning random forest predictor is selected and rolling prediction models are constructed for the optimal clustering results of the different fluctuation periods:

the random forest prediction model is as follows:

$$\{h(x,\Theta_d),\ d=1,2,\ldots,D\}$$

where h(x, Θ_d) is the d-th decision tree of the random forest, x is the input vector of the decision trees, and the Θ_d are independently distributed random vectors describing the random sampling of the training data for the d-th tree and the random growth of that tree.

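When predicting, the final prediction result y_p is obtained from the outputs of all decision trees in the model:

$$y_p=\frac{1}{D}\sum_{d=1}^{D}y_{pd}$$

where D is the number of trees in the random forest and y_pd is the prediction of the d-th tree.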

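In the exemplary embodiment of the invention, MAPE and the root mean square error (RMSE) are used as evaluation indices, where RMSE is:

$$RMSE=\sqrt{\frac{1}{N_t}\sum_{n_t=1}^{N_t}\left(L_r(n_t)-L_p(n_t)\right)^2}$$

where n_t indexes the predicted values (n_t = 1, 2, …, N_t), N_t is the number of predicted values, and L_r(n_t) and L_p(n_t) are the actual and predicted load values.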
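Both evaluation indices reduce to a few lines. A minimal sketch with hypothetical function names, assuming equal-length series of actual and predicted loads with non-zero actual values:

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(r - p) / abs(r) for r, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the load values."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual))
```

For example, actual loads [100, 200] predicted as [110, 180] give a MAPE of 10 % (both points are 10 % off) but an RMSE of about 15.8, because RMSE works in absolute units and weights the larger 20-unit error more heavily. MAPE is therefore convenient for comparing periods with different load levels, while RMSE exposes large absolute misses.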

In the exemplary embodiment of the invention, considering that differences in residential electricity usage patterns between working days and non-working days affect load prediction, load prediction results for the two day types are randomly sampled and compared, as shown in FIG. 4.

Furthermore, the above drawings are merely schematic illustrations of the processes included in the method according to exemplary embodiments of the invention; the invention is not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims

1. A power distribution network load prediction method based on smart meter user cluster analysis, characterized by comprising the following steps:

The RReliefF feature importance analysis is modeled as:

$$N_{dL}=N_{dL}+\mathrm{diff}(L,R_i,I_j)\cdot d(i,j)$$

$$N_{dF}(F)=N_{dF}(F)+\mathrm{diff}(F,R_i,I_j)\cdot d(i,j)$$

$$N_{dL\&dF}(F)=N_{dL\&dF}(F)+\mathrm{diff}(L,R_i,I_j)\cdot\mathrm{diff}(F,R_i,I_j)\cdot d(i,j)$$

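$$J=\sum_{k=1}^{K}\sum_{x_q\in c_k}\lVert x_q-v_k\rVert^2,\qquad v_k=\frac{1}{\lvert c_k\rvert}\sum_{x_q\in c_k}x_q$$

where v_k represents the cluster center of c_k, and J is the sum of the squared errors of all classes;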

③ Mathematical model of S_Dbw:

$$S\_Dbw(k)=Scat(k)+Dens\_bw(k)$$

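optimizing the initial cluster centers with the crow search algorithm, wherein M crows move in the decision-variable space of the problem to search for better food positions, and the dimension of the decision variable equals the dimension of the initial cluster centers, i.e., the number of clusters k; the positions and memories of the crows form the matrices LOC and MEM:

$$LOC=\begin{bmatrix}l^{1,m}_{1}&\cdots&l^{1,m}_{k}\\ \vdots&\ddots&\vdots\\ l^{M,m}_{1}&\cdots&l^{M,m}_{k}\end{bmatrix},\qquad MEM=\begin{bmatrix}me^{1,m}_{1}&\cdots&me^{1,m}_{k}\\ \vdots&\ddots&\vdots\\ me^{M,m}_{1}&\cdots&me^{M,m}_{k}\end{bmatrix}$$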

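in the m-th iteration, crow j returns to its food location me^{j,m} while crow i follows crow j to find that location; crow j notices that it is being followed and changes its food location with probability P, and the position of crow i is updated as follows:

$$l^{i,m+1}=\begin{cases}l^{i,m}+\lambda_i\cdot fl\cdot\left(me^{j,m}-l^{i,m}\right), & \lambda_j\ge P\\ \text{a random position}, & \text{otherwise}\end{cases}$$

$$me^{i,m+1}=\begin{cases}l^{i,m+1}, & Fitness\left(l^{i,m+1}\right)<Fitness\left(me^{i,m}\right)\\ me^{i,m}, & \text{otherwise}\end{cases}$$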

The random forest prediction model is as follows:

$$\{h(x,\Theta_d),\ d=1,2,\ldots,D\}$$

where h(x, Θ_d) is the d-th decision tree of the random forest, x is the input vector of the decision trees, and the Θ_d are independently distributed random vectors describing the random sampling of the training data for the d-th tree and the random growth of that tree;