CN110825723A

CN110825723A - Residential user classification method based on power load analysis

Info

Publication number: CN110825723A
Application number: CN201910952518.8A
Authority: CN
Inventors: 夏飞; 张洁
Original assignee: Shanghai University of Electric Power
Current assignee: Shanghai University of Electric Power
Priority date: 2019-10-09
Filing date: 2019-10-09
Publication date: 2020-02-21
Anticipated expiration: 2039-10-09
Also published as: CN110825723B

Abstract

The invention provides a residential user classification method based on power load analysis. The method first preprocesses daily power load data to obtain a plurality of sample data, then performs pre-clustering and agglomerative clustering on the sample data to obtain a plurality of data clusters, then compares the contour average value of the data clusters with a preset contour threshold value, then repeats the pre-clustering and agglomerative clustering on the sample data according to the comparison result, the number of agglomerative clustering passes performed, and the number of sample data in each data cluster, and finally classifies the residential users according to the data clusters.

Description

Residential user classification method based on power load analysis

Technical Field

The invention belongs to the field of power supply, and particularly relates to a residential user classification method based on power load analysis.

Background

The electrical load of the residential users is gradually becoming a major component of the peak load in the power grid system, and new challenges are brought to the safe and stable operation of the power grid system. Therefore, how to realize the supply side management of the power grid system aiming at the electricity load characteristics of various residential users is the key of safe and stable operation of the power grid system in the future.

Many scholars have studied cluster analysis of the electricity load characteristics of residential users. Chua Heng, Wuhui Cheng, Zhou et al., in 'Investigation and load analysis of electricity utilization in a certain residential quarter' (Jiangxi Electric Power, 2017, 41(2): 24-27), analyzed smart meter data from users in a residential quarter in Nanchang city, gave the users' power consumption curves for the four seasons and for holidays, workdays and weekends, and analyzed the users' power consumption behavior, providing a basis for innovative services for power customers, power supply enterprises and the social environment. Liufei, cardia jun, et al., in 'Typical load characteristic analysis of residents based on cluster analysis' (Jiangsu Motor Engineering, 2007, 12(26): 34-37), analyzed electricity data by K-means clustering, obtained typical representative load curves for different seasons, and identified some relations between residential load characteristics and various influencing factors. Dingqi, Wang Guang et al., in 'Clustering analysis application of regional power user load patterns' (Electromechanical Engineering, 2008, 25(9): 31-33, 84), clustered users in a typical substation area and compared the result with the traditional national economy industry classification; the method also provides a reference for power supply departments in power load management, substation planning, state estimation and so on.
In 'Resident user electricity decision model and information system research of supply and demand interaction' (academic paper: North China Electric University, 2017), the fuzzy C-means clustering algorithm is used to cluster residential load curves, so that the different electricity utilization characteristics of residents are obtained, the room for optimizing residential electricity use is explored, users are guided to use electricity reasonably, the electricity utilization structure is optimized, and peak clipping and valley filling are achieved. Grandson and Yiwei, Li Bin et al. propose a user hierarchical clustering method based on differentiated feature extraction in 'User hierarchical clustering and package recommending method facing the reform of the electricity selling side' (Power Grid Technology, 2018, 42(2): 447-; in layer 2, differentiated power utilization characteristics are extracted for the classes of users obtained in layer 1, and the users are classified again by applying a suitable clustering algorithm to each class. Finally, a suitable electricity price package is recommended for the sub-class users obtained after the two-layer clustering.

However, residential electricity load involves a large volume of consumption data, and the consumption patterns of different types of residential users differ greatly. The above methods do not analyze the load characteristics of the various residential users finely enough, so residential users cannot be classified accurately according to their consumption patterns. As a result, a power supply unit cannot determine the load characteristics of each class of residential users from the user type, and therefore cannot accurately carry out supply-side management of the power grid system or ensure its safe and stable operation.

Disclosure of Invention

Effective analysis of the electricity load characteristics of the various residential users is the basis for implementing supply-side management measures in the power grid system. Analyzing these load characteristics helps evaluate the load composition and consumption patterns of a region, is important research work for rationally arranging the power consumption layout and making effective use of electric energy resources, and helps guarantee the safe and stable operation of the power grid system.

The invention aims to provide a resident user classification method obtained according to a daily electric load curve of a resident user, so that the electric load characteristics of various resident users can be determined according to the types of the resident users, the supply side management of a power grid system is further accurately carried out, and the safe and stable operation of the power grid system is ensured.

In order to achieve the purpose, the invention adopts the following technical scheme:

the invention provides a resident user classification method based on power load analysis, which is characterized by comprising the following steps:

step S1: carrying out data preprocessing on a plurality of daily electric load data to obtain a plurality of sample data;

step S2: pre-clustering sample data to obtain a plurality of data sub-clusters;

step S3: performing agglomerative clustering on the data sub-clusters based on a Bayesian criterion to obtain a plurality of data clusters;

step S4: analyzing the data clusters to calculate the contour average value of the data clusters;

step S5: judging whether the contour average value is greater than or equal to a preset contour threshold value; if so, going to step S10, and if not, going to step S6;

step S6: judging whether the number of agglomerative clustering passes performed is less than or equal to the preset clustering times, and if so, entering step S7;

step S7: judging whether the number of sample data in each data cluster is less than or equal to the preset sample number; if not, taking the data cluster as an intermediate data cluster and entering step S8, and if so, taking the data cluster as a determined data cluster and entering step S9;

step S8: repeating steps S2-S3 on the sample data in the intermediate data cluster to obtain data clusters to be determined;

step S9: integrating the data clusters to be determined and the determined data clusters to obtain new data clusters, and then entering step S4;

step S10: classifying the residential users according to the data clusters.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein the number of data clusters is m, the preset clustering times is M, and the contour threshold value is 1 - m/10.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: the data preprocessing in step S1 includes the following sub-steps:

step S1-1: carrying out data cleaning on a plurality of daily electric load data by adopting a Newton interpolation method to obtain a plurality of initial data;

step S1-2: and respectively carrying out data normalization processing on the plurality of initial data to obtain a plurality of corresponding sample data.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein, step S2 includes the following substeps:

step S2-1: reading sample data one by one based on a BIRCH algorithm;

step S2-2: and pre-clustering a plurality of sample data in the dense area according to the reading result so as to obtain the data sub-cluster.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein, the expression of the bayesian criterion in step S3 is:

$$\mathrm{BIC} = -2\ln(L) + \ln(h)\cdot Y$$

where BIC is the evaluation of the data clustering (the higher the BIC, the more reasonable the division of the data clusters), L is the maximum likelihood function value, h is the number of data sub-clusters, and Y is the number of sample data contained in all the data sub-clusters.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein the number of sample data is n, n is a positive integer greater than or equal to 2,

step S4 includes the following substeps:

step S4-1: obtaining the intra-cluster dissimilarity a(i) of each of the n sample data, wherein the expression of the intra-cluster dissimilarity a(i) is as follows:

where i and i' are two sample data in the same data cluster, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_s| is the number of sample data contained in the data cluster s to which the sample data i belongs;

step S4-2: obtaining the inter-cluster dissimilarity b(i) of each of the n sample data, wherein the expression of the inter-cluster dissimilarity b(i) is as follows:

where i and i' are two sample data in different data clusters, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_t| is the number of sample data contained in the data cluster t to which the sample data i' belongs;

step S4-3: obtaining the contour average value T from the intra-cluster dissimilarity a(i) and the inter-cluster dissimilarity b(i) of the sample data, wherein the expression of the contour average value T is as follows:

where s(i) is the contour coefficient, expressed as:
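For reference, the quantities named in steps S4-1 to S4-3 match the widely used silhouette coefficient. The block below is a sketch of the standard textbook definitions written with the symbols defined above. The exact normalization in the patent's own expressions is not recoverable from this text, so the use of |C_s| - 1 in a(i) and the minimum over the other clusters in b(i) are assumptions.

```latex
\begin{align}
a(i) &= \frac{1}{|C_s| - 1} \sum_{i' \in C_s,\ i' \neq i} \mathrm{dist}(i, i') \\
b(i) &= \min_{t \neq s} \frac{1}{|C_t|} \sum_{i' \in C_t} \mathrm{dist}(i, i') \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \\
T &= \frac{1}{n} \sum_{i=1}^{n} s(i)
\end{align}
```

Under these definitions s(i) lies in [-1, 1], and a value of T close to 1 indicates compact, well separated data clusters.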

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: in step S10, classifying the residential users means obtaining the residential user classification corresponding to each data cluster according to predetermined electricity utilization characteristic indexes.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: the resident user classification comprises office workers + children comprehensive households, office workers + elderly comprehensive households, elderly households, single office workers, and comprehensive multi-person households.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: the predetermined electricity utilization characteristic indexes comprise an overall load level, an electricity utilization peak time period, a late load descending time point and a daily load fluctuation rate, and the expression of the load level value of the overall load level is as follows:

$$P_{level} = \frac{P_{average}}{P_{max}} \times 100\%$$

where P_level represents the load level value, P_average is the daily average load, and P_max is the maximum value of all the daily electric load data; the electricity utilization peak time period comprises 5 to 6 o'clock, 11 to 12 o'clock, 19 to 20 o'clock and 20 to 21 o'clock; the late load descending time point comprises 21 o'clock and 22 o'clock; and the daily load fluctuation rate is expressed as:

where P_wave represents the daily load fluctuation rate and P_error is the standard deviation of the daily load.

The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein the overall load level includes a high load level, a higher load level, a medium load level, a low load level and an extremely low load level; the high load level corresponds to a load level value P_level greater than or equal to 50%, the higher load level to P_level less than 50% and not less than 40%, the medium load level to P_level less than 40% and not less than 20%, the low load level to P_level less than 20% and not less than 10%, and the extremely low load level to P_level less than 10%.

Action and Effect of the invention

According to the residential user classification method based on power load analysis of the invention, the daily power load data are first preprocessed to obtain a plurality of sample data; the sample data are then pre-clustered and agglomeratively clustered to obtain a plurality of data clusters; the contour average value of the data clusters is then compared with the preset contour threshold value; the pre-clustering and agglomerative clustering are then repeated on the sample data according to the comparison result, the number of clustering passes performed, and the number of sample data in each data cluster; and finally the residential users are classified according to the data clusters. Because the method forms the data clusters by repeatedly clustering the sample data based on the Bayesian criterion, the optimal division into data clusters is reached quickly, and the residential users are then classified according to those clusters. Compared with previous residential user classification methods, the analysis of residential power load is finer and the accuracy of residential user classification is greatly improved, so that a power supply unit can determine the power load characteristics of each class of residential users from the user type, accurately carry out supply-side management of the power grid system, and ensure its safe and stable operation.

Drawings

Fig. 1 is a schematic step diagram of a residential user classification method based on electric load analysis in an embodiment of the present invention;

FIG. 2 is a sample data curve in an embodiment of the present invention;

fig. 3(a) is a data clustering result one formed after the first pre-clustering of the daily electricity load data in the embodiment of the present invention;

fig. 3(b) is a second data clustering result formed after the first pre-clustering of the daily electricity load data in the embodiment of the present invention;

fig. 4(a) is a first data clustering result formed after the second pre-clustering of the daily electricity load data in the embodiment of the present invention;

fig. 4(b) is a second data clustering result formed after the second pre-clustering of the daily electricity load data in the embodiment of the present invention;

fig. 5(a) is a data clustering result one formed after third pre-clustering of daily electricity load data in the embodiment of the present invention;

fig. 5(b) is a second data clustering result formed after third pre-clustering of daily electricity load data in the embodiment of the present invention;

fig. 5(c) is a data clustering result three formed after third pre-clustering of daily electricity load data in the embodiment of the present invention;

fig. 5(d) is a data clustering result four formed after third pre-clustering of daily electricity load data in the embodiment of the present invention; and

fig. 6 is a data clustering result of daily electricity load data after pre-clustering.

Detailed Description

In order to make the technical means, creation features, achievement purposes and effects of the present invention easy to understand, the following embodiments specifically describe the residential user classification method based on the electric load analysis in conjunction with the accompanying drawings.

Fig. 1 is a schematic step diagram of a residential user classification method based on electric load analysis in an embodiment of the present invention.

As shown in fig. 1, the residential user classification method based on electric load analysis in the present embodiment is used for classifying residential users according to a plurality of daily electric load data, and comprises the following steps:

step S1: performing data preprocessing on the plurality of daily electric load data to acquire a plurality of sample data, wherein the data preprocessing in the step S1 comprises the following substeps:

step S1-1: performing data cleaning on the plurality of daily electric load data by Newton interpolation to obtain a plurality of initial data.

In actual sampling, hardware factors cause part of the daily electric load data to be lost, so the missing daily electric load data must be filled in by Newton interpolation so that the number of initial data matches the number of daily electric load data.

Specifically, data cleaning is performed on the plurality of daily electric load data, mainly by filling in the missing data using Newton interpolation, to obtain the plurality of initial data.

The interpolation polynomial of the Newton interpolation method is shown in formula (1):

$$f(x_i) = P(x_i) + R(x_i) \tag{1}$$

where n is the number of daily electric load data, n being a positive integer greater than or equal to 2; f(x_i) is the missing daily electric load datum to be filled in by Newton interpolation; (x_1, f(x_1)), (x_2, f(x_2)), ..., (x_n, f(x_n)) is the sequence of n daily electric load data; (x_i, f(x_i)) is the missing daily electric load data point, with x ∈ R and i ∈ [1, n]; P(x_i) is the Newton interpolation approximating function; and R(x_i) is the error function.

The expression of the Newton interpolation approximating function is shown in formula (2):

$$P(x_i) = f(x_1) + (x_i - x_1)\,f[x_2, x_1] + (x_i - x_1)(x_i - x_2)\,f[x_3, x_2, x_1] + \cdots + (x_i - x_1)(x_i - x_2)\cdots(x_i - x_{n-1})\,f[x_n, x_{n-1}, \ldots, x_1] \tag{2}$$

The expression of the error function is shown in formula (3):

$$R(x_i) = (x_i - x_1)(x_i - x_2)\cdots(x_i - x_n)\,f[x_n, x_{n-1}, \ldots, x_1, x_i] \tag{3}$$

In this embodiment, the daily electric load data are one year of daily electric load data from a plurality of residential districts (cells), each district corresponding to the power supply area of one transformer in the power grid system. The number of sampling points of the daily electric load data is 96 (one point every 15 minutes over 24 hours), and the number of points of the initial data is also 96.
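The gap-filling idea of step S1-1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of `None` to mark a lost sample, and the choice of the four nearest known samples as interpolation nodes (`window=2`) are all assumptions, since the text does not say which nodes are used.

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients f[x1], f[x2,x1], ..., f[xn,...,x1]."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Update in place from the end so lower-order values are still available.
        for k in range(n - 1, order - 1, -1):
            coef[k] = (coef[k] - coef[k - 1]) / (xs[k] - xs[k - order])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


def fill_missing(load, window=2):
    """Fill None entries of a daily load curve by Newton interpolation.

    Assumption: each gap is interpolated from up to 2*window nearest known
    samples, and the curve contains at least one known sample.
    """
    known = [i for i, v in enumerate(load) if v is not None]
    filled = list(load)
    for i, v in enumerate(load):
        if v is None:
            nearest = sorted(sorted(known, key=lambda k: abs(k - i))[: 2 * window])
            xs = [float(k) for k in nearest]
            ys = [load[k] for k in nearest]
            filled[i] = newton_eval(xs, divided_differences(xs, ys), float(i))
    return filled


# Hypothetical curve sampled from (t + 1)^2 with one lost point at index 2.
curve = [1.0, 4.0, None, 16.0, 25.0]
print(fill_missing(curve))
```

Using only a few neighbouring nodes keeps the polynomial degree low; interpolating through all 96 points of a day would be numerically fragile.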

Step S1-2: respectively carrying out data normalization processing on the plurality of initial data to obtain a plurality of corresponding sample data, specifically, converting the value of the initial data to [0,1] by adopting a linear normalization method to obtain the corresponding sample data.

The normalization formula is shown in formula (4):

$$p'_i = \frac{p_i - \min(p)}{\max(p) - \min(p)} \tag{4}$$

where i ∈ [1, n], p is the initial data, p_i is the i-th value of the initial data, and p'_i is the normalized data, i.e., the sample data.

Fig. 2 is a sample data curve in an embodiment of the present invention.

In this embodiment, the plurality of initial data come from a plurality of cells whose capacities differ; although the initial data share the same dimension, their magnitudes differ greatly from cell to cell. Therefore, all initial data must be normalized, i.e., the dimensional initial data are transformed into dimensionless (scalar) sample data, which guarantees the accuracy of the subsequent clustering result.

As shown in fig. 2, the plurality of initial data are normalized using formula (4) to obtain a plurality of sample data, and a sample data curve is drawn from the sample data. In practical applications, the maximum load (L_max) and minimum load (L_min) obtained for different cells differ, which may make the normalization result unstable and affect subsequent results. Therefore, L_max and L_min are replaced by empirical values chosen according to the actual load of each cell; here L_max = 500 and L_min = 0, which avoids model bias caused by different L_max and L_min across sample sets.
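The two normalization variants described above can be sketched as follows. The function names are hypothetical; the constants 500 and 0 are the empirical L_max and L_min quoted in this embodiment.

```python
L_MAX = 500.0  # empirical maximum load from the embodiment
L_MIN = 0.0    # empirical minimum load from the embodiment


def normalize_per_curve(curve):
    """Linear normalization using the curve's own minimum and maximum."""
    lo, hi = min(curve), max(curve)
    return [(p - lo) / (hi - lo) for p in curve]


def normalize_fixed(curve, lo=L_MIN, hi=L_MAX):
    """Linear normalization with fixed empirical bounds shared by all cells."""
    return [(p - lo) / (hi - lo) for p in curve]


# Two hypothetical cells with the same shape but different capacity.
small_cell = [50.0, 100.0, 75.0]
large_cell = [200.0, 400.0, 300.0]
print(normalize_per_curve(small_cell), normalize_per_curve(large_cell))
print(normalize_fixed(small_cell), normalize_fixed(large_cell))
```

With per-curve bounds both cells map to the same values, whereas the fixed bounds keep every curve on one common scale, which is the stability property the paragraph above relies on.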

Step S2: pre-clustering sample data to obtain a plurality of data sub-clusters, wherein step S2 includes the following sub-steps:

step S2-1: reading the sample data one by one based on the BIRCH algorithm; specifically, the data points in the set of sample data are read one by one following the idea of CF (Clustering Feature) tree growth in the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm.

Step S2-2: pre-clustering the sample data in dense areas according to the reading result to obtain the data sub-clusters; specifically, the sample data in dense areas are pre-clustered while the CF tree is being generated, forming a plurality of data sub-clusters.
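The pre-clustering idea of steps S2-1 and S2-2 can be sketched in a simplified form. This is not the full BIRCH algorithm: it keeps a flat list of clustering features instead of a CF tree, and the class name, function name and `threshold` parameter are assumptions made for illustration.

```python
import math


class ClusteringFeature:
    """CF triple: sample count N, linear sum LS, and sum of squared norms SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(v * v for v in point)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius_if_added(self, point):
        """Radius of the sub-cluster if `point` were absorbed: sqrt(SS/N - |LS/N|^2)."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(v * v for v in point)
        r2 = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(r2, 0.0))


def pre_cluster(samples, threshold):
    """Read samples one by one; absorb each into the nearest sub-cluster
    if the resulting radius stays within `threshold`, else open a new one."""
    subclusters, labels = [], []
    for point in samples:
        best = None
        if subclusters:
            best = min(range(len(subclusters)),
                       key=lambda k: math.dist(subclusters[k].centroid(), point))
        if best is not None and subclusters[best].radius_if_added(point) <= threshold:
            subclusters[best].add(point)
        else:
            cf = ClusteringFeature(len(point))
            cf.add(point)
            subclusters.append(cf)
            best = len(subclusters) - 1
        labels.append(best)
    return subclusters, labels


# Hypothetical 2-D samples forming two dense areas.
subs, labels = pre_cluster([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], threshold=0.5)
print(labels, [cf.n for cf in subs])
```

Because a clustering feature stores only N, LS and SS, each sample is visited once and never needs to be kept in memory, which is what makes this stage suitable for large volumes of load data.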

Step S3: performing agglomerative clustering on the data sub-clusters based on a Bayesian criterion to obtain a plurality of data clusters, wherein the number of data clusters is m and the expression of the Bayesian criterion is shown in formula (5):

$$\mathrm{BIC} = -2\ln(L) + \ln(h)\cdot Y \tag{5}$$

where BIC is the evaluation of the data clustering (the higher the BIC, the more reasonable the division of the data clusters), L is the maximum likelihood function value, h is the number of data sub-clusters, and Y is the number of sample data contained in all the data sub-clusters.

Specifically:

Taking the data sub-clusters produced by the pre-clustering stage as objects, the data sub-clusters are merged one by one using an agglomerative method (i.e., the two closest data sub-clusters are repeatedly merged into a new data sub-cluster) until the desired number of data sub-clusters is reached, and the data sub-clusters at that point are taken as the data clusters.

The pre-clustering-agglomerative clustering is a two-step clustering, and when the two-step clustering is adopted, the clustering Criterion is based on a Bayesian Criterion, namely Bayesian Information Criterion (BIC).
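The agglomerative stage and the criterion of formula (5) can be sketched as follows. The function names are hypothetical, and two points are assumptions: sub-clusters are merged by centroid distance, and the target number of clusters is passed in directly, because the text does not spell out how the BIC values are compared to choose it.

```python
import math


def bic(log_likelihood, h, y):
    """Formula (5) as printed: BIC = -2 ln(L) + ln(h) * Y, with log_likelihood = ln(L)."""
    return -2.0 * log_likelihood + math.log(h) * y


def agglomerate(centroids, counts, target):
    """Repeatedly merge the two closest sub-clusters until `target` remain.

    Returns, for each final cluster, the indices of the sub-clusters it contains.
    """
    clusters = [([k], list(c), n) for k, (c, n) in enumerate(zip(centroids, counts))]
    while len(clusters) > target:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        members_a, cen_a, n_a = clusters[a]
        members_b, cen_b, n_b = clusters[b]
        # The merged centroid is the count-weighted mean of the two centroids.
        merged = [(x * n_a + y * n_b) / (n_a + n_b) for x, y in zip(cen_a, cen_b)]
        clusters[a] = (members_a + members_b, merged, n_a + n_b)
        del clusters[b]
    return [members for members, _, _ in clusters]


# Four hypothetical sub-cluster centroids with their sample counts.
groups = agglomerate([(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)], [2, 2, 1, 1], target=2)
print(groups)
```

Operating on sub-cluster centroids rather than on individual samples is what keeps the second step cheap after pre-clustering has compressed the data.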

Step S4: analyzing the data clusters to calculate the contour average value of the data clusters, wherein step S4 includes the following substeps:

step S4-1: obtaining the intra-cluster dissimilarity a(i) of each of the n sample data, wherein the expression of the intra-cluster dissimilarity a(i) is formula (9):

step S4-2: obtaining the inter-cluster dissimilarity b(i) of each of the n sample data, wherein the expression of the inter-cluster dissimilarity b(i) is shown as formula (10):

where i and i' are two sample data in different data clusters, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_t| is the number of sample data contained in the data cluster t to which the sample data i' belongs;

step S4-3: obtaining the contour average value T from the intra-cluster dissimilarity a(i) and the inter-cluster dissimilarity b(i) of the sample data, wherein the expression of the contour average value T is shown in formula (11):

where s(i) is the contour coefficient, whose expression is shown in formula (12):
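Step S4 can be sketched with the standard silhouette computation, which the contour coefficient appears to correspond to. The function name is hypothetical, and three details are assumptions rather than statements of the patent: a(i) averages over the other members of the same cluster, b(i) takes the smallest mean distance to any other cluster, and samples in single-member clusters contribute s(i) = 0. At least two clusters are required.

```python
import math


def contour_average(samples, labels):
    """Mean silhouette-style contour coefficient T over all samples."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    total = 0.0
    for i, point in enumerate(samples):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # assumed convention: s(i) = 0 for a single-member cluster
        # Intra-cluster dissimilarity: mean distance to the other members.
        a = sum(math.dist(point, samples[j]) for j in own if j != i) / (len(own) - 1)
        # Inter-cluster dissimilarity: smallest mean distance to another cluster.
        b = min(
            sum(math.dist(point, samples[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(samples)


# Hypothetical 1-D samples forming two well separated clusters.
T = contour_average([(0.0,), (1.0,), (10.0,), (11.0,)], [0, 0, 1, 1])
print(round(T, 4))
```

The pairwise distance sums make this quadratic in the number of samples, which is acceptable for per-cell daily curves but worth keeping in mind for larger data sets.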

Step S5: judging whether the contour average value T is greater than or equal to the preset contour threshold value, i.e., whether T is greater than or equal to (1-m/10), where m is the number of data clusters.

If the determination result is yes, clustering is complete and the process proceeds to step S10; if the determination result is no, the process proceeds to the next step, i.e., step S6.

In the present embodiment, the predetermined clustering times M is equal to 3.

Step S6: judging whether the number of agglomerative clustering passes performed is less than or equal to the predetermined clustering times; if so, entering the next step, i.e., step S7; otherwise, ending clustering and taking the current data clusters as the clustering result.

Step S7 specifically includes:

Suppose the clustering result contains Q data clusters, and the number of sample data in each data cluster q is w_q (q ∈ [1, Q], Q ∈ N). If w_q is less than or equal to the predetermined number of samples, the cluster does not need to be clustered further; the data cluster q is taken as a determined data cluster q₁ and the process proceeds to step S9. Otherwise, the data cluster q is taken as an intermediate data cluster q₂ and the process proceeds to step S8.

In the present embodiment, the predetermined number of samples is 2.

Step S8 specifically includes:

Steps S2-S3 are repeated once on all intermediate data clusters q₂ to obtain the data clusters to be determined q₃.

Step S9 specifically includes:

All data clusters to be determined q₃ and the determined data clusters q₁ are integrated to obtain new data clusters.
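The control flow of steps S4 to S9 can be sketched as a loop. This is an illustration under assumptions: `two_step` and `score` are hypothetical callables standing in for steps S2-S3 and step S4, the threshold 1 - m/10 is taken from the worked example in this embodiment, and the loop stops once the pass count exceeds M.

```python
def refine_clusters(clusters, two_step, score, max_passes=3, min_samples=2):
    """Re-cluster large clusters until the contour threshold or pass limit is hit.

    clusters   : list of clusters (each a list of samples) from the first S2-S3 pass
    two_step   : callable(samples) -> list of clusters (stands in for S2-S3)
    score      : callable(clusters) -> contour average value T (stands in for S4)
    """
    passes = 1  # the pass that produced the initial `clusters`
    while True:
        m = len(clusters)
        if score(clusters) >= 1 - m / 10:       # S5: threshold reached
            return clusters
        if passes > max_passes:                 # S6: pass limit exceeded
            return clusters
        determined, to_be_determined = [], []
        for cluster in clusters:                # S7: split by cluster size
            if len(cluster) <= min_samples:
                determined.append(cluster)
            else:
                to_be_determined.extend(two_step(cluster))   # S8
        clusters = determined + to_be_determined             # S9
        passes += 1


# Toy stand-ins: split a cluster in half, and a score that never reaches the threshold.
halve = lambda c: [c[: len(c) // 2], c[len(c) // 2:]]
result = refine_clusters([[1, 2, 3, 4], [5, 6, 7, 8]], halve, lambda cl: 0.0)
print(result)
```

Passing the clustering and scoring steps in as callables keeps the loop independent of how those steps are implemented, and the pass counter guarantees termination even when no cluster can be split further.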

Fig. 3(a) is a data clustering result one formed after the first pre-clustering of the daily electricity load data in the embodiment of the present invention; fig. 3(b) is a second data clustering result formed after the first pre-clustering of the daily electricity load data in the embodiment of the present invention; fig. 4(a) is a first data clustering result formed after the second pre-clustering of the daily electricity load data in the embodiment of the present invention; fig. 4(b) is a second data clustering result formed after the second pre-clustering of the daily electricity load data in the embodiment of the present invention; fig. 5(a) is a data clustering result one formed after third pre-clustering of daily electricity load data in the embodiment of the present invention; fig. 5(b) is a second data clustering result formed after third pre-clustering of daily electricity load data in the embodiment of the present invention; fig. 5(c) is a data clustering result three formed after third pre-clustering of daily electricity load data in the embodiment of the present invention; fig. 5(d) is a data clustering result four formed after third pre-clustering of daily electricity load data in the embodiment of the present invention; fig. 6 is a data clustering result of daily electricity load data after pre-clustering.

As shown in figs. 3(a) to 6, the specific process of the cluster analysis is as follows:

The first multidimensional clustering (i.e., the first pre-clustering) is performed on the sample data and separates 2 data clusters. The result is shown in figs. 3(a)-3(b), where the abscissa of each curve is the time of day and the ordinate is the electric load (kW).

After the first multidimensional clustering, the contour average value T is 0.8 and the condition that T is greater than or equal to (1-m/10) is not satisfied, so a second multidimensional clustering (i.e., the second pre-clustering) is performed on the first result. Because only data cluster 2 contains more than 2 samples, the second multidimensional clustering is performed only on data cluster 2 and separates it into 2 data clusters. The result is shown in figs. 4(a)-4(b), where the abscissa of each curve is the time of day and the ordinate is the electric load (kW).

After the second multidimensional clustering, the contour average value T is 0.7 and the condition that T is greater than or equal to (1-m/10) is still not satisfied, so a third multidimensional clustering is performed on the second result. The result is shown in figs. 5(a)-5(d), where the abscissa of each curve is the time of day and the ordinate is the electric load (kW).

After the third multidimensional clustering, the contour average value T is 0.7, which satisfies the condition that T is greater than or equal to (1-m/10), and the final 5 data clusters are formed, as shown in fig. 6, where the abscissa of each curve is the time of day and the ordinate is the electric load (kW).

Step S10: classifying the residential users according to the data clusters; specifically, the residential user classifications corresponding to the data clusters are obtained according to a plurality of predetermined electricity utilization characteristic indexes.

The predetermined electricity utilization characteristic indexes include the overall load level, the electricity utilization peak time period, the late load descending time point, and the daily load fluctuation rate.

The expression of the load level value of the overall load level is shown in formula (13):

$$P_{level} = \frac{P_{average}}{P_{max}} \times 100\% \tag{13}$$

where P_level represents the load level value, P_average is the daily average load, and P_max is the maximum value of all the daily electric load data. The results calculated by formula (13) are classified into five categories: high load level (50% or more), higher load level (less than 50% and 40% or more), medium load level (less than 40% and 20% or more), low load level (less than 20% and 10% or more), and extremely low load level (less than 10%).

The electricity utilization peak time period includes four periods: 5 to 6 o'clock, 11 to 12 o'clock, 19 to 20 o'clock, and 20 to 21 o'clock.

The late load descending time point includes two types: 21 o'clock and 22 o'clock.

The expression of the daily load fluctuation rate is formula (14):

where P_wave represents the daily load fluctuation rate and P_error is the standard deviation of the daily load. The results calculated by formula (14) are classified into two categories: fluctuating (30% or more) and not fluctuating (less than 30%).
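The characteristic indexes can be sketched as follows. The load level value is computed as P_average / P_max, which agrees with the values reported in this embodiment (204 kW, 146 kW and 87 kW against a maximum of 500 kW give 40.8%, 29.2% and 17.4%). The fluctuation rate is computed here as the standard deviation divided by the mean; that ratio is an assumption, since the expression of formula (14) is not reproduced in this text. All function names are hypothetical.

```python
import statistics


def load_level_value(p_average, p_max):
    """Load level value in percent: daily average load relative to the maximum load."""
    return p_average / p_max * 100.0


def load_level_category(p_level):
    """Map a load level value (percent) to the five categories of the embodiment."""
    if p_level >= 50:
        return "high"
    if p_level >= 40:
        return "higher"
    if p_level >= 20:
        return "medium"
    if p_level >= 10:
        return "low"
    return "extremely low"


def fluctuation_rate(curve):
    """ASSUMED form of the daily load fluctuation rate: std / mean, in percent."""
    return statistics.pstdev(curve) / statistics.fmean(curve) * 100.0


def is_fluctuating(p_wave):
    """Fluctuating if the rate is 30% or more."""
    return p_wave >= 30


for average in (204.0, 146.0, 87.0):
    level = load_level_value(average, 500.0)
    print(average, round(level, 1), load_level_category(level))
```

Keeping the thresholds in one function makes it easy to adjust the category boundaries if a different region calls for different empirical values.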

The resident user classification includes office workers + children integrated residents, office workers + old people integrated residents, old people family residents, single-person office workers and integrated multi-person mouth residents.

Office workers and children comprehensive households: the load level value is a high load level, the electricity consumption peak time period is 11-12 points and 19-20 points, the late load reduction time point is 21 points, the daily load fluctuation rate is fluctuated, in this embodiment, the daily average electricity load of the residential users corresponding to the second daily electricity load curve from top to bottom in fig. 6 is 204kW, and the load level value is high (40.8%); there are 2 distinct peak hours of electricity consumption, about 11-12 and 19-20 respectively, and the load decline occurs earlier in the evening and about 21 with a large daily load fluctuation rate of 33.5%. Based on actual research and analysis, the class of users accords with the electricity utilization condition of resident users with children at home, and therefore the resident users corresponding to the daily electricity load curve are defined as office workers and children integrated residents in a classified mode.

Office worker and elderly mixed households: the load level is the medium load level, the peak electricity usage periods are 11:00 to 12:00 and 19:00 to 20:00, the evening load decline time point is 21:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the electricity usage of the residential users corresponding to the third daily electricity load curve from the top in Fig. 6 is similar to that of the office worker and children mixed households, but the daily average electricity load is 146 kW and the load level value is medium (29.2%), which is lower than that of the office worker and children mixed households; the electricity usage is also very flat, with no obvious fluctuation (the daily load fluctuation rate is less than 30%). Based on field survey and analysis, this class of users matches the electricity usage of households in which elderly people and office workers live together, so the residential users corresponding to this daily electricity load curve are classified as office worker and elderly mixed households.

Elderly households: the load level is the low load level, the peak electricity usage periods are 5:00 to 6:00 and 20:00 to 21:00, the evening load decline time point is 21:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the residential users corresponding to the fourth daily electricity load curve from the top in Fig. 6 have a daily average electricity load of 87 kW and a low load level value (17.4%). These users have two electricity usage peaks, at about 5:00 to 6:00 and about 20:00 to 21:00, and the evening load decline time point is at about 21:00. Based on field survey and analysis, this pattern matches the daily routine of elderly people and reflects a strong awareness of saving electricity. Therefore, the residential users corresponding to this daily electricity load curve are classified as elderly households.

Single office worker households: the load level is the extremely low load level, the peak electricity usage period is 20:00 to 21:00, the evening load decline time point is 22:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the residential users corresponding to the fifth daily electricity load curve from the top in Fig. 6 have a daily average electricity load of only 44 kW and an extremely low load level value (8.8%). The daytime electricity load of these users is relatively flat, there is only one electricity usage peak, at about 20:00 to 21:00, and the evening load decline time point is at about 22:00. Based on field survey and analysis, this class of users matches the electricity usage of single office workers, so the residential users corresponding to this daily electricity load curve are classified as single office worker households.

Comprehensive multi-person households: the load level is the high load level, the peak electricity usage periods are 11:00 to 12:00 and 20:00 to 21:00, the evening load decline time point is 22:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the residential users corresponding to the first daily electricity load curve from the top in Fig. 6 have a daily average electricity load as high as 312 kW and a load level value of 62.4%, which falls in the high load level (50% or higher). There are two distinct peak electricity usage periods, at about 11:00 to 12:00 and about 20:00 to 21:00; the evening load decline occurs relatively late, at about 22:00; and the daily load fluctuation rate is small, below 30%. Based on field survey and analysis, this class of users matches the electricity usage of comprehensive households with many occupants, so the residential users corresponding to this daily electricity load curve are classified as comprehensive multi-person households.
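The correspondence between the four electricity usage characteristic indexes and the five residential user classifications described above can be summarized as a lookup table. The following is only an illustrative sketch: the names `RULES` and `classify_household` are hypothetical, and the load level names follow the stated thresholds (a load level value of 40.8% lies in the higher band, and 62.4% in the high band).

```python
from typing import Iterable

# Key: (load level, peak periods, evening decline hour, fluctuating?)
RULES = {
    ("higher", frozenset({"11-12", "19-20"}), 21, True):
        "office worker and children mixed household",
    ("medium", frozenset({"11-12", "19-20"}), 21, False):
        "office worker and elderly mixed household",
    ("low", frozenset({"5-6", "20-21"}), 21, False):
        "elderly household",
    ("extremely low", frozenset({"20-21"}), 22, False):
        "single office worker household",
    ("high", frozenset({"11-12", "20-21"}), 22, False):
        "comprehensive multi-person household",
}


def classify_household(level: str, peaks: Iterable[str],
                       decline_hour: int, fluctuating: bool) -> str:
    """Map the four index categories of a data cluster to a user class."""
    key = (level, frozenset(peaks), decline_hour, fluctuating)
    # Combinations not described in the embodiment are left unclassified.
    return RULES.get(key, "unclassified")


print(classify_household("low", ["5-6", "20-21"], 21, False))
```

Using a `frozenset` for the peak periods makes the lookup independent of the order in which the peaks are listed.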

Functions and effects of the embodiment

According to the residential user classification method based on power load analysis of this embodiment, the daily electricity load data are first preprocessed to obtain a plurality of sample data; the sample data are then pre-clustered and agglomeratively clustered to obtain a plurality of data clusters; the profile average value of the data clusters is compared with the preset profile threshold value; depending on the result of that comparison, the number of times agglomerative clustering has been performed, and the number of sample data in each data cluster, the pre-clustering and agglomerative clustering are repeated on the sample data; and finally the residential users are classified according to the data clusters. The method therefore forms the data clusters by clustering the sample data multiple times based on the Bayesian criterion, so that the optimal division into data clusters is reached quickly, and then classifies the residential users according to those data clusters. Compared with previous residential user classification methods, it analyzes the residential electricity load in finer detail and greatly improves the accuracy of residential user classification, so that the power supply unit can determine the electricity load characteristics of each type of residential user, accurately manage the supply side of the power grid system, and ensure the safe and stable operation of the power grid system.

Because the data preprocessing in the embodiment includes cleaning the daily electricity load data by Newton interpolation and normalizing the cleaned data to obtain the sample data, the sample data are uniformly distributed and data of different dimensions can be combined, which avoids the data deviation caused by daily electricity load data from different sources and ensures the accuracy of the multi-dimensional clustering result.
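The preprocessing can be illustrated with a short sketch that fills a missing reading by Newton divided-difference interpolation and then normalizes the curve. Several details are assumptions made only for illustration: missing readings are marked with `None`, the interpolation uses the nearest known readings on either side, and min-max scaling is used as the normalization (the text here does not state which normalization is applied). All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def newton_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    n = len(xs)
    if n == 0:
        raise ValueError("at least one known point is required")
    coef = list(ys)
    # Build divided differences in place: coef[k] becomes f[x0, ..., xk].
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Horner-style evaluation of the Newton form.
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def fill_missing(values: Sequence[Optional[float]], window: int = 2) -> List[float]:
    """Replace None entries using up to 2 * window nearest known readings."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for t, v in enumerate(values):
        if v is None:
            nearest = sorted(known, key=lambda p: abs(p[0] - t))[:2 * window]
            pts = sorted(nearest)
            filled[t] = newton_interpolate([p[0] for p in pts],
                                           [p[1] for p in pts], t)
    return filled


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant curve maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


cleaned = fill_missing([0.0, 1.0, None, 9.0, 16.0])
print(min_max_normalize(cleaned))
```

Limiting the interpolation to a few neighbouring readings keeps the polynomial degree low, which avoids the oscillation that a high-degree polynomial through all readings of a day could introduce.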

Because the profile average value is introduced in the embodiment, the multi-dimensional clustering division is supported by mathematical theory, which makes the division process more rigorous and further improves the accuracy of the multi-dimensional clustering result.
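As an illustration of how such a profile average value can be computed, the sketch below uses the standard silhouette coefficient with Euclidean distance. It assumes that the profile average value T is the mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all sample data, where a(i) is the mean distance to the other samples in the same data cluster and b(i) is the smallest mean distance to the samples of any other data cluster; single-member clusters are given s(i) = 0 by convention. The function name is hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def profile_average(points: Sequence[Sequence[float]],
                    labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient T of a clustering (assumed definition)."""
    clusters: Dict[Hashable, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two data clusters are required")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) = 0 for a single-member cluster
        # a(i): mean distance to the other samples in the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the samples of another cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Two well-separated groups of made-up points give T close to 1.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(profile_average(pts, [0, 0, 1, 1]))
```

A value of T near 1 indicates compact, well-separated data clusters, while a value near or below 0 indicates a poor division; in the method, T is compared with the preset profile threshold value to decide whether clustering is repeated.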

Because the residential users are classified by obtaining, according to a plurality of predetermined electricity usage characteristic indexes, the residential user classification corresponding to each data cluster, and because each predetermined electricity usage characteristic index is itself divided into categories, the power supply unit can readily match users to a classification according to their electricity usage and plan accordingly, which greatly improves the power supply unit's ability to manage and plan the supply side of the power grid system.

The above-described embodiments are preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and various modifications and changes can be made by those skilled in the art without inventive work within the scope of the appended claims.

Claims

1. A residential user classification method based on power load analysis, for classifying residential users according to a plurality of daily electricity load data, characterized by comprising the following steps:

step S2: pre-clustering the sample data to obtain a plurality of data sub-clusters;

step S3: performing agglomerative clustering on the data sub-clusters based on the Bayesian criterion to obtain a plurality of data clusters;

step S4: analyzing and calculating the data clusters to obtain the profile average value of the data clusters;

step S5: judging whether the profile average value is greater than or equal to a preset profile threshold value, if so, entering a step S10, and if not, entering a step S6;

step S7: judging whether the number of the sample data in each data cluster is less than or equal to a preset sample number, if not, taking the data cluster as an intermediate data cluster, and entering step S8, and if so, taking the data cluster as a determined data cluster, and entering step S9;

2. The residential user classification method based on power load analysis according to claim 1, wherein:

wherein the number of the data clusters is m,

the predetermined number of clustering times is M times,

the contour threshold is

3. The residential user classification method based on power load analysis according to claim 1, wherein:

the data preprocessing in step S1 includes the following sub-steps:

step S1-1: performing data cleaning on the plurality of daily electricity load data by Newton interpolation to obtain a plurality of initial data;

4. The residential user classification method based on power load analysis according to claim 1, wherein:

wherein, step S2 includes the following substeps:

step S2-1: reading the sample data one by one based on a BIRCH algorithm;

5. The residential user classification method based on power load analysis according to claim 1, wherein:

wherein, the expression of the bayesian criterion in step S3 is:

BIC = -2 ln(L) + ln(h)·Y,

where BIC is the classification evaluation of the data clusters, a higher BIC indicating a more reasonable classification of the data clusters; L is the maximum likelihood function value; h is the number of the data sub-clusters; and Y is the number of the sample data contained in all the data sub-clusters.

6. The residential user classification method based on power load analysis according to claim 1, wherein:

wherein the number of the sample data is n, n is a positive integer greater than or equal to 2,

step S4 includes the following substeps:

step S4-1: obtaining the intra-cluster dissimilarity a(i) of each of the n sample data, wherein the expression of the intra-cluster dissimilarity a(i) is:

where i and i' are two sample data in the same data cluster, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_s| is the number of sample data contained in the data cluster s to which the sample data i belongs;

step S4-2: obtaining the inter-cluster dissimilarity b(i) of each of the n sample data, wherein the expression of the inter-cluster dissimilarity b(i) is:

where i and i' are two sample data in different data clusters, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_t| is the number of sample data contained in the data cluster t to which the sample data i' belongs;

step S4-3: obtaining the profile average value T according to the intra-cluster dissimilarity a(i) of the sample data and the inter-cluster dissimilarity b(i) of the sample data, wherein the expression of the profile average value T is:

where s(i) is the profile coefficient, expressed as:

7. The residential user classification method based on power load analysis according to claim 1, wherein:

in step S10, classifying the residential users comprises obtaining the residential user classification corresponding to each of the data clusters according to a plurality of predetermined electricity usage characteristic indexes.

8. The residential user classification method based on power load analysis according to claim 7, wherein:

the residential user classifications comprise office worker and children mixed households, office worker and elderly mixed households, elderly households, single office worker households, and comprehensive multi-person households.

9. The residential user classification method based on power load analysis according to claim 7, wherein:

wherein the predetermined electricity usage characteristic indexes include an overall load level, a peak electricity usage period, an evening load decline time point, and a daily load fluctuation rate,

the expression of the load level value of the overall load level is:

where P_level represents the load level value, P_average is the daily average load, and P_max is the maximum value of all the daily electricity load data,

the peak electricity usage period comprises 5:00 to 6:00, 11:00 to 12:00, 19:00 to 20:00, and 20:00 to 21:00,

the evening load decline time point comprises 21:00 and 22:00,

the expression of the daily load fluctuation rate is:

where P_wave represents the daily load fluctuation rate, and P_error is the standard deviation of the daily load.

10. The residential user classification method based on power load analysis according to claim 9, wherein:

wherein the overall load level includes a high load level, a higher load level, a medium load level, a low load level, and an extremely low load level,

the high load level is the load level value P_level being greater than or equal to 50%,

the higher load level is the load level value P_level being less than 50% and greater than or equal to 40%,

the medium load level is the load level value P_level being less than 40% and greater than or equal to 20%,

the low load level is the load level value P_level being less than 20% and greater than or equal to 10%,

the extremely low load level is the load level value P_level being less than 10%.