CN110825723A - Residential user classification method based on power load analysis - Google Patents


Info

Publication number
CN110825723A
Authority
CN
China
Prior art keywords
data
load
sample data
clustering
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910952518.8A
Other languages
Chinese (zh)
Other versions
CN110825723B (en)
Inventor
夏飞
张洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Electric Power
Original Assignee
Shanghai University of Electric Power
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Electric Power
Priority to CN201910952518.8A
Publication of CN110825723A
Application granted
Publication of CN110825723B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a residential user classification method based on power load analysis. Daily power load data are first preprocessed to obtain a plurality of sample data. The sample data are then pre-clustered and agglomeratively clustered to obtain a plurality of data clusters, and the contour (silhouette) average value of the data clusters is compared with a preset contour threshold. Depending on the result of this comparison, the number of agglomerative clustering rounds already performed, and the number of sample data in each data cluster, pre-clustering and agglomerative clustering are repeated on the sample data. Finally, the residential users are classified according to the data clusters.

Description

Residential user classification method based on power load analysis
Technical Field
The invention belongs to the field of power supply, and particularly relates to a residential user classification method based on power load analysis.
Background
The electrical load of residential users is gradually becoming a major component of the peak load in the power grid system, which brings new challenges to the safe and stable operation of the grid. Managing the supply side of the power grid system according to the electricity load characteristics of the various types of residential users is therefore key to the safe and stable operation of the power grid system in the future.
Many scholars have studied cluster analysis of the electricity load characteristics of residential users. Chua Heng, Wuhui Cheng, Zhou et al., in "Investigation and load analysis of electricity utilization in a certain residential quarter" (Jiangxi Electric Power, 2017, 41(2): 24-27), analyzed smart meter data from users in a residential quarter in Nanchang city, gave the users' electricity consumption curves for the four seasons and for holidays, workdays, and weekends, analyzed the users' electricity consumption behavior, and provided a basis for innovative services for electricity customers, power supply enterprises, and the social environment. Liufei, cardia jun, et al., in "Typical load characteristic analysis of residents based on cluster analysis" (Jiangsu motor engineering, 2007, 12(26): 34-37), analyzed electricity data by K-means clustering, obtained typical representative electricity load curves for different seasons, and identified some relations between residents' load characteristics and various influencing factors. Dingqi, Wang Guang et al., in "Clustering analysis application of regional power user load patterns" (Electromechanical Engineering, 2008, 25(9): 31-33, 84), clustered the users in a typical substation area and compared the result with the traditional national economy industry classification; the method also gives power supply departments a reference for power load management, substation planning, state estimation, and so on.
In "Resident user electricity decision model and information system research of supply and demand interaction" (academic thesis, North China Electric University, 2017), a fuzzy C-means clustering algorithm is used to cluster residential load curves, yielding the different electricity consumption characteristics of residents; the work explores the room for optimizing residential electricity use, guides users to use electricity reasonably, optimizes the consumption structure, and achieves peak clipping and valley filling. Grandson and Yiwei, Li Bin et al. propose a user hierarchical clustering method based on differentiated feature extraction in "A user hierarchical clustering and package recommending method facing the reform of the electricity selling side" (Power Grid Technology, 2018, 42(2): 447-; in layer 2, differentiated electricity consumption features are extracted for each class of users obtained in layer 1, and the users are classified again by applying a suitable clustering algorithm to each class. Finally, a suitable electricity price package is recommended for the sub-class users obtained after the two-layer clustering.
However, residential electricity load involves a large volume of consumption data, and the consumption patterns of different types of residential users differ greatly. The above methods do not analyze the electricity load characteristics of the various types of residential users in sufficient detail, so residential users cannot be classified accurately according to their consumption patterns. As a result, a power supply unit cannot determine the electricity load characteristics of residential users from their type, and therefore cannot accurately carry out supply-side management of the power grid system or ensure its safe and stable operation.
Disclosure of Invention
Effective analysis of the electricity load characteristics of the various types of residential users is the basis for implementing supply-side management measures in the power grid system. Such analysis helps evaluate the composition of the electricity load and the consumption patterns in a region; it is also important research work for arranging the electricity consumption layout reasonably and using electric energy resources effectively, and it helps guarantee the safe and stable operation of the power grid system.
The invention aims to provide a residential user classification method derived from the daily electric load curves of residential users, so that the electricity load characteristics of each type of residential user can be determined from the user type, supply-side management of the power grid system can be carried out accurately, and the safe and stable operation of the power grid system can be ensured.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention provides a residential user classification method based on power load analysis, characterized by comprising the following steps:
Step S1: performing data preprocessing on a plurality of daily electric load data to obtain a plurality of sample data;
Step S2: pre-clustering the sample data to obtain a plurality of data sub-clusters;
Step S3: performing agglomerative clustering on the data sub-clusters based on a Bayesian criterion to obtain a plurality of data clusters;
Step S4: analyzing and calculating the data clusters to obtain the contour average value of the data clusters;
Step S5: judging whether the contour average value is greater than or equal to a preset contour threshold; if so, proceeding to step S10, and if not, proceeding to step S6;
Step S6: judging whether the number of agglomerative clustering rounds is less than or equal to the preset number of clustering rounds; if so, proceeding to step S7, and if not, ending the clustering to obtain the clustering result;
Step S7: judging whether the number of sample data in each data cluster is less than or equal to the preset sample number; if not, taking the data cluster as an intermediate data cluster and proceeding to step S8, and if so, taking the data cluster as a determined data cluster and proceeding to step S9;
Step S8: repeating steps S2-S3 on the sample data in the intermediate data clusters to obtain data clusters to be determined;
Step S9: integrating the data clusters to be determined and the determined data clusters to obtain new data clusters, and then proceeding to step S4;
Step S10: classifying the residential users according to the data clusters.
The residential user classification method based on electric load analysis provided by the invention may also have the following feature: the number of data clusters is m, the preset number of clustering rounds is M, and the contour threshold is
Figure BDA0002226222390000041
The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: the data preprocessing in step S1 includes the following sub-steps:
step S1-1: carrying out data cleaning on a plurality of daily electric load data by adopting a Newton interpolation method to obtain a plurality of initial data;
step S1-2: and respectively carrying out data normalization processing on the plurality of initial data to obtain a plurality of corresponding sample data.
The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein, step S2 includes the following substeps:
step S2-1: reading sample data one by one based on a BIRCH algorithm;
step S2-2: and pre-clustering a plurality of sample data in the dense area according to the reading result so as to obtain the data sub-cluster.
The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein, the expression of the bayesian criterion in step S3 is:
BIC=-2ln(L)+ln(h)·Y,
where BIC is the evaluation score of the data clustering (a higher BIC indicates a more reasonable division of the data clusters), L is the maximum likelihood function value, h is the number of data sub-clusters, and Y is the number of sample data contained in all the data sub-clusters.
The method for classifying the resident users based on the electric load analysis provided by the invention can also have the following characteristics: wherein the number of sample data is n, n is a positive integer greater than or equal to 2,
step S4 includes the following substeps:
Step S4-1: obtaining the intra-cluster dissimilarity a(i) of each of the n sample data from the n sample data, where the expression of the intra-cluster dissimilarity a(i) is:
Figure BDA0002226222390000051
where i and i' are two sample data in the same data cluster, dist(i, i') is the Euclidean distance between sample data i and i', and |C_s| is the number of sample data contained in the data cluster s to which sample data i belongs;
Step S4-2: obtaining the inter-cluster dissimilarity b(i) of each of the n sample data from the n sample data, where the expression of the inter-cluster dissimilarity b(i) is:
Figure BDA0002226222390000052
where i and i' are two sample data in different data clusters, dist(i, i') is the Euclidean distance between sample data i and i', and |C_t| is the number of sample data contained in the data cluster t to which sample data i' belongs;
Step S4-3: obtaining the contour average value T from the intra-cluster dissimilarity a(i) and the inter-cluster dissimilarity b(i) of the sample data, where the expression of the contour average value T is:
Figure BDA0002226222390000061
where s(i) is the contour coefficient (silhouette coefficient), expressed as:
Figure BDA0002226222390000062
The residential user classification method based on electric load analysis provided by the invention may also have the following feature: in step S10, the residential users are classified by obtaining the residential user class corresponding to each data cluster according to predetermined electricity consumption characteristic indexes.
The residential user classification method based on electric load analysis provided by the invention may also have the following feature: the residential user classes include households combining office workers and children, households combining office workers and the elderly, elderly households, single office workers, and mixed multi-person households.
The residential user classification method based on electric load analysis provided by the invention may also have the following feature: the predetermined electricity consumption characteristic indexes include the overall load level, the electricity consumption peak periods, the late load decrease time point, and the daily load fluctuation rate, and the expression of the load level value of the overall load level is:
Figure BDA0002226222390000063
where P_level is the load level value, P_average is the daily average load, and P_max is the maximum value of all the daily electric load data. The electricity consumption peak periods include 5:00 to 6:00, 11:00 to 12:00, 19:00 to 20:00, and 20:00 to 21:00, and the late load decrease time points include 21:00 and 22:00. For the daily load fluctuation rate, P_wave denotes the daily load fluctuation rate and P_error is the standard deviation of the daily load.
The residential user classification method based on electric load analysis provided by the invention may also have the following feature: the overall load level is divided into a high load level, a higher load level, a medium load level, a low load level, and an extremely low load level. The load level value P_level is greater than or equal to 50% for the high load level; less than 50% and not less than 40% for the higher load level; less than 40% and not less than 20% for the medium load level; less than 20% and not less than 10% for the low load level; and less than 10% for the extremely low load level.
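The load level bands above can be sketched as a small lookup. This is only an illustration: the function name `classify_load_level` is hypothetical, and the input is assumed to be a load level value P_level already expressed in percent, since the formula that produces P_level is given only as a figure.

```python
def classify_load_level(p_level_percent: float) -> str:
    """Map a load level value P_level (in percent) to the band named in the text.

    Hypothetical helper: the band boundaries come from the description,
    the function name and the percent-valued input are assumptions.
    """
    if p_level_percent >= 50:
        return "high"
    if p_level_percent >= 40:
        return "higher"
    if p_level_percent >= 20:
        return "medium"
    if p_level_percent >= 10:
        return "low"
    return "extremely low"
```

Each boundary value belongs to the upper band, matching the "not less than" wording of the text.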
Action and Effect of the invention
According to the residential user classification method based on power load analysis of the invention, daily electric load data are first preprocessed to obtain a plurality of sample data; the sample data are then pre-clustered and agglomeratively clustered to obtain a plurality of data clusters; the contour average value of the data clusters is compared with the preset contour threshold; pre-clustering and agglomerative clustering are then repeated on the sample data according to the comparison result, the number of clustering rounds, and the number of sample data in each data cluster; and finally the residential users are classified according to the data clusters. Because the method forms the data clusters by repeatedly clustering the sample data based on the Bayesian criterion, it can quickly arrive at an optimal division into data clusters. Compared with previous residential user classification methods, it analyzes residential electricity load in finer detail and greatly improves the accuracy of residential user classification, so that a power supply unit can determine the electricity load characteristics of each type of residential user from the user type, accurately carry out supply-side management of the power grid system, and ensure its safe and stable operation.
Drawings
Fig. 1 is a schematic step diagram of a residential user classification method based on electric load analysis in an embodiment of the present invention;
Fig. 2 is a sample data curve in an embodiment of the present invention;
Fig. 3(a) is the first data clustering result formed after the first pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 3(b) is the second data clustering result formed after the first pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 4(a) is the first data clustering result formed after the second pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 4(b) is the second data clustering result formed after the second pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 5(a) is the first data clustering result formed after the third pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 5(b) is the second data clustering result formed after the third pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 5(c) is the third data clustering result formed after the third pre-clustering of the daily electric load data in an embodiment of the present invention;
Fig. 5(d) is the fourth data clustering result formed after the third pre-clustering of the daily electric load data in an embodiment of the present invention; and
Fig. 6 is a data clustering result of the daily electric load data after pre-clustering.
Detailed Description
In order to make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the following embodiment describes the residential user classification method based on electric load analysis in detail with reference to the accompanying drawings.
Fig. 1 is a schematic step diagram of a residential user classification method based on electric load analysis in an embodiment of the present invention.
As shown in fig. 1, the residential user classification method based on electric load analysis in the present embodiment is used for classifying residential users according to a plurality of daily electric load data, and comprises the following steps:
Step S1: performing data preprocessing on the plurality of daily electric load data to obtain a plurality of sample data, where the data preprocessing in step S1 includes the following sub-steps:
Step S1-1: performing data cleaning on the plurality of daily electric load data using the Newton interpolation method to obtain a plurality of initial data.
In actual sampling, hardware factors cause part of the daily electric load data to be lost, so the missing daily electric load data must be filled in by Newton interpolation so that the number of initial data matches the number of daily electric load data.
The specific process is as follows: the plurality of daily electric load data are cleaned, mainly by filling the missing data with the Newton interpolation method, to obtain the plurality of initial data.
The interpolation polynomial of the Newton interpolation method is shown in formula (1):
Figure BDA0002226222390000101
where n is the number of daily electric load data, n being a positive integer greater than or equal to 2; f(x_i) is the missing daily electric load datum to be obtained by Newton interpolation; (x_1, f(x_1)), (x_2, f(x_2)), …, (x_n, f(x_n)) is the sequence of n daily electric load data; (x_i, f(x_i)) is the missing daily electric load datum, with x ∈ R and i ∈ [1, n]; P(x_i) is the Newton interpolation approximation function; and R(x_i) is the error function.
The expression of the Newton interpolation approximation function is shown in formula (2):
Figure BDA0002226222390000102
The expression of the error function is shown in formula (3):
R(x_i) = (x_i - x_1)(x_i - x_2)…(x_i - x_n) f[x_n, x_{n-1}, …, x_1, x_i]    (3)
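As an illustration of filling missing samples by Newton interpolation, the sketch below builds the divided-difference coefficients and evaluates the Newton polynomial at each missing position. It is not the patent's implementation: the function names, the use of `None` to mark a missing sample, and the choice of interpolating from a small window of neighbouring known points (rather than from all n points) are assumptions made for the example.

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients f[x1], f[x2,x1], ..., f[xn,...,x1]."""
    coef = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Walk backwards so lower-order differences are still available.
        for k in range(n - 1, level - 1, -1):
            coef[k] = (coef[k] - coef[k - 1]) / (xs[k] - xs[k - level])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton polynomial at x using nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


def fill_missing(load, window=2):
    """Fill None entries from up to `window` known neighbours on each side.

    `window` is an assumed tuning parameter, not taken from the source.
    """
    filled = list(load)
    for i, value in enumerate(load):
        if value is not None:
            continue
        left = [j for j in range(i - 1, -1, -1) if load[j] is not None][:window]
        right = [j for j in range(i + 1, len(load)) if load[j] is not None][:window]
        nodes = sorted(left + right)
        if not nodes:
            continue  # nothing to interpolate from; leave the gap
        xs = [float(j) for j in nodes]
        ys = [float(load[j]) for j in nodes]
        filled[i] = newton_eval(xs, divided_differences(xs, ys), float(i))
    return filled
```

Using a local window keeps the polynomial degree low, which avoids the oscillation that a single high-degree polynomial through all 96 daily points could show.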
In this embodiment, the daily electric load data are one year of daily electric load data from a plurality of districts, each district corresponding to the power supply area of one transformer in the power grid system. Each daily electric load datum has 96 sampling points (one point every 15 minutes over 24 hours), and each initial datum likewise has 96 points.
Step S1-2: performing data normalization on each of the plurality of initial data to obtain a plurality of corresponding sample data; specifically, a linear normalization method is used to map the values of the initial data to [0, 1], giving the corresponding sample data.
The normalization formula is shown in formula (4):
p'_i = (p_i - min(p)) / (max(p) - min(p))    (4)
where i ∈ [1, n], p is the initial data, p_i is the i-th value of the initial data, and p'_i is the normalized data, i.e., the sample data.
Fig. 2 is a sample data curve in an embodiment of the present invention.
In this embodiment, the initial data originate from a plurality of cells whose capacities differ, so although the initial data share the same dimension, their magnitudes differ greatly from cell to cell. All initial data therefore need to be normalized, that is, the dimensional initial data are transformed into dimensionless sample data, which safeguards the accuracy of the subsequent clustering results.
As shown in fig. 2, the plurality of initial data are normalized using formula (4) to obtain a plurality of sample data, and the sample data curves are drawn from these sample data. In practical applications, the maximum load (L_max) and minimum load (L_min) obtained for different cells differ, which may make the normalized result unstable and affect subsequent results. Therefore, L_max and L_min are replaced by empirical values chosen according to the actual load of each cell; here L_max = 500 and L_min = 0, which avoids the model bias caused by different L_max and L_min across sample sets.
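A minimal sketch of the linear normalization of formula (4), with the option of substituting fixed empirical bounds such as L_min = 0 and L_max = 500. The function name `min_max_normalize` and its keyword arguments are hypothetical.

```python
def min_max_normalize(load, l_min=None, l_max=None):
    """Linear normalization p' = (p - L_min) / (L_max - L_min).

    When l_min / l_max are omitted, the data's own min and max are used,
    as in formula (4); passing fixed empirical bounds (for example 0 and
    500) keeps the scaling identical across sample sets.
    """
    lo = min(load) if l_min is None else l_min
    hi = max(load) if l_max is None else l_max
    span = hi - lo
    if span <= 0:
        raise ValueError("l_max must be greater than l_min")
    # Values outside [lo, hi] map outside [0, 1]; they are not clipped here.
    return [(p - lo) / span for p in load]
```

For example, `min_max_normalize([100, 250, 500], l_min=0, l_max=500)` scales every cell against the same fixed range instead of each cell's own extremes.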
Step S2: pre-clustering sample data to obtain a plurality of data sub-clusters, wherein step S2 includes the following sub-steps:
Step S2-1: reading the sample data one by one based on the BIRCH algorithm; specifically, the data points in the set of sample data are read one by one following the CF (Clustering Feature) tree growth idea of the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm.
Step S2-2: pre-clustering the sample data in dense regions according to the reading result to obtain the data sub-clusters; specifically, while the CF tree is being generated, the sample data in dense regions are pre-clustered to form a plurality of data sub-clusters.
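The pre-clustering idea can be illustrated with a deliberately simplified sketch: samples are read one at a time and absorbed into the nearest clustering feature (CF) if the resulting radius stays within a threshold, otherwise they start a new sub-cluster. This keeps only a flat list of CFs and omits the tree structure, node splitting, and branching factor of full BIRCH; the class name, function name, and `threshold` parameter are assumptions for the example.

```python
import math


class ClusteringFeature:
    """CF triple: count N, linear sum LS, and sum of squared norms SS."""

    def __init__(self, point, index):
        self.n = 1
        self.ls = list(point)
        self.ss = sum(v * v for v in point)
        self.members = [index]  # indices of absorbed samples (kept for inspection)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius_with(self, point):
        """Radius the sub-cluster would have after absorbing `point`."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(v * v for v in point)
        variance = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(variance, 0.0))

    def absorb(self, point, index):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(v * v for v in point)
        self.members.append(index)


def pre_cluster(samples, threshold):
    """Read samples one by one and group those in dense regions into sub-clusters."""
    subclusters = []
    for index, point in enumerate(samples):
        nearest = min(
            subclusters,
            key=lambda cf: math.dist(cf.centroid(), point),
            default=None,
        )
        if nearest is not None and nearest.radius_with(point) <= threshold:
            nearest.absorb(point, index)
        else:
            subclusters.append(ClusteringFeature(point, index))
    return subclusters
```

Because each CF stores only N, LS, and SS, a sample can be absorbed without revisiting earlier samples, which is what makes the single pass over the data possible.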
Step S3: performing agglomerative clustering on the data sub-clusters based on a Bayesian criterion to obtain a plurality of data clusters, where the number of data clusters is m and the expression of the Bayesian criterion is shown in formula (5):
BIC = -2ln(L) + ln(h)·Y    (5)
where BIC is the evaluation score of the data clustering (a higher BIC indicates a more reasonable division of the data clusters), L is the maximum likelihood function value, h is the number of data sub-clusters, and Y is the number of sample data contained in all the data sub-clusters.
The specific process is as follows:
Taking the data sub-clusters produced by the pre-clustering stage as objects, the data sub-clusters are merged one pair at a time by an agglomerative method (that is, the two nearest data sub-clusters are repeatedly merged into a new data sub-cluster) until the expected number of data sub-clusters is reached; the data sub-clusters at that point are taken as the data clusters.
Pre-clustering followed by agglomerative clustering constitutes a two-step clustering; when two-step clustering is used, the clustering criterion is the Bayesian Information Criterion (BIC).
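The merging stage can be sketched as follows, assuming each sub-cluster is summarized by its sample count and centroid and that "nearest" means the smallest Euclidean distance between centroids. Both are assumptions, as are the function names. The `bic` helper simply evaluates formula (5) as written; how the likelihood L is obtained is not specified in this section, so the caller supplies ln(L).

```python
import math


def bic(log_likelihood, h, y):
    """Formula (5): BIC = -2 ln(L) + ln(h) * Y, with ln(L) passed in directly."""
    return -2.0 * log_likelihood + math.log(h) * y


def agglomerate(subclusters, target):
    """Repeatedly merge the two nearest sub-clusters until `target` remain.

    `subclusters` is a list of (count, centroid) pairs. The result is a list of
    (count, centroid, member_ids), where member_ids index the input list.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    clusters = [(n, list(c), [i]) for i, (n, c) in enumerate(subclusters)]
    while len(clusters) > target:
        # Find the pair of clusters with the closest centroids.
        _, a, b = min(
            (math.dist(clusters[a][1], clusters[b][1]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        na, ca, ma = clusters[a]
        nb, cb, mb = clusters[b]
        merged = (
            na + nb,
            [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)],  # weighted centroid
            ma + mb,
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters
```

One way to use the two pieces together would be to run `agglomerate` for several candidate values of `target` and compare the resulting `bic` scores, although the text only states that merging stops at the expected number.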
Step S4: analyzing and calculating the data clusters to obtain the profile average values of the data clusters, wherein the step S4 comprises the following substeps:
Step S4-1: obtaining the intra-cluster dissimilarity a(i) of each of the n sample data from the n sample data, where the expression of the intra-cluster dissimilarity a(i) is shown in formula (9):
Figure BDA0002226222390000131
where i and i' are two sample data in the same data cluster, dist(i, i') is the Euclidean distance between sample data i and i', and |C_s| is the number of sample data contained in the data cluster s to which sample data i belongs;
Step S4-2: obtaining the inter-cluster dissimilarity b(i) of each of the n sample data from the n sample data, where the expression of the inter-cluster dissimilarity b(i) is shown in formula (10):
Figure BDA0002226222390000132
where i and i' are two sample data in different data clusters, dist(i, i') is the Euclidean distance between sample data i and i', and |C_t| is the number of sample data contained in the data cluster t to which sample data i' belongs;
Step S4-3: obtaining the contour average value T from the intra-cluster dissimilarity a(i) and the inter-cluster dissimilarity b(i) of the sample data, where the expression of the contour average value T is shown in formula (11):
Figure BDA0002226222390000133
where s(i) is the contour coefficient (silhouette coefficient), whose expression is shown in formula (12):
Figure BDA0002226222390000141
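Since formulas (9) to (12) survive only as figure references, the sketch below uses the standard silhouette definitions as an assumption: a(i) is the mean distance from i to the other samples in its own cluster, b(i) is the smallest mean distance from i to the samples of any other cluster, s(i) = (b(i) - a(i)) / max(a(i), b(i)), and T is the mean of s(i) over all n samples. The exact normalization in the patent's figures may differ, and the function name is hypothetical.

```python
import math


def silhouette_average(samples, labels):
    """Mean silhouette coefficient T over all samples (standard definition)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1 or len(clusters) < 2:
            continue  # s(i) is taken as 0 for singletons or a single cluster
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(samples[i], samples[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(samples)
```

Values of T close to 1 indicate compact, well-separated clusters, which is why step S5 compares T against a threshold before accepting the clustering.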
Step S5: judging whether the contour average value is greater than or equal to the preset contour threshold
Figure BDA0002226222390000142
The predetermined number of clustering rounds is M; that is, it is judged whether the contour average value satisfies
Figure BDA0002226222390000143
If the result is yes, the clustering is complete and the process proceeds to step S10; if the result is no, the process proceeds to the next step, i.e., step S6.
In the present embodiment, the predetermined clustering number M is equal to 3.
Step S6: judging whether the number of agglomerative clustering rounds performed is less than or equal to the predetermined number of clustering rounds; if so, proceeding to the next step, i.e., step S7; if not, ending the clustering to obtain the clustering result.
Step S7: judging whether the number of sample data in each data cluster is less than or equal to the preset sample number, if not, taking the data cluster as an intermediate data cluster, entering a step S8, and if so, taking the data cluster as a determined data cluster, and entering a step S9;
the process of step S7 specifically includes:
setting the clustering result as the existence of Q types of data clusters, and respectively clustering the number w of sample data in each type of data Qq(q=[1,Q]Q ∈ N), if wqLess than or equal to the predetermined number of samples, there is no need to continue clustering the class and do the data cluster qClustering q for determined data1Step S9 is entered, otherwise, the data cluster q is used as the intermediate data cluster q2The process proceeds to step S8.
In the present embodiment, the predetermined number of samples is 2.
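The size test of step S7 can be sketched as a simple partition. This is a minimal Python illustration, assuming each data cluster is represented as a list of samples; the name `split_by_size` is hypothetical.

```python
def split_by_size(clusters, max_samples=2):
    """Step S7: separate determined clusters (w_q <= max_samples) from intermediate ones."""
    determined, intermediate = [], []
    for cluster in clusters:
        if len(cluster) <= max_samples:
            determined.append(cluster)    # q_1: no further clustering needed
        else:
            intermediate.append(cluster)  # q_2: goes back through steps S2-S3
    return determined, intermediate

determined, intermediate = split_by_size([["u1", "u2"], ["u3", "u4", "u5"], ["u6"]])
```

With the embodiment's predetermined number of samples of 2, only clusters holding three or more samples are sent back for another round of clustering.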
Step S8: repeating the steps S2-S3 according to the sample data in the intermediate data cluster to obtain a data cluster to be determined;
the process of step S8 specifically includes:
repeating steps S2-S3 once for all intermediate data clusters q_2 to obtain the data clusters to be determined q_3.
Step S9: integrating the data clusters to be determined and the determined data clusters to obtain new data clusters, and then entering step S4;
the process of step S9 specifically includes:
integrating all data clusters to be determined q_3 and the determined data clusters q_1 to obtain new data clusters.
Fig. 3(a) is the first data clustering result formed after the first pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 3(b) is the second data clustering result formed after the first pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 4(a) is the first data clustering result formed after the second pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 4(b) is the second data clustering result formed after the second pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 5(a) is the first data clustering result formed after the third pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 5(b) is the second data clustering result formed after the third pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 5(c) is the third data clustering result formed after the third pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 5(d) is the fourth data clustering result formed after the third pre-clustering of the daily electricity load data in the embodiment of the present invention;
Fig. 6 is the data clustering result of the daily electricity load data after pre-clustering.
As shown in figs. 3(a) to 6, the specific process of cluster analysis is as follows:
A first multidimensional clustering, i.e., a first pre-clustering, is performed on the sample data, separating 2 data clusters. The first multidimensional clustering result is shown in figs. 3(a)-3(b), in which the abscissa of each curve is the time point in one day and the ordinate is the electricity load (kW).
After the first multidimensional clustering, the contour average value T is 0.8, which does not satisfy the condition that T is greater than or equal to (1-m/10), so a second multidimensional clustering, i.e., a second pre-clustering, is performed on the first multidimensional clustering result. Because only data cluster 2 contains more than 2 samples, the second multidimensional clustering is performed only on data cluster 2, which is separated into 2 data clusters. The second multidimensional clustering result is shown in figs. 4(a)-4(b), in which the abscissa of each curve is the time point in one day and the ordinate is the electricity load (kW).
After the second multidimensional clustering, the contour average value T is 0.7, which still does not satisfy the condition that T is greater than or equal to (1-m/10), so a third multidimensional clustering is performed on the second multidimensional clustering result. The third multidimensional clustering result is shown in figs. 5(a)-5(d), in which the abscissa of each curve is the time point in one day and the ordinate is the electricity load (kW).
After the third multidimensional clustering, the contour average value T is 0.7, which satisfies the condition that T is greater than or equal to (1-m/10), and the final 5 data clusters are formed, as shown in fig. 6, in which the abscissa of each curve is the time point in one day and the ordinate is the electricity load (kW).
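The control flow of steps S4 to S9 can be sketched in Python as below. This is only an illustration under stated assumptions: m is read as the number of clustering rounds completed so far, which is the reading that reproduces the worked example (T = 0.8, 0.7, 0.7 against thresholds 0.9, 0.8, 0.7); step S6 is taken literally (continue while m ≤ M); and `cluster_fn` and `score_fn` are hypothetical stand-ins for steps S2-S3 and step S4.

```python
import math

def contour_threshold(m):
    # Assumption: m counts the clustering rounds completed so far.
    return 1 - m / 10

def refine(samples, cluster_fn, score_fn, max_rounds=3, max_samples=2):
    """Repeat clustering until the contour average passes the threshold (S5)
    or the round budget is used up (S6). Returns (clusters, rounds)."""
    clusters = cluster_fn(samples)  # first pass of steps S2-S3
    m = 1
    while True:
        T = score_fn(clusters)  # step S4
        thr = contour_threshold(m)
        # Step S5; isclose guards against floating-point noise at equality.
        if T >= thr or math.isclose(T, thr):
            return clusters, m
        if m > max_rounds:  # step S6: stop once m exceeds M
            return clusters, m
        # Step S7: small clusters are final, large ones are clustered again.
        determined = [c for c in clusters if len(c) <= max_samples]
        intermediate = [c for c in clusters if len(c) > max_samples]
        # Step S8: re-cluster each intermediate cluster.
        pending = [sub for c in intermediate for sub in cluster_fn(c)]
        clusters = determined + pending  # step S9
        m += 1

# Toy stand-ins: split a list in half; replay the example's T values.
halve = lambda xs: [xs[: len(xs) // 2], xs[len(xs) // 2 :]]
scores = iter([0.8, 0.7, 0.7])
clusters, rounds = refine(list(range(8)), halve, lambda cs: next(scores))
```

With these stand-ins the loop stops after the third round, matching the sequence of decisions described in the embodiment; the number of final clusters differs because the toy splitter is not the BIRCH-based clustering of the method.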
Step S10: classifying the residential users according to the data clusters, specifically, obtaining the residential user classifications corresponding to the plurality of data clusters according to a plurality of predetermined electricity utilization characteristic indexes,
the predetermined electricity utilization characteristic indexes include the overall load level, the electricity consumption peak period, the late load decline time point, and the daily load fluctuation rate.
The expression of the load level value of the overall load level is shown in formula (13):
$$P_{level} = \frac{P_{average}}{P_{max}} \times 100\% \tag{13}$$
P_level represents the load level value; P_average is the daily average load; P_max is the maximum value of all the daily electricity load data. The calculation result according to formula (13) is classified into five categories: a high load level (50% or higher), a higher load level (less than 50% and 40% or higher), a medium load level (less than 40% and 20% or higher), a low load level (less than 20% and 10% or higher), and an extremely low load level (less than 10%).
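The load level bands can be illustrated with a short Python sketch. It assumes the load level value is the daily average load divided by the maximum load, expressed as a percentage, and uses P_max = 500 kW, a value inferred from the embodiment's figures (for example 204 kW corresponding to 40.8%) rather than stated in the text; the function name `load_level` is hypothetical.

```python
def load_level(p_average, p_max):
    """Return (P_level in percent, band name) using the five bands of the method."""
    p_level = p_average / p_max * 100.0
    if p_level >= 50:
        band = "high"
    elif p_level >= 40:
        band = "higher"
    elif p_level >= 20:
        band = "medium"
    elif p_level >= 10:
        band = "low"
    else:
        band = "extremely low"
    return p_level, band

# Daily average loads of the five curves in the embodiment, top to bottom.
P_MAX = 500.0  # assumption inferred from the reported percentages
bands = [load_level(p, P_MAX)[1] for p in (312, 204, 146, 87, 44)]
```

Under this assumption the five curves fall into five different bands, one per residential user class.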
The electricity consumption peak period includes four periods: 5:00-6:00, 11:00-12:00, 19:00-20:00, and 20:00-21:00.
The late load decline time point includes two types: 21:00 and 22:00.
The daily load fluctuation rate is expressed by formula (14), in which P_wave represents the daily load fluctuation rate and P_error is the standard deviation of the daily load. The calculation results according to formula (14) are classified into two categories: fluctuation (30% or more) and no fluctuation (less than 30%).
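A possible way to compute this index is sketched below in Python. The exact form of formula (14) is not reproduced in this text, so the sketch assumes the fluctuation rate is the standard deviation of the daily load divided by the daily average load, expressed as a percentage; that denominator, the use of the population standard deviation, and the function names are assumptions.

```python
import numpy as np

def fluctuation_rate(daily_load):
    """Assumed form: P_wave = P_error / P_average * 100 (percent)."""
    load = np.asarray(daily_load, dtype=float)
    return load.std() / load.mean() * 100.0  # population standard deviation

def fluctuation_class(p_wave, threshold=30.0):
    """Two categories from the method: 30% or more counts as fluctuation."""
    return "fluctuation" if p_wave >= threshold else "no fluctuation"

flat = [100.0] * 24                  # constant load over 24 hourly readings
peaky = [50.0] * 12 + [150.0] * 12   # half the day low, half the day high
flat_class = fluctuation_class(fluctuation_rate(flat))
peaky_class = fluctuation_class(fluctuation_rate(peaky))
```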
The residential user classifications include office worker + children integrated residents, office worker + elderly integrated residents, elderly family residents, single office workers, and integrated multi-person residents.
Office worker + children integrated residents: the load level is the higher load level, the electricity consumption peak periods are 11:00-12:00 and 19:00-20:00, the late load decline time point is 21:00, and the daily load fluctuation rate is fluctuating. In this embodiment, the residential users corresponding to the second daily electricity load curve from top to bottom in fig. 6 have a daily average electricity load of 204 kW and a higher load level value (40.8%); there are 2 distinct electricity consumption peak periods, at about 11:00-12:00 and 19:00-20:00; the load declines early in the evening, at about 21:00; and the daily load fluctuation rate is large, at 33.5%. Based on actual research and analysis, this class of users matches the electricity consumption of residential users with children at home, so the residential users corresponding to this daily electricity load curve are classified as office worker + children integrated residents.
Office worker + elderly integrated residents: the load level is the medium load level, the electricity consumption peak periods are 11:00-12:00 and 19:00-20:00, the late load decline time point is 21:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the electricity consumption of the residential users corresponding to the third daily electricity load curve from top to bottom in fig. 6 is similar to that of the office worker + children integrated residents, but the daily average electricity load is 146 kW and the load level value is medium (29.2%), lower than that of the office worker + children integrated residents, and the electricity consumption is very smooth with no obvious fluctuation (the daily load fluctuation rate is less than 30%). Based on actual research and analysis, this class of users matches the electricity consumption of households in which elderly people and office workers live together, so the residential users corresponding to this daily electricity load curve are classified as office worker + elderly integrated residents.
Elderly family residents: the load level is the low load level, the electricity consumption peak periods are 5:00-6:00 and 20:00-21:00, the late load decline time point is 21:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the residential users corresponding to the fourth daily electricity load curve from top to bottom in fig. 6 have a daily average electricity load of 87 kW and a low load level value (17.4%); these users have 2 electricity consumption peaks, at about 5:00-6:00 and about 20:00-21:00, and the late load decline time point is near 21:00. Based on actual research and analysis, these users match the daily routine of elderly people and show a strong awareness of saving electricity, so the residential users corresponding to this daily electricity load curve are classified as elderly family residents.
Single office workers: the load level is the extremely low load level, the electricity consumption peak period is 20:00-21:00, the late load decline time point is 22:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the residential users corresponding to the fifth daily electricity load curve from top to bottom in fig. 6 have a daily average electricity load of only 44 kW and an extremely low load level value (8.8%); the daytime electricity load of these users is relatively smooth, there is only one electricity consumption peak, at about 20:00-21:00, and the late load decline time point is near 22:00. Based on actual research and analysis, this class of users matches the electricity consumption of single office workers, so the residential users corresponding to this daily electricity load curve are classified as single office workers.
Integrated multi-person residents: the load level is the high load level, the electricity consumption peak periods are 11:00-12:00 and 20:00-21:00, the late load decline time point is 22:00, and the daily load fluctuation rate is non-fluctuating. In this embodiment, the residential users corresponding to the first daily electricity load curve from top to bottom in fig. 6 have a daily average electricity load as high as 312 kW and a high load level value (62.4%); there are 2 distinct electricity consumption peak periods, at about 11:00-12:00 and about 20:00-21:00; the load declines late, at about 22:00; and the daily load fluctuation rate is small, below 30%. Based on actual research and analysis, this class of users matches the electricity consumption of integrated households with many members, so the residential users corresponding to this daily electricity load curve are classified as integrated multi-person residents.
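The five profiles described above amount to a lookup from the four characteristic indexes to a residential user class. The Python sketch below encodes that lookup. It is illustrative only: load level names follow the percentage bands of the method (40.8% falls in the 40%-50% "higher" band and 62.4% in the 50%-or-more "high" band), and the key layout, the label strings, and the `classify` helper are assumptions.

```python
# Key: (load level band, peak periods, late load decline hour, fluctuating?)
PROFILES = {
    ("higher", frozenset({"11-12", "19-20"}), 21, True):
        "office worker + children integrated residents",
    ("medium", frozenset({"11-12", "19-20"}), 21, False):
        "office worker + elderly integrated residents",
    ("low", frozenset({"5-6", "20-21"}), 21, False):
        "elderly family residents",
    ("extremely low", frozenset({"20-21"}), 22, False):
        "single office workers",
    ("high", frozenset({"11-12", "20-21"}), 22, False):
        "integrated multi-person residents",
}

def classify(level, peaks, decline_hour, fluctuating):
    """Map one data cluster's indexes to a class, or 'unclassified' if no profile matches."""
    key = (level, frozenset(peaks), decline_hour, fluctuating)
    return PROFILES.get(key, "unclassified")

second_curve = classify("higher", ["11-12", "19-20"], 21, True)
fifth_curve = classify("extremely low", ["20-21"], 22, False)
unknown = classify("medium", ["5-6"], 22, True)
```

An exact-match table is the simplest reading of the profiles; a real deployment would likely need tolerance for clusters that match a profile only partially.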
Functions and effects of the embodiment
According to the residential user classification method based on power load analysis in this embodiment, data preprocessing is performed on the daily electricity load data to obtain a plurality of sample data; the sample data are pre-clustered and clustered to obtain a plurality of data clusters; the contour average value of the data clusters is compared with the preset contour threshold; the sample data are repeatedly pre-clustered and clustered according to the judgment result, the number of clustering times, and the number of sample data in each data cluster; and finally the residential users are classified according to the data clusters. Because the method forms the data clusters by clustering the sample data multiple times based on the Bayesian criterion, an optimal division into data clusters is obtained quickly. Compared with previous residential user classification methods, the residential power load is analyzed in finer detail and the accuracy of residential user classification is greatly improved, so that the power supply unit can determine the power load characteristics of each type of residential user according to the user type, accurately carry out supply-side management of the power grid system, and keep the power grid system operating safely and stably.
Because the data preprocessing in this embodiment includes cleaning the daily electricity load data by the Newton interpolation method and then normalizing the data to obtain the sample data, the sample data are uniformly distributed and data of different dimensions can be integrated, which avoids data deviation caused by daily electricity load data from different sources and ensures the accuracy of the multi-dimensional clustering result.
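The two preprocessing operations named above can be sketched in Python as follows. The sketch assumes that a missing reading is filled from neighbouring time points with a Newton divided-difference polynomial and that normalization is min-max scaling to [0, 1]; neither detail is specified in this passage, and the function names and sample values are illustrative.

```python
def newton_interpolate(xs, ys, x):
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    n = len(xs)
    coef = [float(y) for y in ys]
    # Build divided differences in place; coef[i] ends up as f[x0, ..., xi].
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Horner-style evaluation of the Newton form.
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

def min_max_normalize(values):
    """Scale values to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Fill a missing reading at hour 2 from readings at hours 0, 1 and 3.
filled = newton_interpolate([0, 1, 3], [0.0, 1.0, 9.0], 2)
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

The toy readings lie on the curve y = x², so the interpolated value at hour 2 can be checked by hand.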
Because the contour average value is introduced in this embodiment, the multi-dimensional clustering division is supported by mathematical theory, which makes the division process more rigorous and further improves the accuracy of the multi-dimensional clustering result.
Because the residential users are classified according to a plurality of predetermined electricity utilization characteristic indexes to obtain the residential user classification corresponding to each data cluster, and the predetermined electricity utilization characteristic indexes are themselves subdivided, the power supply unit can conveniently match each user to a category according to the user's electricity consumption and plan as a whole, which greatly improves the power supply unit's ability to manage and plan the supply side of the power grid system.
The above-described embodiments are preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and various modifications and changes can be made by those skilled in the art without inventive work within the scope of the appended claims.

Claims (10)

1. A residential user classification method based on power load analysis is used for classifying residential users according to a plurality of daily power load data, and is characterized by comprising the following steps:
step S1: carrying out data preprocessing on a plurality of daily electric load data to obtain a plurality of sample data;
step S2: pre-clustering the sample data to obtain a plurality of data sub-clusters;
step S3: performing clustering on the data sub-clusters based on Bayesian criterion to obtain a plurality of data clusters;
step S4: analyzing and calculating the data clusters to obtain the profile average value of the data clusters;
step S5: judging whether the profile average value is greater than or equal to a preset profile threshold value, if so, entering a step S10, and if not, entering a step S6;
step S6: judging whether the times of the clustering is less than or equal to the preset clustering times, and if so, entering the step S7;
step S7: judging whether the number of the sample data in each data cluster is less than or equal to a preset sample number, if not, taking the data cluster as an intermediate data cluster, and entering step S8, and if so, taking the data cluster as a determined data cluster, and entering step S9;
step S8: repeating the steps S2-S3 according to the sample data in the intermediate data cluster to obtain a data cluster to be determined;
step S9: integrating the data clusters to be determined and the determined data clusters to obtain new data clusters, and then entering step S4;
step S10: and classifying the residential users according to the data clustering.
2. The load analysis-based resident user classifying method according to claim 1, wherein:
wherein the number of the data clusters is m,
the predetermined number of clustering times is M times,
the contour threshold is (1-m/10).
3. The load analysis-based resident user classifying method according to claim 1, wherein:
the data preprocessing in step S1 includes the following sub-steps:
step S1-1: carrying out data cleaning on the plurality of daily electric load data by adopting a Newton interpolation method to obtain a plurality of initial data;
step S1-2: and respectively carrying out data normalization processing on the plurality of initial data to obtain a plurality of corresponding sample data.
4. The load analysis-based resident user classifying method according to claim 1, wherein:
wherein, step S2 includes the following substeps:
step S2-1: reading the sample data one by one based on a BIRCH algorithm;
step S2-2: and pre-clustering a plurality of sample data in the dense area according to the reading result so as to obtain the data sub-cluster.
5. The load analysis-based resident user classifying method according to claim 1, wherein:
wherein, the expression of the bayesian criterion in step S3 is:
BIC=-2ln(L)+ln(h)·Y,
BIC is the classification evaluation of the data cluster, the higher BIC represents the more reasonable classification of the data cluster, L is a maximum likelihood function value, h is the number of the data sub-clusters, and Y is the number of the sample data contained in all the data sub-clusters.
6. The load analysis-based resident user classifying method according to claim 1, wherein:
wherein the number of the sample data is n, n is a positive integer greater than or equal to 2,
step S4 includes the following substeps:
step S4-1: obtaining n intra-cluster dissimilarity a (i) of the sample data according to the n sample data, wherein the expression of the intra-cluster dissimilarity a (i) is as follows:
i and i' are two sample data in the same data cluster, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_s| is the number of sample data contained in the data cluster s to which the sample data i belongs;
step S4-2: obtaining n inter-cluster dissimilarity degrees b (i) of the sample data according to the n sample data, wherein the expression of the inter-cluster dissimilarity degrees b (i) is as follows:
Figure FDA0002226222380000032
i and i' are two sample data in different data clusters, dist(i, i') is the Euclidean distance between the two sample data i and i', and |C_t| is the number of sample data contained in the data cluster t to which the sample data i' belongs;
step S4-3: obtaining the contour average value T according to the intra-cluster dissimilarity a(i) and the inter-cluster dissimilarity b(i) of the sample data, wherein the expression of the contour average value T is:
$$T = \frac{1}{n}\sum_{i=1}^{n} s(i)$$
s(i) is a contour coefficient expressed as:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
7. The load analysis-based resident user classifying method according to claim 1, wherein:
in step S10, the step of classifying the residential users is to obtain the residential user classifications corresponding to the data clusters according to a plurality of predetermined electricity utilization characteristic indexes.
8. The load analysis-based resident user classifying method according to claim 7, wherein:
the resident user classification comprises office worker + children integrated residents, office worker + elderly integrated residents, elderly family residents, single office workers, and integrated multi-person residents.
9. The load analysis-based resident user classifying method according to claim 7, wherein:
wherein the predetermined electricity usage characteristic indicators include an overall load level, an electricity usage peak time period, a late load decline time point, and a daily load fluctuation rate,
the expression of the load level value of the overall load level is:
$$P_{level} = \frac{P_{average}}{P_{max}} \times 100\%$$
P_level represents the load level value; P_average is the daily average load; P_max is the maximum value of all the daily electricity load data,
the electricity consumption peak period comprises 5:00-6:00, 11:00-12:00, 19:00-20:00, and 20:00-21:00,
the late load decline time point comprises 21:00 and 22:00,
the expression of the daily load fluctuation rate is as follows:
Figure FDA0002226222380000052
P_wave represents the daily load fluctuation rate; P_error is the standard deviation of the daily load.
10. The load analysis-based resident user classifying method according to claim 9, wherein:
wherein the overall load level includes a high load level, a higher load level, a medium load level, a low load level, and an extremely low load level,
the high load level is the load level value P_level being greater than or equal to 50%,
the higher load level is the load level value P_level being less than 50% and greater than or equal to 40%,
the medium load level is the load level value P_level being less than 40% and greater than or equal to 20%,
the low load level is the load level value P_level being less than 20% and greater than or equal to 10%,
the extremely low load level is the load level value P_level being less than 10%.
CN201910952518.8A 2019-10-09 2019-10-09 Resident user classification method based on electricity load analysis Active CN110825723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910952518.8A CN110825723B (en) 2019-10-09 2019-10-09 Resident user classification method based on electricity load analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910952518.8A CN110825723B (en) 2019-10-09 2019-10-09 Resident user classification method based on electricity load analysis

Publications (2)

Publication Number Publication Date
CN110825723A true CN110825723A (en) 2020-02-21
CN110825723B CN110825723B (en) 2023-04-25

Family

ID=69548729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910952518.8A Active CN110825723B (en) 2019-10-09 2019-10-09 Resident user classification method based on electricity load analysis

Country Status (1)

Country Link
CN (1) CN110825723B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant