CN114357261A - Massive power consumer clustering algorithm based on data-physical characteristic combined drive - Google Patents

Publication number
CN114357261A
Authority
CN
China
Prior art keywords
clustering
sample
cluster
data
clusters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111281280.4A
Other languages
Chinese (zh)
Inventor
钱斌
程韧俐
周密
祝宇翔
李富盛
史军
肖勇
刘傲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China South Power Grid International Co ltd
Shenzhen Power Supply Co ltd
Original Assignee
China South Power Grid International Co ltd
Shenzhen Power Supply Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-01
Filing date
2021-11-01
Publication date
2022-04-15
Application filed by China South Power Grid International Co ltd and Shenzhen Power Supply Co ltd
Priority to CN202111281280.4A
Publication of CN114357261A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a massive power consumer clustering algorithm based on data-physical characteristic combined drive, comprising the following steps: S1, clustering the daily load curves of each single user to obtain that user's typical daily load curve; S2, normalizing each user's typical daily load data to per-unit values; S3, clustering the typical daily load curves of multiple users; S4, iterating automatically to search for a cluster number that meets the silhouette coefficient (contour coefficient) standard and the industry concentration standard; and S5, outputting the user clustering result corresponding to the optimal cluster number. By mining the massive historical power consumption data of users accumulated in the power consumption information acquisition system, the invention makes it possible to understand the production characteristics and power consumption requirements of the various industries and users, which can improve the load prediction accuracy and dispatching management level of a power grid company and can provide accurate data support and a decision basis for electricity pricing, economic dispatching, demand response and the like.

Description

Massive power consumer clustering algorithm based on data-physical characteristic combined drive
Technical Field
The invention relates to a massive power consumer clustering algorithm, in particular to a massive power consumer clustering algorithm based on data-physical characteristic combined driving.
Background
Mining the massive historical power consumption data of users accumulated in the power consumption information acquisition system makes it possible to understand the production characteristics and power consumption requirements of the various industries and users. This can improve the load prediction accuracy and dispatching management level of a power grid company, and can provide accurate data support and a decision basis for electricity pricing, economic dispatching, demand response and the like.
Because the number of users is massive, user clustering is a prerequisite for, and a key influence on, the analysis and mining of user data. User clustering methods fall into two categories. Clustering by the industry to which a user belongs is simple and intuitive and has a clear physical meaning, but because users' industry attributes are often recorded incorrectly or omitted, and because a single industry contains sub-industries with different production characteristics, the resulting clusters are not uniform in their electricity-use characteristics. Clustering by load characteristics ensures that load characteristics are consistent within each cluster, but it lacks a physical explanation for that consistency.
Disclosure of Invention
The invention aims to provide a massive power consumer clustering algorithm based on data-physical characteristic combined drive, which not only can improve the load prediction precision and the dispatching management level of a power grid company, but also can provide accurate data support and decision basis for power price formulation, economic dispatching, demand response and the like.
The technical scheme adopted by the invention for solving the technical problems is as follows: a massive power consumer clustering algorithm based on data-physical characteristic combined drive is constructed, and the method comprises the following steps:
S1, clustering the daily load curves of each single user to obtain that user's typical daily load curve;
S2, normalizing each user's typical daily load data to per-unit values;
S3, clustering the typical daily load curves of multiple users;
S4, iterating automatically to search for a cluster number that meets the silhouette coefficient (contour coefficient) standard and the industry concentration standard;
and S5, outputting the user clustering result corresponding to the optimal cluster number.
According to the scheme, in step S1, the single-user clustering applies a hierarchical clustering method to the daily load curves of one user, and the user's typical daily load curve is calculated from the category that contains the most daily load curves.
According to the scheme, the hierarchical clustering method is specifically as follows: in a bottom-up manner, each sample is initially regarded as its own cluster, the two clusters with the minimum distance are then found and merged, and this is repeated until the expected number of clusters is reached. A representative algorithm of this hierarchical clustering method is AGNES (Agglomerative Nesting).
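The merge loop described above can be sketched in a few lines of Python. This is only an illustration: the text does not say how the distance between two clusters is measured, so average linkage over Euclidean distances is assumed here, and the function name `agnes` and the sample points are hypothetical.

```python
from math import dist


def agnes(samples, n_clusters):
    """Bottom-up clustering: start with one cluster per sample and keep
    merging the two closest clusters until n_clusters remain.

    Average linkage and Euclidean distance are assumptions of this sketch.
    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(samples[i], samples[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        # Pick the pair of clusters with the minimum linkage distance.
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
result = agnes(points, 3)
```

The exhaustive pair search is quadratic in the number of clusters at every merge, so for a truly massive user base a library implementation would be the practical choice; the loop above is meant only to make the procedure concrete.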
According to the scheme, in step S3, the cluster number is varied over the range N/2 to 2N; for each cluster number, the typical daily load curves of multiple users are clustered by the hierarchical clustering method, and the silhouette coefficient (contour coefficient) and the industry concentration of each cluster are calculated.
According to the scheme, the silhouette coefficient (contour coefficient) is calculated as follows:
S101, calculating the average distance ai between sample i and the other samples in the same cluster; the smaller ai is, the more strongly sample i belongs in that cluster; ai is called the intra-cluster dissimilarity of sample i, and the mean of ai over all samples in a cluster C is called the cluster dissimilarity of cluster C;
S102, calculating the average distance bij from sample i to all samples of another cluster Cj, which is called the dissimilarity between sample i and cluster Cj; the inter-cluster dissimilarity of sample i is defined as the minimum over the other clusters:
bi=min{bi1,bi2,...,bik}
the larger bi is, the less sample i belongs to any other cluster;
S103, defining the silhouette coefficient si of sample i from its intra-cluster dissimilarity ai and its inter-cluster dissimilarity bi:
si=(bi-ai)/max{ai,bi}
that is,
si=1-ai/bi, if ai<bi; si=0, if ai=bi; si=bi/ai-1, if ai>bi
S104, judging:
if si is close to 1, the clustering of sample i is reasonable;
if si is close to -1, sample i should rather be assigned to another cluster;
if si is approximately 0, sample i lies on the boundary between two clusters;
S105, the mean of si over all samples is called the silhouette coefficient of the clustering result, and measures whether the clustering is reasonable and effective.
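Steps S101 to S105 translate almost directly into code. The sketch below is a plain-Python illustration, not the patent's implementation: Euclidean distance is assumed, the names `silhouette_values` and `silhouette_score` are hypothetical, and a sample that is alone in its cluster is given si = 0, a common convention that the text itself does not state.

```python
from math import dist


def silhouette_values(samples, labels):
    """Return the silhouette coefficient si of every sample (S101-S103)."""
    values = []
    for i, x in enumerate(samples):
        # Distances from sample i to every other sample, grouped by cluster.
        by_cluster = {}
        for j, y in enumerate(samples):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(x, y))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: si set to 0
            # (a convention assumed by this sketch).
            values.append(0.0)
            continue
        a = sum(own) / len(own)                                 # S101: ai
        b = min(sum(d) / len(d) for d in by_cluster.values())   # S102: bi
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)  # S103
    return values


def silhouette_score(samples, labels):
    """S105: mean of si over all samples."""
    values = silhouette_values(samples, labels)
    return sum(values) / len(values)


samples = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(samples, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(samples, [0, 1, 0, 1])   # deliberately mixed grouping
```

With the natural grouping every si is close to 1, while the mixed grouping gives negative values, which matches the reading of si in S104.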
According to the scheme, in step S4, an automatic iterative convergence algorithm judges whether the silhouette coefficient (contour coefficient) and the industry concentration of each clustering meet the index standards and have converged; if not, the cluster number is increased and the iterative calculation continues.
The implementation of the massive power consumer clustering algorithm based on the data-physical characteristic combined drive has the following beneficial effects:
1. the invention clusters massive power users through data-physical characteristic combined drive: user load curves are clustered by a hierarchical clustering of daily typical load curves guided by the silhouette coefficient (contour coefficient), yielding clustering results that meet both the silhouette coefficient requirement and the industry concentration standard;
2. the clustering result calculated by the method reaches the standard of the clustering effect evaluation index and the standard of the cluster number constraint index;
3. the algorithm solves the problem of clustering massive power consumers while ensuring the consistency of both electricity-use characteristics and load characteristics.
Drawings
FIG. 1 is a flow chart of an algorithm provided by the present invention;
FIG. 2 is a curve of the average silhouette coefficient (contour coefficient) of the massive users against the number of clusters provided by the present invention;
FIG. 3 shows the proportions of the dominant industries at different silhouette coefficients;
FIG. 4 is an example of hierarchical clustering according to the present invention.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
As shown in fig. 1 to 4, the massive power consumer clustering algorithm based on data-physical characteristic joint driving of the present invention includes the following steps:
S1, clustering the daily load curves of each single user, specifically as follows:
a bottom-up hierarchical clustering method is applied to all daily load curves of a single user: each sample is initially regarded as its own cluster, the two clusters with the minimum distance are found and merged, and this is repeated until the expected number of clusters is reached; the user's typical daily load curve is then calculated from the category that contains the most daily load curves.
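A minimal sketch of step S1 for one user follows. Several details are assumptions because the text leaves them open: average linkage with Euclidean distance, an expected cluster count of 2, and taking the point-by-point mean of the largest category as the typical curve. The names `agglomerate` and `typical_daily_curve` and the toy three-point curves are hypothetical.

```python
from math import dist


def agglomerate(curves, n_clusters):
    """Merge the two closest clusters until n_clusters remain (average linkage)."""
    clusters = [[i] for i in range(len(curves))]

    def gap(c1, c2):
        total = sum(dist(curves[i], curves[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda ab: gap(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def typical_daily_curve(daily_curves, n_clusters=2):
    """Average the curves in the category that contains the most days."""
    clusters = agglomerate(daily_curves, n_clusters)
    largest = max(clusters, key=len)
    n_points = len(daily_curves[0])
    return [sum(daily_curves[d][t] for d in largest) / len(largest)
            for t in range(n_points)]


# Three similar working days and one atypical day (toy data, 3 points per day).
days = [
    [1.0, 3.0, 2.0],
    [1.2, 3.2, 2.2],
    [0.8, 2.8, 1.8],
    [0.5, 0.5, 0.5],
]
typical = typical_daily_curve(days)
```

The atypical day ends up alone in its own category, so it does not distort the typical curve, which is the point of clustering before averaging.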
S2, normalizing each user's typical daily load data, that is, calculating the per-unit values of the user's typical daily load.
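Per-unit conversion divides every load value by a base value so that curves of users with very different consumption levels become comparable in shape. The text does not name the base; the sketch below assumes the curve's own daily maximum unless a base is supplied, and the function name `to_per_unit` is hypothetical.

```python
def to_per_unit(curve, base=None):
    """Scale a load curve to per-unit values.

    The base defaults to the curve's own maximum, which is an assumption of
    this sketch; a rated capacity or a mean load could be passed instead.
    """
    if base is None:
        base = max(curve)
    if base <= 0:
        raise ValueError("per-unit base must be positive")
    return [value / base for value in curve]


load_kw = [50.0, 100.0, 25.0]
per_unit = to_per_unit(load_kw)
```

After this step a small shop and a large factory with the same daily rhythm produce the same curve, so the multi-user clustering in S3 groups users by load shape rather than by size.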
S3, clustering the typical daily load curves of thousands of users in a certain area by the hierarchical clustering method, and plotting the average silhouette coefficient (contour coefficient) against the number of clusters.
The cluster number is varied over the range 50 to 250; for each cluster number, the typical daily load curves of multiple users are clustered by the hierarchical clustering method, and the silhouette coefficient and the industry concentration of each cluster are calculated.
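The text does not define industry concentration. One plausible reading, assumed here, is the share of a cluster's users that belong to its most common industry; the sketch below computes that share for every cluster. The function name `industry_concentration` and the industry labels are hypothetical.

```python
from collections import Counter


def industry_concentration(labels, industries):
    """Share of each cluster's users that belong to its most common industry.

    labels[i] is the cluster of user i and industries[i] is that user's
    recorded industry. Defining concentration as the dominant-industry share
    is an assumption of this sketch.
    """
    members = {}
    for label, industry in zip(labels, industries):
        members.setdefault(label, []).append(industry)
    return {
        label: Counter(inds).most_common(1)[0][1] / len(inds)
        for label, inds in members.items()
    }


labels = [0, 0, 0, 1, 1]
industries = ["steel", "steel", "retail", "retail", "retail"]
concentration = industry_concentration(labels, industries)
```

A cluster whose concentration is close to 1 has a clear physical interpretation (one industry), which is what ties the data-driven load clustering back to the physical industry attribute.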
The silhouette coefficient (contour coefficient) is a measure for evaluating the quality of a clustering, and is calculated as follows:
S101, calculating the average distance ai from sample i to the other samples in the same cluster. The smaller ai is, the more strongly sample i belongs in that cluster; ai is called the intra-cluster dissimilarity of sample i, and the mean of ai over all samples in a cluster C is called the cluster dissimilarity of cluster C.
S102, calculating the average distance bij from sample i to all samples of another cluster Cj, which is called the dissimilarity between sample i and cluster Cj; the inter-cluster dissimilarity of sample i is defined as the minimum over the other clusters:
bi=min{bi1,bi2,...,bik}
The larger bi is, the less sample i belongs to any other cluster.
S103, defining the silhouette coefficient si of sample i from its intra-cluster dissimilarity ai and its inter-cluster dissimilarity bi:
si=(bi-ai)/max{ai,bi}
that is,
si=1-ai/bi, if ai<bi; si=0, if ai=bi; si=bi/ai-1, if ai>bi
S104, judging:
if si is close to 1, the clustering of sample i is reasonable;
if si is close to -1, sample i should rather be assigned to another cluster;
if si is approximately 0, sample i lies on the boundary between two clusters.
S105, the mean of si over all samples is called the silhouette coefficient of the clustering result, and measures whether the clustering is reasonable and effective.
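The silhouette coefficient of a sample can be written either as a single ratio, (bi - ai) / max{ai, bi}, or as three cases depending on which of ai and bi is larger. The short check below, using made-up positive values of ai and bi, illustrates that the two forms agree and how the sign of si follows from the comparison of ai and bi; the function names are hypothetical.

```python
def silhouette(a, b):
    """Single-ratio form: (b - a) / max(a, b), for positive a and b."""
    return (b - a) / max(a, b)


def silhouette_piecewise(a, b):
    """Three-case form of the same quantity."""
    if a < b:
        return 1 - a / b   # well inside its own cluster: positive
    if a > b:
        return b / a - 1   # closer to another cluster: negative
    return 0.0             # on the boundary


# (ai, bi) pairs chosen only for illustration.
cases = [(1.0, 10.5), (4.0, 4.0), (10.0, 6.0)]
results = [(silhouette(a, b), silhouette_piecewise(a, b)) for a, b in cases]
```

For example, ai = 1 and bi = 10.5 give si of about 0.905, a well-placed sample, while ai = 10 and bi = 6 give si = -0.4, a sample that sits closer to a neighbouring cluster than to its own.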
As the number of clusters increases, the average silhouette coefficient (contour coefficient) rises slowly, and the curve flattens in its middle section.
S4, iterating automatically to search for a cluster number that meets the silhouette coefficient (contour coefficient) standard and the industry concentration standard.
An automatic iterative convergence algorithm judges whether the silhouette coefficient and the industry concentration of each clustering meet the index standards and have converged; if not, the cluster number is increased and the iterative calculation continues.
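The iteration in S4 can be pictured as a loop that raises the cluster number until both indices reach their standards. The sketch below is a simplified illustration: it checks thresholds only and omits any separate convergence test, the `evaluate` callback stands in for "cluster with k clusters and measure both indices", and the thresholds and the toy score table are invented for the example rather than taken from the patent.

```python
def search_cluster_count(evaluate, k_min, k_max, step,
                         min_silhouette, min_concentration):
    """Increase the cluster number until both indices meet their standards.

    evaluate(k) must return (silhouette, industry_concentration) for a
    clustering with k clusters; it is a hypothetical callback.
    Returns the first k that qualifies, or None if none does.
    """
    for k in range(k_min, k_max + 1, step):
        silhouette, concentration = evaluate(k)
        if silhouette >= min_silhouette and concentration >= min_concentration:
            return k
    return None


# Toy scores per cluster number, invented for illustration only.
toy_scores = {
    50: (0.060, 0.40),
    60: (0.080, 0.50),
    70: (0.090, 0.52),
    80: (0.095, 0.55),
    90: (0.097, 0.58),
}

best_k = search_cluster_count(
    evaluate=lambda k: toy_scores[k],
    k_min=50, k_max=90, step=10,
    min_silhouette=0.09, min_concentration=0.55,
)
```

In this toy table the cluster number 70 passes the silhouette threshold but not the concentration threshold, so the loop continues to 80, which shows why both criteria have to be checked together.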
S5, outputting the user clustering result corresponding to the optimal cluster number.
When the silhouette coefficient (contour coefficient) is close to 0.095, the share of the leading industry reaches 55%, the combined share of the top two industries reaches 80%, and the combined share of the top three industries exceeds 90%; this point is also a stage inflection point of the curve, so the cluster number at this point is a suitable choice for the daily typical load curve clustering. Looking up the data gives a cluster number of 87 at this point, which is taken as the optimal cluster number.
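The dominant-industry proportions quoted above can be read as cumulative shares of the most common industries among the clustered users. That reading is an assumption, as are the function name `top_k_shares` and the toy industry labels; the sketch only shows how such shares would be computed.

```python
from collections import Counter


def top_k_shares(industries, k=3):
    """Cumulative share of the 1, 2, ..., k most common industries.

    Treating the dominant-industry proportion as a cumulative share is an
    assumption of this sketch.
    """
    counts = [n for _, n in Counter(industries).most_common(k)]
    total = len(industries)
    shares, running = [], 0
    for n in counts:
        running += n
        shares.append(running / total)
    return shares


# Toy group of 20 users across four industries (illustrative labels).
cluster = ["A"] * 11 + ["B"] * 5 + ["C"] * 3 + ["D"]
shares = top_k_shares(cluster)
```

For this toy group the leading industry holds 55% of the users, the top two together 80%, and the top three 95%, which is the kind of profile the selection criterion looks for.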
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. A massive power consumer clustering algorithm based on data-physical characteristic combined driving is characterized by comprising the following steps:
S1, clustering the daily load curves of each single user to obtain that user's typical daily load curve;
S2, normalizing each user's typical daily load data to per-unit values;
S3, clustering the typical daily load curves of multiple users;
S4, iterating automatically to search for a cluster number that meets the silhouette coefficient (contour coefficient) standard and the industry concentration standard;
and S5, outputting the user clustering result corresponding to the optimal cluster number.
2. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 1, wherein in step S1, the single-user clustering applies a hierarchical clustering method to the daily load curves of one user, and the user's typical daily load curve is calculated from the category that contains the most daily load curves.
3. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 2, wherein the hierarchical clustering method is specifically as follows: in a bottom-up manner, each sample is initially regarded as its own cluster, the two clusters with the minimum distance are then found and merged, and this is repeated until the expected number of clusters is reached; a representative algorithm of this hierarchical clustering method is AGNES (Agglomerative Nesting).
4. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 1, wherein in step S3, the cluster number is varied over the range N/2 to 2N; for each cluster number, the typical daily load curves of multiple users are clustered by the hierarchical clustering method, and the silhouette coefficient (contour coefficient) and the industry concentration of each cluster are calculated.
5. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 4, wherein the silhouette coefficient (contour coefficient) is calculated as follows:
S101, calculating the average distance ai between sample i and the other samples in the same cluster; the smaller ai is, the more strongly sample i belongs in that cluster; ai is called the intra-cluster dissimilarity of sample i, and the mean of ai over all samples in a cluster C is called the cluster dissimilarity of cluster C;
S102, calculating the average distance bij from sample i to all samples of another cluster Cj, which is called the dissimilarity between sample i and cluster Cj; the inter-cluster dissimilarity of sample i is defined as the minimum over the other clusters:
bi=min{bi1,bi2,...,bik}
the larger bi is, the less sample i belongs to any other cluster;
S103, defining the silhouette coefficient si of sample i from its intra-cluster dissimilarity ai and its inter-cluster dissimilarity bi:
si=(bi-ai)/max{ai,bi}
that is,
si=1-ai/bi, if ai<bi; si=0, if ai=bi; si=bi/ai-1, if ai>bi
S104, judging:
if si is close to 1, the clustering of sample i is reasonable;
if si is close to -1, sample i should rather be assigned to another cluster;
if si is approximately 0, sample i lies on the boundary between two clusters;
S105, the mean of si over all samples is called the silhouette coefficient of the clustering result, and measures whether the clustering is reasonable and effective.
6. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 1, wherein in step S4, an automatic iterative convergence algorithm judges whether the silhouette coefficient (contour coefficient) and the industry concentration of each clustering meet the index standards and have converged; if not, the cluster number is increased and the iterative calculation continues.
CN202111281280.4A 2021-11-01 2021-11-01 Massive power consumer clustering algorithm based on data-physical characteristic combined drive Pending CN114357261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111281280.4A CN114357261A (en) 2021-11-01 2021-11-01 Massive power consumer clustering algorithm based on data-physical characteristic combined drive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111281280.4A CN114357261A (en) 2021-11-01 2021-11-01 Massive power consumer clustering algorithm based on data-physical characteristic combined drive

Publications (1)

Publication Number Publication Date
CN114357261A (en) 2022-04-15

Family

ID=81095658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111281280.4A Pending CN114357261A (en) 2021-11-01 2021-11-01 Massive power consumer clustering algorithm based on data-physical characteristic combined drive

Country Status (1)

Country Link
CN (1) CN114357261A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination