CN114357261A

CN114357261A - Massive power consumer clustering algorithm based on data-physical characteristic combined drive

Info

Publication number: CN114357261A
Application number: CN202111281280.4A
Authority: CN
Inventors: 钱斌; 程韧俐; 周密; 祝宇翔; 李富盛; 史军; 肖勇; 刘傲
Original assignee: China South Power Grid International Co ltd; Shenzhen Power Supply Co ltd
Current assignee: China South Power Grid International Co ltd; Shenzhen Power Supply Co ltd
Priority date: 2021-11-01
Filing date: 2021-11-01
Publication date: 2022-04-15

Abstract

The invention discloses a massive power consumer clustering algorithm based on data-physical characteristic combined drive, comprising the following steps: S1, cluster the daily load curves of each individual user to obtain that user's daily typical load curve; S2, convert each user's daily typical load data into per-unit values; S3, cluster the daily typical load curves of multiple users; S4, iterate automatically to search for a number of clusters that meets both the contour coefficient standard and the industry concentration standard; and S5, output the user clustering result corresponding to the optimal number of clusters. The invention uses the massive historical electricity consumption data accumulated in the power consumption information acquisition system to mine the production characteristics and electricity demand of various industries and users. This can improve the load forecasting accuracy and dispatching management level of a power grid company, and can provide accurate data support and a decision basis for electricity pricing, economic dispatch, demand response, and similar tasks.

Description

Massive power consumer clustering algorithm based on data-physical characteristic combined drive

Technical Field

The invention relates to massive power consumer clustering algorithms, and in particular to a massive power consumer clustering algorithm driven jointly by data and physical characteristics.

Background

Mining the massive historical electricity consumption data accumulated in the power consumption information acquisition system reveals the production characteristics and electricity demand of various industries and users. This knowledge can improve the load forecasting accuracy and dispatching management level of a power grid company, and can provide accurate data support and a decision basis for electricity pricing, economic dispatch, demand response, and similar tasks.

Because the number of users is massive, user clustering is a prerequisite for, and a key factor in, user analysis and mining. User clustering methods fall into two categories. The first clusters users by the industry they belong to. It is simple and intuitive and has strong physical meaning, but the recorded industry attributes of users are often wrong or missing, and a single industry may contain sub-industries with different production characteristics, so the resulting clusters are not uniform in their electricity consumption characteristics. The second clusters users by load characteristics. It ensures consistent load characteristics within each cluster, but it offers no physical explanation for that consistency.

Disclosure of Invention

The invention aims to provide a massive power consumer clustering algorithm based on data-physical characteristic combined drive that can both improve the load forecasting accuracy and dispatching management level of a power grid company and provide accurate data support and a decision basis for electricity pricing, economic dispatch, demand response, and similar tasks.

To solve the above technical problems, the invention adopts the following technical solution: a massive power consumer clustering algorithm based on data-physical characteristic combined drive, comprising the following steps:

S1, cluster the daily load curves of each individual user to obtain that user's daily typical load curve;

S2, convert each user's daily typical load data into per-unit values;

S3, cluster the daily typical load curves of multiple users;

S4, iterate automatically to search for a number of clusters that meets both the contour coefficient standard and the industry concentration standard;

S5, output the user clustering result corresponding to the optimal number of clusters.

According to the scheme, in step S1, single-user clustering uses a hierarchical clustering method to cluster the user's daily load curves, and the user's typical daily load curve is calculated from the cluster that contains the most daily load curves.

According to the scheme, the hierarchical clustering method works bottom-up. Each sample is initially regarded as its own cluster, the two clusters with the minimum distance are found and merged, and this is repeated until the expected number of clusters is reached. A representative algorithm of this kind of hierarchical clustering is AGNES (AGglomerative NESting).
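The bottom-up merging loop can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation. The text does not say how the distance between clusters is measured, so Euclidean distance between curves and average linkage (mean pairwise distance between members) are assumptions, and the function names `euclidean` and `agnes` are hypothetical.

```python
from typing import List, Sequence

Curve = Sequence[float]


def euclidean(x: Curve, y: Curve) -> float:
    """Euclidean distance between two equal-length load curves."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def agnes(samples: List[Curve], n_clusters: int) -> List[int]:
    """Bottom-up (agglomerative) clustering with average linkage (assumed).

    Every sample starts as its own cluster; the two closest clusters are
    merged repeatedly until n_clusters remain. Returns one label per sample.
    """
    if not 1 <= n_clusters <= len(samples):
        raise ValueError("n_clusters must be between 1 and len(samples)")
    # Pairwise sample distances, computed once.
    d = [[euclidean(p, q) for q in samples] for p in samples]
    clusters: List[List[int]] = [[i] for i in range(len(samples))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pair of clusters (x < y) with the minimum linkage distance.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x stays valid

    labels = [0] * len(samples)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

This naive loop recomputes linkages at every merge, so it only suits small inputs such as one user's daily curves. For thousands of users, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` followed by `fcluster(..., criterion="maxclust")` is more practical.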

According to the scheme, in step S3, the number of clusters is chosen from the range N/2 to 2N. The hierarchical clustering method is used to cluster the daily typical load curves of multiple users, and the contour coefficient and the industry concentration ratio of each cluster are calculated for each candidate number of clusters.
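The text does not define the industry concentration ratio. One plausible reading, borrowed from the concentration ratio CR_n used in economics, is the share of a cluster's users that belong to its n most common industries. The sketch below assumes that definition; with n = 1 it gives the dominant-industry proportion. The function names are hypothetical, and the input is one cluster label and one industry tag per user.

```python
from collections import Counter
from typing import Dict, List, Sequence


def concentration_ratio(industries: Sequence[str], n: int = 1) -> float:
    """Share of a cluster's users belonging to its n most common industries."""
    if not industries:
        raise ValueError("cluster is empty")
    top = Counter(industries).most_common(n)
    return sum(count for _, count in top) / len(industries)


def per_cluster_concentration(
    labels: Sequence[int], industries: Sequence[str], n: int = 1
) -> Dict[int, float]:
    """CR_n for every cluster, given one label and one industry tag per user."""
    groups: Dict[int, List[str]] = {}
    for label, industry in zip(labels, industries):
        groups.setdefault(label, []).append(industry)
    return {label: concentration_ratio(members, n) for label, members in groups.items()}
```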

According to the scheme, the contour coefficient is calculated as follows:

S101, calculate the average distance $a_i$ between sample $i$ and the other samples in the same cluster. The smaller $a_i$ is, the more strongly sample $i$ belongs in that cluster. $a_i$ is called the intra-cluster dissimilarity of sample $i$, and the mean of $a_i$ over all samples in cluster $C$ is called the cluster dissimilarity of cluster $C$;

S102, calculate the average distance $b_{ij}$ from sample $i$ to all samples of another cluster $C_j$. $b_{ij}$ is called the dissimilarity between sample $i$ and cluster $C_j$. The inter-cluster dissimilarity of sample $i$ is defined as

$b_i = \min\{b_{i1}, b_{i2}, \ldots, b_{ik}\}$

The larger $b_i$ is, the less sample $i$ belongs to any other cluster;

S103, define the contour coefficient of sample $i$ from its intra-cluster dissimilarity $a_i$ and inter-cluster dissimilarity $b_i$:

$s_i = \dfrac{b_i - a_i}{\max\{a_i, b_i\}}$

S104, interpret the result:

if $s_i$ is close to 1, sample $i$ is reasonably clustered;

if $s_i$ is close to -1, sample $i$ would be better assigned to another cluster;

if $s_i$ is approximately 0, sample $i$ lies on the boundary between two clusters;

S105, the mean of $s_i$ over all samples is called the contour coefficient of the clustering result. It measures whether the clustering is reasonable and effective.
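Steps S101 to S105 describe the standard silhouette coefficient. Below is a plain-Python sketch. The text does not name a distance measure, so Euclidean distance between load curves is assumed, and the helper names are hypothetical. A sample that is alone in its cluster gets $s_i = 0$, following the usual convention.

```python
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length curves."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5


def silhouette_samples(
    samples: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[float]:
    """Contour (silhouette) coefficient s_i of every sample (S101 to S103)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores: List[float] = []
    for i, x in enumerate(samples):
        # Sum and count of distances from sample i to each cluster, excluding i.
        totals = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, y in enumerate(samples):
            if j != i:
                totals[labels[j]] += euclidean(x, y)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # singleton cluster convention
            continue
        a = totals[own] / counts[own]  # intra-cluster dissimilarity a_i
        b = min(totals[c] / counts[c] for c in clusters if c != own)  # b_i
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def silhouette_score(
    samples: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean s_i over all samples: the contour coefficient of the result (S105)."""
    scores = silhouette_samples(samples, labels)
    return sum(scores) / len(scores)
```

For large user sets, scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` computes the same quantity more efficiently.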

According to the scheme, in step S4, an automatic iterative convergence algorithm checks whether the contour coefficient and the industry concentration ratio of each clustering result meet the index standards and have converged. If they have not, the number of clusters is increased and the iteration continues.
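The iteration of step S4 can be sketched as a loop that increases the number of clusters until both indices reach their thresholds. The text does not specify the exact convergence test. The callback `evaluate`, the threshold parameters, and the optional tolerance `tol` (the largest allowed change in the mean contour coefficient between consecutive numbers of clusters) are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def search_cluster_number(
    evaluate: Callable[[int], Tuple[float, float]],
    k_min: int,
    k_max: int,
    min_contour: float,
    min_concentration: float,
    tol: Optional[float] = None,
    step: int = 1,
) -> Optional[int]:
    """Return the first k in [k_min, k_max] whose indices meet the standards.

    evaluate(k) is a hypothetical callback that clusters the users into k groups
    and returns (mean contour coefficient, industry concentration ratio).
    If tol is given, k is accepted only once the mean contour coefficient has
    changed by at most tol since the previous k (a simple convergence test).
    """
    previous: Optional[float] = None
    for k in range(k_min, k_max + 1, step):
        contour, concentration = evaluate(k)
        converged = tol is None or (
            previous is not None and abs(contour - previous) <= tol
        )
        if contour >= min_contour and concentration >= min_concentration and converged:
            return k
        previous = contour
    return None  # no k in the range met the standards
```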

The implementation of the massive power consumer clustering algorithm based on the data-physical characteristic combined drive has the following beneficial effects:

1. The method clusters massive numbers of power users under the joint drive of data and physical characteristics. User load curves are clustered with a contour-coefficient-guided hierarchical clustering method for daily typical load curves, which yields clustering results that meet both the contour coefficient requirement and the industry concentration standard;

2. The clustering result calculated by the method meets both the standard of the clustering-effect evaluation index and the standard of the cluster-number constraint index;

3. The invention can solve the problem of clustering massive numbers of power consumers while ensuring the consistency of both electricity consumption characteristics and load characteristics.

Drawings

FIG. 1 is a flow chart of an algorithm provided by the present invention;

FIG. 2 shows the relationship between the average contour coefficient and the number of clusters for a large number of users according to the present invention;

FIG. 3 shows the corresponding proportion of the dominant industry under different contour coefficients;

FIG. 4 is an example of hierarchical clustering according to the present invention.

Detailed Description

For a clearer understanding of the technical features, objects, and effects of the present invention, embodiments of the present invention are now described in detail with reference to the accompanying drawings.

As shown in FIGS. 1 to 4, the massive power consumer clustering algorithm based on data-physical characteristic joint driving of the present invention includes the following steps:

S1, cluster the daily load curves of each individual user, specifically as follows:

A bottom-up hierarchical clustering method is adopted. All daily load curves of a single user are taken as the sample set, and each sample is initially regarded as its own cluster. The two clusters with the minimum distance are then found and merged, and this is repeated until the expected number of clusters is reached. The user's typical daily load curve is calculated from the cluster containing the most daily load curves.
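Given cluster labels for one user's daily load curves, selecting the largest cluster and deriving a typical curve can be sketched as follows. The text does not say how the typical curve is computed from that cluster. Taking the point-by-point mean of its member curves is an assumption, and `typical_daily_curve` is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence


def typical_daily_curve(
    daily_curves: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[float]:
    """Point-by-point mean of the curves in the cluster holding the most days."""
    if not daily_curves or len(daily_curves) != len(labels):
        raise ValueError("need one label per daily curve")
    # Largest cluster; Counter breaks ties by first-encountered label.
    largest, _ = Counter(labels).most_common(1)[0]
    members = [c for c, lab in zip(daily_curves, labels) if lab == largest]
    # Average each time slot (e.g. each of 96 quarter-hour points) over members.
    return [sum(values) / len(members) for values in zip(*members)]
```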

S2, convert each user's daily typical load data into per-unit values.
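The text does not state the per-unit base. The sketch below assumes the base is the peak of the user's daily typical load, one common choice that makes every curve peak at 1.0 p.u. and lets users of very different sizes be compared by shape alone. With that base, the conversion is a simple division. `to_per_unit` and its optional `base` argument are hypothetical.

```python
from typing import List, Optional, Sequence


def to_per_unit(curve: Sequence[float], base: Optional[float] = None) -> List[float]:
    """Divide a load curve by a base value (default: the curve's own peak)."""
    if base is None:
        base = max(curve)  # assumed base: peak of the daily typical load
    if base <= 0:
        raise ValueError("base must be positive")
    return [p / base for p in curve]
```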

S3, cluster the daily typical load curves of several thousand users in a given area with the hierarchical clustering method, and compile the statistics into a curve of the average contour coefficient versus the number of clusters.

The number of clusters is chosen from the range 50 to 250. The daily typical load curves of the multiple users are clustered with the hierarchical clustering method, and the contour coefficient and industry concentration ratio of each cluster are calculated for each candidate number of clusters.

The contour coefficient (commonly known as the silhouette coefficient) is an index for evaluating clustering quality. It is calculated as follows:

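S101, calculate the average distance $a_i$ from sample $i$ to the other samples in the same cluster. The smaller $a_i$ is, the more strongly sample $i$ belongs in that cluster. $a_i$ is called the intra-cluster dissimilarity of sample $i$, and the mean of $a_i$ over all samples in cluster $C$ is called the cluster dissimilarity of cluster $C$.

S102, calculate the average distance $b_{ij}$ from sample $i$ to all samples of another cluster $C_j$, called the dissimilarity between sample $i$ and cluster $C_j$. The inter-cluster dissimilarity of sample $i$ is

$b_i = \min\{b_{i1}, b_{i2}, \ldots, b_{ik}\}$

The larger $b_i$ is, the less sample $i$ belongs to any other cluster.

S103, define the contour coefficient of sample $i$ from $a_i$ and $b_i$, namely

$s_i = \dfrac{b_i - a_i}{\max\{a_i, b_i\}}$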

S104, interpret the result:

if $s_i$ is close to 1, sample $i$ is reasonably clustered;

if $s_i$ is close to -1, sample $i$ would be better assigned to another cluster;

if $s_i$ is approximately 0, sample $i$ lies on the boundary between two clusters.

S105, the mean of $s_i$ over all samples is called the contour coefficient of the clustering result. It measures whether the clustering is reasonable and effective.

As the number of clusters increases, the average contour coefficient rises slowly, and the curve begins to flatten in its middle section.
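The point where the curve flattens can be located numerically. The sketch below is not described in the text. It returns the first number of clusters at which the average slope of the curve over the next `window` points falls below a tolerance `tol`. Both parameters and the function name are hypothetical and would need tuning on real data.

```python
from typing import Optional, Sequence


def flattening_point(
    ks: Sequence[int],
    scores: Sequence[float],
    window: int = 5,
    tol: float = 1e-4,
) -> Optional[int]:
    """First k where the mean slope of scores over the next `window` points < tol."""
    if len(ks) != len(scores):
        raise ValueError("ks and scores must have the same length")
    for i in range(len(ks) - window):
        # Average gain in contour coefficient per additional cluster.
        slope = (scores[i + window] - scores[i]) / (ks[i + window] - ks[i])
        if slope < tol:
            return ks[i]
    return None  # curve never flattens within the scanned range
```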

S4, iterate automatically to search for a number of clusters that meets both the contour coefficient standard and the industry concentration standard.

An automatic iterative convergence algorithm checks whether the contour coefficient and the industry concentration ratio of each clustering result meet the index standards and have converged. If they have not, the number of clusters is increased and the iteration continues.

When the contour coefficient approaches 0.095, the share of the top industry reaches 55%, the share of the top two industries reaches 80%, and the share of the top three industries exceeds 90%. This point is a stage inflection point of the curve, so the number of clusters at this point is a suitable choice for daily typical load curve clustering. Looking up this point in the data gives 87 clusters, which is taken as the optimal number of clusters.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A massive power consumer clustering algorithm based on data-physical characteristic combined driving is characterized by comprising the following steps:

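S1, clustering the daily load curves of each individual user to obtain that user's daily typical load curve;

S2, converting each user's daily typical load data into per-unit values;

S3, clustering the daily typical load curves of multiple users;

S4, iterating automatically to search for a number of clusters that meets both the contour coefficient standard and the industry concentration standard;

S5, outputting the user clustering result corresponding to the optimal number of clusters.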

2. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 1, wherein in step S1, single-user clustering consists of clustering the user's daily load curves with a hierarchical clustering method and calculating the user's typical daily load curve from the cluster containing the most daily load curves.

3. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 2, wherein the hierarchical clustering method works bottom-up: each sample is initially regarded as its own cluster, the two clusters with the minimum distance are found and merged, and this is repeated until the expected number of clusters is reached; a representative algorithm of this hierarchical clustering method is AGNES (AGglomerative NESting).

4. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 1, wherein in step S3, the number of clusters is chosen from the range N/2 to 2N, the hierarchical clustering method is used to cluster the daily typical load curves of multiple users, and the contour coefficient and industry concentration ratio of each cluster are calculated for each candidate number of clusters.

5. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 4, wherein the specific calculation method of the contour coefficient is as follows:

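S101, calculating the average distance $a_i$ between sample $i$ and the other samples in the same cluster, wherein the smaller $a_i$ is, the more strongly sample $i$ belongs in that cluster; $a_i$ is called the intra-cluster dissimilarity of sample $i$, and the mean of $a_i$ over all samples in cluster $C$ is called the cluster dissimilarity of cluster $C$;

S102, calculating the average distance $b_{ij}$ from sample $i$ to all samples of another cluster $C_j$, called the dissimilarity between sample $i$ and cluster $C_j$, and defining the inter-cluster dissimilarity of sample $i$ as

$b_i = \min\{b_{i1}, b_{i2}, \ldots, b_{ik}\}$

wherein the larger $b_i$ is, the less sample $i$ belongs to any other cluster;

S103, defining the contour coefficient of sample $i$ from its intra-cluster dissimilarity $a_i$ and inter-cluster dissimilarity $b_i$:

$s_i = \dfrac{b_i - a_i}{\max\{a_i, b_i\}}$

S104, judging:

if $s_i$ is close to 1, the clustering of sample $i$ is reasonable;

if $s_i$ is close to -1, sample $i$ would be better assigned to another cluster;

if $s_i$ is approximately 0, sample $i$ lies on the boundary between two clusters;

S105, taking the mean of $s_i$ over all samples as the contour coefficient of the clustering result, which measures whether the clustering is reasonable and effective.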

6. The massive power consumer clustering algorithm based on data-physical characteristic joint driving according to claim 1, wherein in step S4, an automatic iterative convergence algorithm is used to determine whether the contour coefficient and the industry concentration ratio of each clustering result meet the index standards and have converged; if not, the number of clusters is increased and the iteration continues.