CN113222366A

CN113222366A - Power utilization reliability evaluation method of self-adaptive k-means clustering algorithm

Info

Publication number: CN113222366A
Application number: CN202110460917.XA
Authority: CN
Inventors: 曾健; 秦丽文; 桂海涛; 吴茵; 李任明; 吴凡; 阳国燕; 程向辉; 韦营
Original assignee: Guilin Power Supply Bureau of Guangxi Power Grid Co Ltd
Current assignee: Guilin Power Supply Bureau of Guangxi Power Grid Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2021-08-06

Abstract

The invention provides a power utilization reliability evaluation method of a self-adaptive k-means clustering algorithm, which comprises the following steps: acquiring user data; determining a maximum clustering center number k_max and a minimum clustering center number k_min; letting the clustering center number k = k_min and clustering the user data; determining the value of the best clustering center number K₀ between the minimum clustering center number k_min and the maximum clustering center number k_max; and obtaining the electricity utilization characteristics of the users from the clustering result under the best clustering center number K₀. The invention first determines the maximum clustering center number k_max and the minimum clustering center number k_min, and then determines the best clustering center number K₀ from the data clustering effect evaluation index I_DBI calculated for each value of the clustering center number k. In this way the method can process data covering a relatively large range and determine the best clustering center number K₀ simply and quickly, which overcomes the defect that the traditional k-means clustering algorithm cannot specify the clustering center number by experience for large-range data.

Description

Power utilization reliability evaluation method of self-adaptive k-means clustering algorithm

Technical Field

The invention relates to the field of data processing, in particular to a power utilization reliability evaluation method of a self-adaptive k-means clustering algorithm.

Background

The operational reliability of the power system bears on people's livelihood and national security. Evaluating it accurately can provide targeted guidance for system maintenance. Existing reliability research mostly focuses on the power system level or the individual user level, and the most common approach at the user level is to apply a degree of dimension reduction and k-means clustering to user electricity consumption data so as to analyze reliability by user type.

However, this approach has limitations. It requires the clustering center number k to be specified manually; as the application range gradually grows, the clustering center number can no longer be specified from experience, which greatly affects the result. In addition, user-level reliability assessment currently stops at user load clustering, and the research results are mostly applied in marketing systems to guide customer service. They hardly reflect the running state of the system and cannot guide power grid planning or reliability assessment work, so considerable room for improvement remains.

Disclosure of Invention

A power utilization reliability evaluation method of a self-adaptive k-means clustering algorithm comprises the following steps:

step S1, user data is obtained;

step S2, determining the maximum clustering center number k_max and the minimum clustering center number k_min;

step S3, letting the clustering center number k = k_min and clustering the user data;

step S4, determining the value of the best clustering center number K₀ between the minimum clustering center number k_min and the maximum clustering center number k_max;

step S5, obtaining the electricity utilization characteristics of the users from the clustering result under the best clustering center number K₀.

Further, the user data specifically includes: the power utilization curve of the user, the account information of the user and the work order information of the power grid fault.

Further, the step S4 specifically includes:

step S401, judging whether the value of the clustering center number k is smaller than the maximum clustering center number k_max;

step S402, if the value of the clustering center number k is smaller than the maximum clustering center number k_max, calculating the data clustering effect evaluation index I_DBI, letting k = k + 1, and returning to step S3;

step S403, if the value of the clustering center number k is greater than or equal to the maximum clustering center number k_max, selecting the value of k corresponding to the minimum data clustering effect evaluation index I_DBI calculated in step S402 as the best clustering center number K₀.
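The loop of steps S3 and S401 to S403 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`dist`, `kmeans`, `dbi`, `select_best_k`) are hypothetical, the farthest-point initialization is chosen here only to keep the toy run deterministic, I_DBI is assumed to be the standard Davies-Bouldin index, and every k from k_min to k_max inclusive is scored (the text states the upper bound both as "less than" and as "less than or equal to").

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(data, k, max_iter=100):
    """Plain Lloyd k-means; farthest-point seeding is an assumption of this sketch."""
    centers = [list(data[0])]
    while len(centers) < k:
        # next seed: the point farthest from its nearest existing center
        centers.append(list(max(data, key=lambda p: min(dist(p, c) for c in centers))))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def dbi(data, centers, labels):
    """Davies-Bouldin index (assumed form of I_DBI); lower is better, needs k >= 2."""
    k = len(centers)
    spread = []  # average member-to-center distance per cluster
    for c in range(k):
        members = [p for p, l in zip(data, labels) if l == c]
        spread.append(sum(dist(p, centers[c]) for p in members) / len(members) if members else 0.0)
    total = 0.0
    for j in range(k):
        ratios = []
        for h in range(k):
            if h != j:
                d_jh = dist(centers[j], centers[h])
                ratios.append(math.inf if d_jh == 0 else (spread[j] + spread[h]) / d_jh)
        total += max(ratios)
    return total / k

def select_best_k(data, k_min, k_max):
    """Cluster for every k in [k_min, k_max] and return the k with the smallest index."""
    scores = {}
    for k in range(k_min, k_max + 1):
        centers, labels = kmeans(data, k)
        scores[k] = dbi(data, centers, labels)
    return min(scores, key=scores.get), scores

# Toy data: three well-separated groups of four 2-D points each.
demo = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
        for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
best_k, scores = select_best_k(demo, 2, 5)
print(best_k, scores)
```

On real data the 52-dimensional weekly vectors would take the place of `demo`, and a library k-means with several random restarts would normally replace the toy `kmeans` above.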

Further, the data clustering effect evaluation index I_DBI is an evaluation index of the data clustering, and the best clustering center number K₀ corresponds to the minimum data clustering effect evaluation index I_DBI.

Further, the calculation formula of the data clustering effect evaluation index I_DBI is:

wherein:

d_j represents the average distance from the data objects in an arbitrarily selected class j to the corresponding class center;

d_h represents the average distance from the data objects in an arbitrarily selected class h to the corresponding class center;

d_j,h represents the Euclidean distance between the class centers of the arbitrarily selected class j and class h.
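The formula itself is not reproduced in this text. Given the three definitions above and the subscript DBI, it is presumably the standard Davies-Bouldin index; under that assumption, with k the clustering center number, it would read:

```latex
I_{DBI} = \frac{1}{k} \sum_{j=1}^{k} \max_{h \neq j} \left( \frac{d_j + d_h}{d_{j,h}} \right)
```

A smaller value means compact classes (small d_j and d_h) that lie far apart (large d_j,h), which is consistent with selecting the k that minimizes I_DBI.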

Further, the user electricity utilization characteristics are combined with the fault work order information to perform reliability analysis, and an electricity utilization reliability index is obtained.

Further, the power utilization reliability indexes comprise average power failure frequency, average power failure duration, expected number of users in power failure, average power failure shortage amount and power failure reason probability distribution.

Further, the user electricity consumption curve is obtained by averaging the electricity consumption of users of the same type; the user electricity consumption curve is the weekly electricity consumption data of the user over one year, and the weekly electricity consumption data is obtained by selecting one meter reading for each user every 7 days and performing cleaning and a differencing operation.

Further, the weekly electricity consumption curve has a length of 52 points, and the weekly electricity consumption data of each user for one year form a 52-dimensional vector.
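The sampling and differencing described above can be sketched as follows. This is a hedged illustration: it assumes the input is one cumulative meter reading per day, the function name `weekly_consumption` is hypothetical, and the cleaning step is not shown because the text does not detail it.

```python
def weekly_consumption(daily_readings, weeks=52, step=7):
    """Turn cumulative daily meter readings into a weekly consumption vector.

    Assumes one cumulative reading per day (an assumption of this sketch).
    One reading is kept every `step` days and consecutive kept readings are
    differenced, so weeks * step + 1 readings are needed for `weeks` points.
    """
    needed = weeks * step + 1
    if len(daily_readings) < needed:
        raise ValueError(f"need at least {needed} readings, got {len(daily_readings)}")
    sampled = daily_readings[0:needed:step]               # 53 readings for 52 weeks
    return [b - a for a, b in zip(sampled, sampled[1:])]  # 52 weekly differences

# Toy meter that advances by 3 units per day for 365 days.
readings = [3 * day for day in range(365)]
curve = weekly_consumption(readings)
print(len(curve), curve[0])
```

Each user's `curve` is then the 52-dimensional vector that is passed to clustering.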

Further, the total number of user electricity consumption vectors is obtained from the vectors formed by the weekly electricity consumption data of each user for one year, and the algorithm clusters all of the user electricity consumption vectors.

Drawings

FIG. 1 is a schematic flow chart of an adaptive k-means clustering algorithm in the present invention;

FIG. 2 is a curve of the data clustering effect evaluation index I_DBI as a function of the clustering center number k in the present invention;

FIG. 3 shows the clustering result when the clustering center number k is 14 in the present invention.

Detailed Description

The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

Example one

A power utilization reliability assessment method of an adaptive k-means clustering algorithm is shown in figure 1 and comprises the following steps:

step S1, user data is obtained;

User meter reading data of a city in south China are selected for illustration. The time span is from 2019-09-30 to 2020-10-01; 54148362 records covering 892421 users are obtained, and cleaning yields the user electricity consumption curves, each with 52 points. Abnormal curves are then screened out of the cleaned data, leaving 889905 valid user electricity consumption curves.

As shown in FIG. 2, the maximum clustering center number is determined as k_max = 19 and the minimum clustering center number as k_min = 2.

step S3, letting the clustering center number k = k_min and starting to cluster the user data;

The data clustering effect evaluation index I_DBI is calculated. When the clustering center number k is less than or equal to k_max, the method returns to step S3; otherwise, the clustering center number k corresponding to the minimum data clustering effect evaluation index I_DBI is determined. In this example, the curve of the data clustering effect evaluation index I_DBI as a function of the clustering center number k is shown in FIG. 2.

As can be seen from FIG. 2, the minimum data clustering effect evaluation index I_DBI appears when the clustering center number k is 14. The best clustering center number K₀ = 14 is therefore selected and clustering is performed again; as shown in FIG. 3, this yields 14 clustering centers representing 14 typical annual electricity consumption curve types.

In a preferred embodiment of the present application, the adaptive k-means clustering algorithm first determines the maximum clustering center number k_max and the minimum clustering center number k_min and then determines the best clustering center number K₀. The method can process data covering a large range, and overcomes the defect that the traditional k-means clustering algorithm cannot specify the clustering center number by experience for large-range data.

Further, in a preferred embodiment of the present application, the user data specifically includes: the power utilization curve of the user, the account information of the user and the work order information of the power grid fault.

In the application, the characteristics of each typical user electricity consumption type are obtained by compiling, for each typical type, the typical electricity consumption curve, the number of users, and the share of electricity consumption. These statistics give a comprehensive picture of the electricity consumption characteristics of all users.
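The per-type statistics named above (typical curve, number of users, share of electricity consumption) can be sketched as follows. The function name `type_statistics` and the dictionary keys are hypothetical, and the typical curve is taken here as the mean of the member curves, which matches the averaging described elsewhere in the text but is otherwise an assumption of this sketch.

```python
from collections import defaultdict

def type_statistics(curves, labels):
    """Summarize each cluster: user count, share of total energy, mean curve."""
    groups = defaultdict(list)
    for curve, label in zip(curves, labels):
        groups[label].append(curve)
    total_energy = sum(sum(c) for c in curves)
    stats = {}
    for label in sorted(groups):
        members = groups[label]
        energy = sum(sum(c) for c in members)
        stats[label] = {
            "users": len(members),
            # share of this type in the total consumption of all users
            "energy_share": energy / total_energy if total_energy else 0.0,
            # point-by-point mean of the member curves
            "typical_curve": [sum(col) / len(members) for col in zip(*members)],
        }
    return stats

# Toy input: three users with 2-point curves, two of them in type 0.
curves = [[1.0, 1.0], [3.0, 3.0], [2.0, 2.0]]
labels = [0, 0, 1]
stats = type_statistics(curves, labels)
print(stats)
```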

Further, in a preferred embodiment of the present application, the step S4 specifically includes:

step S403, if the value of the clustering center number k is greater than or equal to the maximum clustering center number k_max, selecting the value of k corresponding to the minimum data clustering effect evaluation index I_DBI calculated in step S402 as the best clustering center number K₀.

Further, in a preferred embodiment of the present application, the best clustering center number K₀ corresponds to the minimum data clustering effect evaluation index I_DBI.

wherein:

d_j represents the average distance from the data objects in class j to the corresponding class center;

d_h represents the average distance from the data objects in class h to the corresponding class center;

d_j,h represents the Euclidean distance between the class centers of class j and class h.

Further, in a preferred embodiment of the present application, the user power utilization characteristics are combined with the fault work order information to perform reliability analysis, so as to obtain a power utilization reliability index.

As shown in FIG. 3, the abscissa is the week number, ranging from 1 to 52, with the first week representing 2019-10-1 to 2019-10-7, and so on. The ordinate is the electricity consumption in kWh. In addition, clustering gives the clustering center to which each user belongs; each clustering center can be used as the annual electricity consumption curve of the corresponding typical user electricity consumption type and applied to the subsequent electricity consumption reliability evaluation.
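The mapping from week number to calendar dates on the abscissa can be sketched as follows, assuming that "and so on" means consecutive, non-overlapping 7-day weeks starting on 2019-10-01. The function name `week_range` is hypothetical.

```python
from datetime import date, timedelta

FIRST_DAY = date(2019, 10, 1)  # start of week 1, as stated in the text

def week_range(week):
    """Return (first day, last day) of the given week number, 1 to 52."""
    if not 1 <= week <= 52:
        raise ValueError("week must be between 1 and 52")
    start = FIRST_DAY + timedelta(days=7 * (week - 1))
    return start, start + timedelta(days=6)

print(week_range(1))
print(week_range(52))
```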

Further, in a preferred embodiment of the present application, the power utilization reliability index includes an average power outage frequency, an average power outage duration, the number of households expected to have power outage, an average power outage shortage amount, and a power outage cause probability distribution; wherein:

wherein λ_i is the number of power outages of user i within one year, N_R is the total number of users of the type, and R is the set of users belonging to the same type;

wherein t_i is the duration of the i-th fault, and R is the set of fault events of users belonging to the same type;

the expected number of the users in the power failure is equal to the average power failure frequency multiplied by the average duration time of the power failure multiplied by the number of the users;

the average power outage amount is equal to the average power outage duration time multiplied by the average power of users;

and the power failure reason probability distribution is obtained by screening and counting the power grid fault first-aid repair work orders.
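The indices above can be sketched for one user type as follows. The formulas for the first two indices are not reproduced in this text, so the sketch assumes the average outage frequency is the sum of λ_i divided by N_R and the average outage duration is the mean of t_i over the fault events; the last two indices follow the products stated in the text. The function name, argument names, and units (hours, kW) are hypothetical.

```python
def reliability_indices(outage_counts, fault_durations_h, avg_power_kw):
    """Reliability indices for one user type.

    outage_counts     -- lambda_i, outages per user within one year (one entry per user)
    fault_durations_h -- t_i, duration of each fault event in hours (assumed unit)
    avg_power_kw      -- average power of the users of this type in kW (assumed unit)
    """
    n_users = len(outage_counts)                      # N_R
    avg_frequency = sum(outage_counts) / n_users      # assumed: sum(lambda_i) / N_R
    avg_duration = (sum(fault_durations_h) / len(fault_durations_h)
                    if fault_durations_h else 0.0)    # assumed: mean of t_i
    return {
        "avg_outage_frequency": avg_frequency,
        "avg_outage_duration": avg_duration,
        # as stated in the text: frequency x duration x number of users
        "expected_outage_households": avg_frequency * avg_duration * n_users,
        # as stated in the text: average duration x average user power
        "avg_outage_shortage": avg_duration * avg_power_kw,
    }

# Toy input: four users, four fault events.
indices = reliability_indices([2, 0, 1, 1], [1.0, 3.0, 2.0, 2.0], 1.5)
print(indices)
```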

Further, in a preferred embodiment of this application, the user electricity consumption curve is obtained by averaging the electricity consumption of users of the same type; the user electricity consumption curve is the weekly electricity consumption data of the user over one year, and the weekly electricity consumption data is obtained by selecting one meter reading for each user every 7 days and performing cleaning and a differencing operation.

Further, in a preferred embodiment of the present application, the weekly power usage curve length is divided into 52 points, and the weekly power usage data formed each year by each user constitutes a 52-dimensional vector.

Further, in a preferred embodiment of the present application, the total vector number of the power consumption of the users is obtained according to a vector formed by the weekly power consumption data formed every year by each user, and the total vector number of the power consumption of the users is clustered by an algorithm. The total vector number is calculated in the embodiment as follows:

the total vector number is m × 52, where m is the total number of users.

In the description of the present invention, it is to be understood that the terms "intermediate", "length", "upper", "lower", "front", "rear", "vertical", "horizontal", "inner", "outer", "radial", "circumferential", and the like, indicate orientations and positional relationships that are based on the orientations and positional relationships shown in the drawings, are used for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and therefore, are not to be construed as limiting the present invention.

In the present invention, unless otherwise expressly stated or limited, the first feature may be "on" the second feature in direct contact with the second feature, or the first and second features may be in indirect contact via an intermediate. "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; may be mechanically coupled, may be electrically coupled or may be in communication with each other; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

The above description is for the purpose of illustrating embodiments of the invention and is not intended to limit the invention, and it will be apparent to those skilled in the art that any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the invention shall fall within the protection scope of the invention.

Claims

1. A power utilization reliability evaluation method of a self-adaptive k-means clustering algorithm is characterized by comprising the following steps of:

step S1, user data is obtained;

step S3, letting the clustering center number k = k_min and clustering the user data;

2. The method for evaluating the electricity utilization reliability of the adaptive k-means clustering algorithm according to claim 1, wherein the user data specifically comprises: the power utilization curve of the user, the account information of the user and the work order information of the power grid fault.

3. The power utilization reliability assessment method of the adaptive k-means clustering algorithm according to claim 1, wherein the step S4 specifically comprises:

step S403, if the value of the clustering center number k is greater than or equal to the maximum clustering center number k_max, selecting the value of k corresponding to the minimum data clustering effect evaluation index I_DBI calculated in step S402 as the best clustering center number K₀.

4. The method for evaluating the electricity utilization reliability of the adaptive k-means clustering algorithm according to claim 2, wherein the best clustering center number K₀ corresponds to the minimum data clustering effect evaluation index I_DBI.

5. The power utilization reliability assessment method of the adaptive k-means clustering algorithm according to claim 3, wherein the calculation formula of the data clustering effect evaluation index I_DBI is:

wherein:

d_j represents the average distance from the data objects in an arbitrarily selected class j to the corresponding class center;

6. The power utilization reliability assessment method of the self-adaptive k-means clustering algorithm according to claim 1, characterized in that the user power utilization characteristics are combined with the fault work order information to perform reliability analysis to obtain a power utilization reliability index.

7. The method as claimed in claim 6, wherein the electricity reliability indicators include average outage frequency, average outage duration, expected number of users in outage, average outage power supply shortage and outage cause probability distribution.

8. The power utilization reliability assessment method for the adaptive k-means clustering algorithm according to claim 2, characterized in that the user electricity consumption curve is obtained by averaging the electricity consumption of users of the same type, the user electricity consumption curve is the weekly electricity consumption data of the user over one year, and the weekly electricity consumption data is obtained by selecting one meter reading for each user every 7 days and performing cleaning and a differencing operation.

9. The power utilization reliability assessment method of the adaptive k-means clustering algorithm according to claim 8, wherein the length of the power utilization curve of the users is divided into 52 points, and the weekly power utilization data formed by each user every year form a 52-dimensional vector.

10. The method as claimed in claim 9, wherein the total vector number of the user electricity consumption is obtained from the vector formed by the weekly electricity consumption data of each user every year, and the algorithm clusters the total vector number of the user electricity consumption.