CN114611976A

CN114611976A - Power consumer behavior portrait method, system and device

Info

Publication number: CN114611976A
Application number: CN202210288846.4A
Authority: CN
Inventors: 林文浩; 姜绍艳; 简玮侠; 谢东霖; 张永亮; 熊力; 陈昱; 夏曼; 梁丽丽; 产启中; 林尔迅
Original assignee: Guangdong Power Grid Co Ltd; Zhongshan Power Supply Bureau of Guangdong Power Grid Co Ltd
Current assignee: Guangdong Power Grid Co Ltd; Zhongshan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date: 2022-03-23
Filing date: 2022-03-23
Publication date: 2022-06-10

Abstract

The invention relates to the technical field of customer management in the power industry and discloses a power consumer behavior portrait method, system and device. The method corrects and normalizes power consumer load data to form a sample set, clusters the sample set with a Canopy-K-means algorithm, calculates the clustering effectiveness index of each clustering scheme, determines the optimal clustering number from those index values, and takes the clustering division result corresponding to the optimal clustering number as the target clustering division result. It then determines an optimal feature set of user power consumption behavior and generates a user power consumption behavior portrait from the optimal feature set and the target clustering division result. By improving the clustering of the sample set, the invention improves the overall efficiency of the clustering algorithm, addresses the difficulty of determining the initial clustering centers, and can effectively improve the clustering precision.

Description

Power consumer behavior portrait method, system and device

Technical Field

The invention relates to the technical field of customer management in the power industry, in particular to a method, a system and a device for representing behaviors of power consumers.

Background

The user portrait is a data analysis and service design tool for quickly and accurately reproducing the overall picture of a consumer. It reflects consumer characteristics such as consumption behavior patterns and consumption habits, and offers a new approach to mining consumer demand and value, promoting precision marketing, refining market segmentation and improving user experience. In recent years, with the rapid development of big data technology, many power enterprises have built big-data marketing systems based on user portraits in order to carry out precision marketing and information recommendation.

Clustering algorithms group mass data into several data sets in an unsupervised manner, and include partition-based clustering, hierarchy-based clustering, density-based clustering, fuzzy clustering and Gaussian mixture model clustering. Because each algorithm has its own optimization criterion and suits only particular data structures and cluster shapes, it is difficult to achieve clustering efficiency, clustering precision and clustering robustness at the same time.

In the prior art, the load data of power consumers is generally clustered with algorithms such as hierarchical clustering, density clustering and fuzzy C-means clustering in order to build portraits of the consumers' electricity consumption behavior. Power load data is often high-dimensional and large in volume. Although these clustering algorithms are mature, their initial clustering centers are difficult to determine, and their clustering accuracy and efficiency are mediocre.

Disclosure of Invention

The invention provides a power consumer behavior portrait method, system and device, and solves the technical problems that, in the conventional clustering algorithms used for power consumer portraits, the initial clustering center is difficult to determine and the clustering precision and efficiency are mediocre.

The invention provides a power consumer behavior portrait method in a first aspect, which comprises the following steps:

acquiring power consumer load data, and correcting and normalizing the power consumer load data to form a sample set;

clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;

and determining an optimal feature set of the user power consumption behavior, and generating a user power consumption behavior portrait according to the optimal feature set and the target clustering partitioning result.

According to an implementation manner of the first aspect of the present invention, the clustering the sample set by using a Canopy-K-means algorithm includes:

pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;

and taking the mass center of each Canopy subset as an initial clustering center, and clustering the sample set by adopting a K-means algorithm.

According to an implementation manner of the first aspect of the present invention, the pre-clustering the sample set by using a Canopy algorithm includes:

generating a sample list from the sample set, and determining initial distance thresholds T₁ and T₂ as 80% and 60% of the average value of the samples, respectively, with T₁ > T₂;

randomly selecting a sample point from the sample list as the first Canopy centroid, and generating a Canopy subset for the first Canopy centroid, denoted S₀;

randomly selecting one sample point, denoted Q, from the remaining sample points in the sample list, and letting D be the distance from Q to the first Canopy centroid: if D ≤ T₁, Q is treated as a weakly marked sample point and placed in S₀; if D ≤ T₂, Q is treated as a strongly marked sample point and placed in S₀; if D > T₁, a new Canopy subset is generated from Q and Q is deleted from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;

and repeating the third step until the number of elements in the sample list is zero, and outputting the obtained Canopy subsets and their centroids.
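The pre-clustering steps above can be illustrated with a minimal Python sketch. It follows the commonly used formulation of Canopy (every remaining point is compared with the current canopy center, and strongly marked points are removed from the list), which may differ in detail from the claimed steps. The function name `canopy`, the `seed` parameter and the choice of Euclidean distance are assumptions made for illustration; the thresholds are passed in rather than derived from the sample average.

```python
import math
import random


def canopy(points, t1, t2, seed=0):
    """Canopy pre-clustering sketch; returns a list of (centroid, members).

    t1 is the loose threshold and t2 the tight threshold, with t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Pick a random remaining point as the center of a new canopy.
        center = remaining.pop(rng.randrange(len(remaining)))
        weak, strong, survivors = [center], [center], []
        for q in remaining:
            d = math.dist(q, center)
            if d <= t1:
                weak.append(q)        # weakly marked: member of this canopy
            if d <= t2:
                strong.append(q)      # strongly marked: leaves the sample list
            else:
                survivors.append(q)   # may still seed or join another canopy
        remaining = survivors
        # Centroid = mean position of the strongly marked points.
        centroid = tuple(sum(axis) / len(strong) for axis in zip(*strong))
        canopies.append((centroid, weak))
    return canopies
```

The number of canopies returned and their centroids can then serve as the cluster count and initial centers of a subsequent K-means run.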

According to an implementation manner of the first aspect of the present invention, the calculating the cluster validity indicator of each clustering scheme includes:

calculating a first clustering validity index according to the following formula:

in the formula, T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th intra-class data object in the cluster to the cluster center, and N is the number of intra-class data objects in the cluster;

the second clustering validity index is calculated according to the following formula:

in the formula, T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to its cluster center, D_j is the average distance from the data objects in Q_j to its cluster center, and K is the clustering number;

the third cluster effectiveness index is calculated according to the following formula:

wherein

in the formula, T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.

According to an implementation manner of the first aspect of the present invention, the determining an optimal feature set of the user power consumption behavior includes:

constructing a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, period-over-period trend of electricity consumption fluctuation, daily peak-valley difference and working characteristics;

and determining the optimal feature set of the user electricity utilization behavior from the user electricity utilization behavior feature set according to the maximum correlation minimum redundancy criterion.

According to an implementation manner of the first aspect of the present invention, the generating a user electricity consumption behavior portrait according to the optimal feature set and the target clustering partition result includes:

and analyzing the optimal feature set of different power consumption behaviors by adopting a grading method, and visually expressing the power consumption characteristics of various users through a radar map and/or visually expressing the comparison of the power consumption characteristics of the users among different classes through a histogram.

The invention provides a power consumer behavior representation system in a second aspect, comprising:

a sample set forming module, configured to acquire power consumer load data, and to correct and normalize the power consumer load data to form a sample set;

the clustering module is used for clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;

and the portrait generation module is used for determining an optimal feature set of the power consumption behaviors of the user and generating the portrait of the power consumption behaviors of the user according to the optimal feature set and the target clustering division result.

According to an implementation manner of the second aspect of the present invention, the clustering module includes a clustering submodule for clustering the sample set by using a Canopy-K-means algorithm, and the clustering submodule includes:

the pre-clustering unit is used for pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;

and the re-clustering unit is used for clustering the sample set by taking the mass center of each Canopy subset as an initial clustering center and adopting a K-means algorithm.

According to an implementation manner of the second aspect of the present invention, the pre-clustering unit is specifically configured to:

randomly select one sample point, denoted Q, from the remaining sample points in the sample list, and let D be the distance from Q to the first Canopy centroid: if D ≤ T₁, Q is treated as a weakly marked sample point and placed in S₀; if D ≤ T₂, Q is treated as a strongly marked sample point and placed in S₀; if D > T₁, a new Canopy subset is generated from Q and Q is deleted from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;

According to an implementation manner of the second aspect of the present invention, the clustering module includes a calculation sub-module for calculating a clustering validity index of each clustering scheme, and the calculation sub-module includes:

a first calculating unit, configured to calculate a first cluster validity indicator according to the following formula:

a second calculating unit, configured to calculate a second clustering validity index according to the following formula:

in the formula, T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to its cluster center, D_j is the average distance from the data objects in Q_j to its cluster center, and K is the clustering number;

a third calculating unit, configured to calculate a third clustering validity index according to the following formula:

wherein

According to an implementation manner of the second aspect of the present invention, the portrait generation module comprises a feature determination sub-module for determining an optimal feature set of the user electricity consumption behavior, the feature determination sub-module comprising:

a construction unit, configured to construct a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, period-over-period trend of electricity consumption fluctuation, daily peak-valley difference and working characteristics;

and the characteristic screening unit is used for determining the optimal characteristic set of the user electricity utilization behavior from the user electricity utilization behavior characteristic set according to the maximum correlation minimum redundancy criterion.

According to an implementation manner of the second aspect of the present invention, the portrait generation module includes a generating sub-module for generating a portrait of the electricity consumption behavior of the user according to the optimal feature set and the target clustering division result, and the generating sub-module is specifically configured to:

and analyzing the optimal feature set of different power utilization behaviors by adopting a grading system, and visually expressing the power utilization characteristics of various users through a radar map and/or visually expressing the comparison of the power utilization characteristics of the users among different classes through a histogram.

The third aspect of the present invention provides a power consumer behavior representation apparatus, comprising:

a memory to store instructions; wherein the instructions are for implementing a power consumer behavior representation method as described in any one of the above implementable manners;

a processor to execute the instructions in the memory.

A fourth aspect of the present invention is a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements a power consumer behavior representation method as described in any one of the above-mentioned implementable manners.

According to the technical scheme, the invention has the following advantages:

the method comprises the steps of correcting and normalizing load data of power consumers, clustering the processed data by using a Canopy-K-means algorithm as a sample set, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, taking a clustering division result corresponding to the optimal clustering number as a target clustering division result, further determining an optimal feature set of power consumption behaviors of the users, and generating a power consumption behavior portrait of the users according to the optimal feature set and the target clustering division result; the method and the device adopt the Canopy-K-means algorithm to cluster the sample set, can improve the overall efficiency of the clustering algorithm, solve the problem that the initial clustering center is difficult to determine, determine the optimal clustering number through the value of each clustering effectiveness index, and can effectively improve the clustering precision.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.

FIG. 1 is a flow chart of a power consumer behavior representation method according to an alternative embodiment of the present invention;

FIG. 2 is a schematic block diagram of a power consumer behavior representation system according to an alternative embodiment of the present invention.

Reference numerals:

1-sample set forming module; 2-clustering module; 3-portrait generation module.

Detailed Description

The embodiment of the invention provides a power consumer behavior portrait method, system and device, which solve the technical problems that, in the conventional clustering algorithms used for power consumer portraits, the initial clustering center is difficult to determine and the clustering precision and efficiency are mediocre.

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a power consumer behavior portrait method.

Referring to fig. 1, fig. 1 is a flowchart illustrating a power consumer behavior representation method according to an embodiment of the present invention.

The embodiment of the invention provides a power consumer behavior portrait method, which comprises the following steps:

step S1, acquiring the power consumer load data, and performing correction and normalization processing on the power consumer load data to form a sample set.

When acquiring the power consumer load data, the data can be collected by a data acquisition device installed at the user's premises. The collected load data often contains some missing, negative and zero values. Because many clustering algorithms are sensitive to abnormal values in the raw data, abnormal data in the load data reduces the accuracy of the clustering result, degrades the clustering effect, and can even produce wrong classifications. Finding and correcting abnormal data in the raw data brings the corrected data close to, or even restores, the true values, and is an essential step in clustering.

To avoid the influence of abnormal and missing values on the clustering effect, the power consumer load data can be corrected by on-site load forecasting personnel based on long-term experience, or by a horizontal and vertical data comparison method.

Correcting the data by the horizontal and vertical data comparison method specifically comprises:

comparing the load at a given moment with the loads at the moments immediately before and after it, or comparing the load value at a given moment with the load values at the same moment on each of the previous two days, and, if the deviation is greater than a certain threshold, replacing the value with the average value.
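A minimal sketch of the comparison with the neighboring moments is given below, in Python. The text does not specify how the deviation is measured or which average is used, so the sketch assumes that a value is abnormal when it deviates from both its previous and its next neighbor by more than the threshold, and that it is replaced by the mean of those two neighbors. The function name `correct_load` is hypothetical.

```python
def correct_load(series, threshold):
    """Replace interior values that deviate from both neighbors.

    Assumption: a point is abnormal only if it differs from the previous
    AND the next reading by more than `threshold`; this keeps the healthy
    neighbors of a spike from being flagged as well.
    """
    fixed = list(series)
    for t in range(1, len(series) - 1):
        prev_value, value, next_value = series[t - 1], series[t], series[t + 1]
        if (abs(value - prev_value) > threshold
                and abs(value - next_value) > threshold):
            # Replace the abnormal reading with the mean of its neighbors.
            fixed[t] = (prev_value + next_value) / 2
    return fixed


if __name__ == "__main__":
    print(correct_load([10, 11, 50, 12, 11], threshold=10))
```

The comparison with the same moment on the previous two days could be handled in the same way by passing a series of same-moment readings across days instead of consecutive readings within a day.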

According to the embodiment of the invention, the effectiveness of the clustering result can be ensured by carrying out normalization processing on the corrected data, and the calculation complexity of the algorithm is reduced, so that the best effect of the clustering algorithm is exerted.

As an embodiment, the normalizing process is performed on the corrected data, and includes:

letting the corrected power consumer load data sequence be X_i = (x_i,1, x_i,2, …, x_i,ρ), and processing the data with the following normalization formula:

in the formula, x_i,j is the load value of the j-th sample of sequence X_i, x'_i,j is the normalized value of x_i,j, x_i,min and x_i,max are respectively the minimum and maximum load values in sequence X_i, and ρ is the number of data points in sequence X_i.
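The symbols defined above (a normalized value together with the sequence minimum and maximum) are consistent with min-max normalization, x'_i,j = (x_i,j - x_i,min) / (x_i,max - x_i,min). The Python sketch below assumes that form; the function name `min_max_normalize` and the handling of a constant sequence are illustrative choices, not taken from the text.

```python
def min_max_normalize(sequence):
    """Scale one user's load sequence to the range [0, 1].

    Assumed form: (x - min) / (max - min). A constant sequence has no
    spread, so it is mapped to all zeros to avoid division by zero.
    """
    lowest, highest = min(sequence), max(sequence)
    if highest == lowest:
        return [0.0 for _ in sequence]
    span = highest - lowest
    return [(x - lowest) / span for x in sequence]


if __name__ == "__main__":
    print(min_max_normalize([2, 4, 6]))
```

Normalizing each user's sequence separately makes the clustering respond to the shape of the load curve rather than to its absolute magnitude.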

Step S2, clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation.

In one implementation, the clustering the sample set using the Canopy-K-means algorithm includes:

The Canopy algorithm judges object similarity by a simple, computationally cheap method and is commonly used for the initial clustering of massive high-dimensional data. Unlike other clustering algorithms, it allows the resulting Canopy subsets to overlap, that is, one data object may belong to two Canopy subsets, and its clustering precision is mediocre; its result is therefore usually not used as the final clustering result but as preprocessing for a subsequent, more accurate clustering. There are no outliers in Canopy clustering: every data object must belong to some Canopy subset, and a data object may form a Canopy subset by itself. These characteristics make the algorithm fast: it divides the data objects into several Canopy subsets quickly and efficiently and determines the centroid, that is, the clustering center, of each subset.

In one implementation, the pre-clustering the sample set by using a Canopy algorithm includes:

randomly selecting one sample point, denoted Q, from the remaining sample points in the sample list, and letting D be the distance from Q to the first Canopy centroid: if D ≤ T₁, Q is treated as a weakly marked sample point and placed in S₀; if D ≤ T₂, Q is treated as a strongly marked sample point and placed in S₀; if D > T₁, a new Canopy subset is generated from Q and Q is deleted from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;

The K-means algorithm is a classical clustering algorithm that divides sample objects into several classes by distance calculation. It is simple in principle and efficient in computation. However, the clustering number must be preset manually, and the corresponding initial clustering centers are determined randomly. The distance from each sample object to each clustering center is calculated according to a distance formula, and each object joins the closest class; the clustering centers are then recalculated, and the process repeats until the centers no longer change or the iteration limit is reached. The final clustering result is usually judged by the mean square error.

According to the embodiment of the invention, the Canopy algorithm pre-clusters the data and K-means clustering is then performed on the pre-clustering result, which can improve the overall computational efficiency of the algorithm. The centroids of the Canopy subsets obtained by Canopy pre-clustering are used as the initial clustering centers of the K-means algorithm, and the clustering number is determined at the same time, which solves the problem that the initial clustering centers and the clustering number of the K-means algorithm are uncertain.
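The K-means stage seeded with given initial centers can be sketched in Python as follows. The sketch assumes Euclidean distance and takes the initial centers as an argument, for example the centroids produced by a Canopy pre-clustering step. The names `kmeans`, `nearest` and `max_iter` are illustrative and do not come from the text.

```python
import math


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def kmeans(points, init_centers, max_iter=100):
    """K-means with caller-supplied initial centers.

    The number of clusters equals len(init_centers), so no separate
    cluster count has to be preset.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for k, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(old_center)  # keep the center of an empty cluster
        if updated == centers:  # centers no longer change: converged
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]
```

Because the centers are not drawn at random, repeated runs on the same data with the same seeds give the same partition.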

In one implementation, the calculating the cluster validity indicator of each clustering scheme includes:

calculating a first cluster validity index according to the following formula:

the second clustering validity index is calculated according to the following formula:

in the formula, T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to its cluster center, D_j is the average distance from the data objects in Q_j to its cluster center, and K is the clustering number;

wherein

When the optimal clustering number is determined according to the value of the clustering effectiveness index of each clustering scheme, the embodiment of the invention determines the optimal clustering number by combining the first clustering effectiveness index, the second clustering effectiveness index and the third clustering effectiveness index.

Wherein the first cluster validity indicator is a measure of the distance of all data objects within a class in the cluster to the cluster center. When the clustering number is constant, the smaller the value is, the smaller the distance from each data object in the class to the clustering center is, the more concentrated the data object of each class is, the better the clustering effect is;

the larger the value of the second clustering validity index is, the better the clustering result of the clustering algorithm is considered to be;

the smaller the value of the third clustering validity index is, the better the clustering result of the clustering algorithm is.
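As an illustration of the first index only, the Python sketch below computes an intra-class compactness value for one cluster. The exact formula is not reproduced in this text, so the sketch assumes the mean of the distances T_QD(i) from each of the N data objects in the class to the cluster center; a sum instead of a mean would be an equally plausible reading. The function name `intra_class_compactness` is hypothetical.

```python
import math


def intra_class_compactness(members, center):
    """Mean Euclidean distance from the class members to the class center.

    Assumed reading of the first validity index: smaller values mean the
    data objects of the class are more concentrated around the center.
    """
    if not members:
        raise ValueError("a class needs at least one data object")
    return sum(math.dist(m, center) for m in members) / len(members)


if __name__ == "__main__":
    tight = intra_class_compactness([(0.9, 0.0), (1.1, 0.0)], (1.0, 0.0))
    loose = intra_class_compactness([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0))
    print(tight < loose)  # the tighter class scores lower
```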

And step S3, determining an optimal feature set of the user electricity consumption behavior, and generating a user electricity consumption behavior portrait according to the optimal feature set and the target clustering division result.

In one implementation, the determining the optimal feature set of the user power consumption behavior includes:

constructing a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, period-over-period trend of electricity consumption fluctuation, daily peak-valley difference and working characteristics;

In the analysis of the power consumption behavior, the power consumption characteristics derived from the power consumption curve are generally adopted to characterize the power consumption behavior of the user. The user power utilization behavior feature set aims to rapidly master power utilization features of different customer groups, so that differentiated services of the different power utilization groups are achieved. Therefore, in selecting the user electricity consumption behavior feature set, an index that most reflects the customer electricity consumption feature needs to be considered.

In the embodiment of the invention, the user electricity consumption behavior feature set is constructed from electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, period-over-period trend of electricity consumption fluctuation, daily peak-valley difference and working characteristics.

The description and attributes of each electricity usage characteristic index are shown in table 1.

Table 1:

The maximum correlation minimum redundancy criterion is a filter-type feature selection method. Its core idea is to maximize the correlation between the features and the classification variable while minimizing the redundancy among the features. This embodiment applies it to the selection of user electricity consumption features in order to obtain the feature set with the strongest correlation and the lowest redundancy for representing user electricity consumption characteristics.

The correlation between the characteristic and the classification variable takes a mutual information value between the characteristic and the classification variable as a measurement index, and the measurement index represents the degree of uncertainty reduction of the category when the characteristic is known. In the solving process, in order to make each characteristic variable have statistical significance, variable domain discretization processing needs to be carried out on each variable, namely, the numerical sequence of each variable is converted into a probability distribution interval.

According to the embodiment of the invention, the features are first normalized, the variable range is then discretized into uniform intervals to obtain the probability distribution of each feature variable, and the mutual information between each feature and the user category is then calculated.
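The discretization and mutual information calculation can be sketched in Python as below. The sketch uses the standard definition of mutual information between two discrete variables estimated from sample frequencies. The number of intervals, the base-2 logarithm and the function names `discretize` and `mutual_information` are assumptions made for illustration.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Map numeric values to equal-width interval indices 0 .. bins-1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0] * len(values)
    width = (highest - lowest) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lowest) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in bits) of two equally long discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())


if __name__ == "__main__":
    feature = discretize([0.1, 0.2, 0.8, 0.9], bins=2)
    user_class = [0, 0, 1, 1]
    print(mutual_information(feature, user_class))
```

A feature that fully determines the user category attains the entropy of the category as its mutual information, while a feature independent of the category scores zero.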

Specifically, the maximum correlation index $D(Y, e)$ of the feature set $Y$ and the category $e$ is set as:

$$D(Y, e) = \frac{1}{N_Y} \sum_{d_i \in Y} U(d_i; e)$$

where $N_Y$ is the number of features included in the feature set $Y$, $d_i$ is the i-th feature in the feature set $Y$, and $U(d_i; e)$ is the mutual information between $d_i$ and the user category $e$.

The redundancy of information between two features can be measured by indexes such as information gain, the Gini coefficient, or the correlation coefficient. As an embodiment, the redundancy of information between two features is measured by the correlation coefficient:

$$\rho(d_i, d_j) = \frac{\mathrm{cov}(d_i, d_j)}{\sigma_{d_i}\,\sigma_{d_j}}$$

where $\rho(d_i, d_j)$ is the correlation coefficient between feature $d_i$ and feature $d_j$, with values in the range $[-1, 1]$; the closer its absolute value is to 1, the stronger the correlation, and the closer to 0, the weaker the correlation; $\mathrm{cov}(d_i, d_j)$ is the covariance of feature $d_i$ and feature $d_j$; $\sigma_{d_i}$ is the standard deviation of feature $d_i$; and $\sigma_{d_j}$ is the standard deviation of feature $d_j$.
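As an illustration of the correlation coefficient used as the redundancy measure, the following sketch computes the Pearson correlation of two feature columns as covariance divided by the product of the standard deviations. The function name `pearson` and the choice to raise an error for a constant feature are assumptions made here, not details from the source.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient: cov(x, y) / (std(x) * std(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        # The coefficient is undefined when a feature never varies.
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (std_x * std_y)

# Two features that rise together are fully redundant in this sense.
print(pearson([1, 2, 3], [2, 4, 6]))
```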

The minimum redundancy index $S(Y)$ is set as follows:

The maximum correlation minimum redundancy criterion is obtained by combining the two indexes, and the corresponding formula is as follows:

in the formula, $I_{mRMR}$ represents the maximum correlation minimum redundancy criterion.

And solving the feature set Y meeting the maximum correlation minimum redundancy criterion to obtain the optimal feature set.

The solution of the optimal feature set can be cast as an optimization problem. Because the initial number of user electricity consumption behavior features is small, a traversal (exhaustive enumeration) method is adopted as an implementation to obtain the globally optimal solution. Let $f_i$ be the set-membership indicator function with 0-1 coding: $f_i = 1$ indicates that feature $d_i$ is present in $Y$, and $f_i = 0$ indicates that feature $d_i$ is not present in $Y$. To simplify the formulation, the mutual information $U(d_i; e)$ and the correlation coefficient between $d_i$ and $d_j$ are denoted by $u_i$ and $v_{ij}$ respectively, and the expression of $I_{mRMR}$ is:

Traversing from $f = (0, 0, \ldots, 0)$ to $f = (1, 1, \ldots, 1)$ yields the vector $f$ that maximizes $I_{mRMR}$, and decoding this vector gives the optimal feature set.
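The traversal over all 0-1 vectors $f$ can be sketched as below. The exact expression for $I_{mRMR}$ is not reproduced in this text, so the scoring used here is an assumption: relevance is the mean of $u_i$ over the selected features, redundancy is the mean of $|v_{ij}|$ over all selected pairs (including $i = j$), and the two are combined as a difference. The function name `best_feature_set` is hypothetical.

```python
from itertools import product

def best_feature_set(u, v):
    """Exhaustively search all 0-1 membership vectors f for the best score.

    u[i]    : mutual information between feature i and the user category
    v[i][j] : correlation coefficient between features i and j
    ASSUMPTION: score = mean relevance - mean absolute correlation; the
    source's own combination formula may differ.
    """
    n = len(u)
    best_f, best_score = None, float("-inf")
    for f in product((0, 1), repeat=n):
        chosen = [i for i in range(n) if f[i] == 1]
        if not chosen:
            continue  # the all-zero vector selects no feature, so it is skipped
        k = len(chosen)
        relevance = sum(u[i] for i in chosen) / k
        redundancy = sum(abs(v[i][j]) for i in chosen for j in chosen) / (k * k)
        score = relevance - redundancy
        if score > best_score:
            best_f, best_score = f, score
    return best_f, best_score

# Features 0 and 1 are both informative but nearly duplicate each other.
u = [0.9, 0.85, 0.6]
v = [[1.0, 0.95, 0.1],
     [0.95, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
print(best_feature_set(u, v))
```

The search visits $2^n - 1$ candidate sets, which is practical only because the initial feature count is small, as the text notes.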

In an implementation manner, the generating a user electricity consumption behavior portrait according to the optimal feature set and the target clustering and partitioning result includes:

In the embodiment of the invention, a radar chart is used to visualize the electricity consumption characteristics of each class of users, and a histogram is used to visualize the comparison of electricity consumption characteristics between different classes, so that business personnel can more accurately and conveniently understand the commonalities and individual differences in the electricity consumption behavior of power users.

Most user electricity consumption behavior data are numerical and can be converted, through a conversion rule, into labels that business personnel can readily understand. In this embodiment, a scoring system with a full score of 10 is adopted, and the electricity consumption characteristics of each class of users are measured by the score of each label for that class. The score of each label for each class of users is given by:

$$T_{i,j} = 10 \times \frac{\bar{t}_{i,j} - t_{j\max{}}{}^{0}\,t_{j\min}}{t_{j\max} - t_{j\min}}\Big|_{\bar{t}_{i,j} - t_{j\min}\ \text{in the numerator}} = 10 \times \frac{\bar{t}_{i,j} - t_{j\min}}{t_{j\max} - t_{j\min}}$$

where $T_{i,j}$ is the score of the i-th class of users on the j-th feature; $\bar{t}_{i,j}$ is the average of the j-th feature over all users belonging to the i-th class; and $t_{j\max}$ and $t_{j\min}$ are the maximum value and the minimum value of the j-th feature, respectively.
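The 10-point label score can be illustrated with a short sketch. It assumes that the score is a min-max scaling of the class mean onto the 0 to 10 range and that the maximum and minimum of each feature are taken over all users; both are readings of the definitions above rather than confirmed details. The function name `label_scores` and the input layout are hypothetical.

```python
def label_scores(features_by_class, full_score=10.0):
    """Score each class on each feature by min-max scaling the class mean.

    features_by_class maps a class id to a list of per-user feature vectors.
    ASSUMPTION: the min and max of feature j are taken over all users.
    """
    all_rows = [row for rows in features_by_class.values() for row in rows]
    n_features = len(all_rows[0])
    t_min = [min(row[j] for row in all_rows) for j in range(n_features)]
    t_max = [max(row[j] for row in all_rows) for j in range(n_features)]

    scores = {}
    for cls, rows in features_by_class.items():
        class_scores = []
        for j in range(n_features):
            mean_j = sum(row[j] for row in rows) / len(rows)
            spread = t_max[j] - t_min[j]
            # A feature with no spread cannot discriminate, so score it 0.
            class_scores.append(full_score * (mean_j - t_min[j]) / spread if spread else 0.0)
        scores[cls] = class_scores
    return scores

# Two classes, two features per user.
data = {"A": [[0, 10], [2, 10]], "B": [[8, 30], [10, 30]]}
print(label_scores(data))
```

Each class's score vector can then be plotted directly as one polygon of a radar chart.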

The invention also provides a power consumer behavior representation system.

Referring to fig. 2, fig. 2 is a schematic block diagram of a power consumer behavior representation system according to an embodiment of the present invention.

The embodiment of the invention provides a power consumer behavior representation system, which comprises:

a sample set forming module 1, configured to acquire power consumer load data, and to correct and normalize the power consumer load data to form a sample set;

a clustering module 2, configured to cluster the sample set by using a Canopy-K-means algorithm, calculate a cluster validity index for each clustering scheme, determine an optimal number of clusters according to the values of the cluster validity indexes of the clustering schemes, and take the clustering division result corresponding to the optimal number of clusters as the target clustering division result, wherein the cluster validity indexes comprise a first cluster validity index representing intra-class compactness, a second cluster validity index representing the degree of inter-class separation relative to intra-class compactness, and a third cluster validity index representing the degree of intra-class compactness relative to inter-class separation;

and a portrait generation module 3, configured to determine an optimal feature set of user electricity consumption behavior, and to generate a user electricity consumption behavior portrait according to the optimal feature set and the target clustering division result.

In one implementation, the clustering module 2 includes a clustering submodule for clustering the sample set by using a Canopy-K-means algorithm, and the clustering submodule includes:

In an implementable manner, the pre-clustering unit is specifically configured to:

randomly select a sample point from the sample list as a first Canopy centroid, and generate a Canopy subset for the first Canopy centroid, denoted S₀;

randomly select one sample point from the remaining sample points in the sample list and denote it Q, and let D be the distance from Q to the first Canopy centroid; if D ≤ T₁, treat Q as a weakly marked sample point and put it into S₀; if D ≤ T₂, treat Q as a strongly marked sample point and put it into S₀; if D > T₁, generate a new Canopy subset from Q and delete Q from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;
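The pre-clustering steps above can be sketched as a single pass over a shuffled sample list. This is one possible reading and not the patent's implementation: it assumes T₁ > T₂ > 0, Euclidean distance, comparison of each point against the seed point of every existing Canopy, and a fixed random seed for repeatability. The function name `canopy_centroids` is hypothetical.

```python
import math
import random

def canopy_centroids(points, t1, t2, seed=0):
    """Single-pass Canopy pre-clustering sketch (ASSUMES t1 > t2 > 0)."""
    rng = random.Random(seed)
    sample_list = list(points)
    rng.shuffle(sample_list)  # stands in for randomly selecting sample points

    canopies = []  # each Canopy keeps its seed point and its marked members
    for q in sample_list:
        covered = False
        for c in canopies:
            d = math.dist(q, c["seed"])
            if d <= t2:
                c["strong"].append(q)  # strongly marked sample point
                covered = True
            elif d <= t1:
                c["weak"].append(q)    # weakly marked sample point
                covered = True
        if not covered:
            # q is farther than t1 from every existing Canopy: start a new one
            canopies.append({"seed": q, "strong": [q], "weak": []})

    # The centroid of a Canopy is the mean position of its strongly marked points.
    centroids = []
    for c in canopies:
        members = c["strong"]
        centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centroids

pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
print(sorted(canopy_centroids(pts, t1=3.0, t2=1.0)))
```

The number of returned centroids and their positions could then be passed to K-means as the cluster count and initial cluster centers, which is the role the abstract assigns to the Canopy stage.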

In an implementation manner, the clustering module 2 includes a calculation sub-module for calculating a cluster validity indicator of each clustering scheme, and the calculation sub-module includes:

a second calculating unit, configured to calculate a second clustering validity indicator according to the following formula:

in the formula, $T_{PD}$ is the second cluster validity index, $Q_{ij}$ is the distance between the cluster centers of $Q_i$ and $Q_j$, $Q_i$ is the i-th class object set, $Q_j$ is the j-th class object set, $D_i$ is the average distance from the data objects in $Q_i$ to its cluster center, $D_j$ is the average distance from the data objects in $Q_j$ to its cluster center, and $K$ is the number of clusters;

wherein

In an implementable manner, the portrait generation module 3 comprises a feature determination sub-module for determining an optimal feature set of user electricity consumption behavior, the feature determination sub-module comprising:

In an implementable manner, the portrait generation module 3 includes a generation sub-module configured to generate a user electricity consumption behavior portrait according to the optimal feature set and the target clustering division result, where the generation sub-module is specifically configured to:

analyze the optimal feature sets of different electricity consumption behaviors by using a scoring system, and visually express the electricity consumption characteristics of each class of users through a radar chart and/or visually express the comparison of electricity consumption characteristics between different classes of users through a histogram.

The invention also provides a power consumer behavior portrait device, which comprises:

a memory to store instructions; wherein the instructions are used for implementing the power consumer behavior representation method according to any one of the above embodiments;

a processor to execute the instructions in the memory.

The invention also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to implement the power user behavior representation method according to any one of the above embodiments.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, and the specific beneficial effects of the systems, apparatuses and modules described above may refer to the corresponding beneficial effects in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the modules is only a logical division, and other divisions are possible in actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or in another form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A power consumer behavior portrait method is characterized by comprising the following steps:

and determining an optimal feature set of the power utilization behavior of the user, and generating a user power utilization behavior portrait according to the optimal feature set and the target clustering division result.

2. The power consumer behavior representation method according to claim 1, wherein the clustering the sample set by using a Canopy-K-means algorithm comprises:

3. The power consumer behavior representation method according to claim 2, wherein the pre-clustering the sample set by using a Canopy algorithm comprises:

randomly selecting one sample point from the remaining sample points in the sample list and denoting it Q, and letting D be the distance from Q to the first Canopy centroid; if D ≤ T₁, treating Q as a weakly marked sample point and putting it into S₀; if D ≤ T₂, treating Q as a strongly marked sample point and putting it into S₀; if D > T₁, generating a new Canopy subset from Q and deleting Q from the sample list; the center position of all strongly marked sample points in each Canopy subset being the corresponding centroid;

4. The power consumer behavior representation method according to claim 1, wherein the calculating a cluster effectiveness index for each clustering scheme includes:

calculating a first cluster validity index according to the following formula:

in the formula, $T_{PD}$ is the second cluster validity index, $Q_{ij}$ is the distance between the cluster centers of $Q_i$ and $Q_j$, $Q_i$ is the i-th class object set, $Q_j$ is the j-th class object set, $D_i$ is the average distance from the data objects in $Q_i$ to its cluster center, $D_j$ is the average distance from the data objects in $Q_j$ to its cluster center, and $K$ is the number of clusters;

wherein

5. The method for representing electric power consumer behavior as claimed in claim 1, wherein the step of determining the optimal feature set of the consumer behavior comprises:

6. The electric power consumer behavior representation method according to claim 5, wherein the generating of the user electricity consumption behavior representation according to the optimal feature set and the target clustering and partitioning result comprises:

7. A power consumer behavior representation system, comprising:

8. The power consumer behavioral representation system according to claim 7, wherein the clustering module includes a clustering submodule for clustering the sample set using a Canopy-K-means algorithm, the clustering submodule including:

9. The power consumer behavior representation system of claim 8, wherein the pre-clustering unit is specifically configured to:

randomly selecting one sample point from the remaining sample points in the sample list and denoting it Q, and letting D be the distance from Q to the first Canopy centroid; if D ≤ T₁, treating Q as a weakly marked sample point and putting it into S₀; if D ≤ T₂, treating Q as a strongly marked sample point and putting it into S₀; if D > T₁, generating a new Canopy subset from Q and deleting Q from the sample list; the center position of all strongly marked sample points in each Canopy subset being the corresponding centroid;

10. The power consumer behavior representation system of claim 7, wherein the clustering module comprises a computation submodule for computing a cluster validity indicator for each clustering scheme, the computation submodule comprising:

wherein

11. The power consumer behavior representation system of claim 7, wherein the representation generation module comprises a feature determination sub-module for determining an optimal set of features for consumer electricity usage behavior, the feature determination sub-module comprising:

12. The power consumer behavior representation system of claim 11, wherein the representation generation module comprises a generation submodule configured to generate a representation of the consumer electrical behavior according to the optimal feature set and the target cluster partitioning result, and the generation submodule is specifically configured to:

13. A power consumer behavior representation device, comprising:

a memory to store instructions; wherein the instructions are used for realizing the power consumer behavior representation method according to any one of claims 1-6;

a processor to execute the instructions in the memory.

14. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the power user behavior representation method according to any one of claims 1 to 6.