CN114611976A - Power consumer behavior portrait method, system and device - Google Patents

Publication number
CN114611976A
Authority
CN
China
Prior art keywords
clustering
canopy
sample
class
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210288846.4A
Other languages
Chinese (zh)
Inventor
林文浩
姜绍艳
简玮侠
谢东霖
张永亮
熊力
陈昱
夏曼
梁丽丽
产启中
林尔迅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Zhongshan Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Zhongshan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd, Zhongshan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202210288846.4A
Publication of CN114611976A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of customer management in the power industry and discloses a power consumer behavior portrait method, system and device. The method corrects and normalizes power consumer load data to form a sample set, clusters the sample set using a Canopy-K-means algorithm, calculates the clustering effectiveness index of each clustering scheme, determines the optimal clustering number from the values of those indexes, and takes the clustering division result corresponding to the optimal clustering number as the target clustering division result. It then determines an optimal feature set of the users' power consumption behavior and generates a power consumption behavior portrait of the users from the optimal feature set and the target clustering division result. By improving the clustering of the sample set, the invention improves the overall efficiency of the clustering algorithm, resolves the difficulty of determining the initial clustering centers, and can effectively improve clustering precision.

Description

Power consumer behavior portrait method, system and device
Technical Field
The invention relates to the technical field of customer management in the power industry, in particular to a method, a system and a device for representing behaviors of power consumers.
Background
A user portrait is a data analysis and service design tool that quickly and accurately reproduces the overall profile of a consumer. It reflects consumer characteristics such as consumption behavior patterns and consumption habits, and offers a new approach to mining consumer demand and value, promoting precision marketing by enterprises, refining their market operations and improving user experience. In recent years, with the rapid development of big data technology, many power enterprises have built big-data marketing systems based on user portraits in order to carry out precision marketing and information recommendation.
Clustering algorithms group massive data into several clusters in an unsupervised manner; they include partition-based, hierarchy-based, density-based, fuzzy and Gaussian mixture model clustering. Because each algorithm has its own optimization criterion and suits only particular data structures and cluster shapes, it is difficult to achieve clustering efficiency, clustering precision and clustering robustness at the same time.
In the prior art, power consumer load data are generally clustered with algorithms such as hierarchical clustering, density clustering and fuzzy C-means clustering in order to portray the power consumption behavior of power consumers. Power load data are often high-dimensional and large in volume. Although these clustering algorithms are mature, their initial clustering centers are difficult to determine and their clustering accuracy and efficiency are mediocre.
Disclosure of Invention
The invention provides a power consumer behavior portrait method, system and device, and solves the technical problems that, in the conventional clustering algorithms used for power consumer portraits, the initial clustering center is difficult to determine and the clustering precision and efficiency are mediocre.
The invention provides a power consumer behavior portrait method in a first aspect, which comprises the following steps:
acquiring power consumer load data, and correcting and normalizing the power consumer load data to form a sample set;
clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;
and determining an optimal feature set of the user power consumption behavior, and generating a user power consumption behavior portrait according to the optimal feature set and the target clustering partitioning result.
According to an implementation manner of the first aspect of the present invention, the clustering the sample set by using a Canopy-K-means algorithm includes:
pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;
and taking the mass center of each Canopy subset as an initial clustering center, and clustering the sample set by adopting a K-means algorithm.
According to an implementation manner of the first aspect of the present invention, the pre-clustering the sample set by using a Canopy algorithm includes:
generating a sample list according to the sample set, and determining initial distance thresholds T1 and T2 as 80% and 60%, respectively, of the average value of the samples, where T1 > T2;
randomly selecting a sample point from the sample list as a first Canopy centroid, and generating a Canopy subset for the first Canopy centroid, denoted as S0;
randomly selecting one sample point from the remaining sample points in the sample list, denoted as Q, and letting D be the distance from Q to the first Canopy centroid; if D ≤ T1, Q is regarded as a weakly marked sample point and placed in S0; if D ≤ T2, Q is regarded as a strongly marked sample point and placed in S0; if D > T1, a new Canopy subset is generated from Q, and Q is deleted from the sample list; the center position of all the strongly marked sample points in each Canopy subset is the corresponding centroid;
and repeating the third step until the number of elements in the sample list is zero, and outputting the obtained Canopy subset and the centroid thereof.
According to an implementation manner of the first aspect of the present invention, the calculating the cluster validity indicator of each clustering scheme includes:
calculating a first clustering validity index according to the following formula:
Figure BDA0003560908820000031
in the formula, T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th data object in the cluster to the cluster center, and N is the number of data objects in the cluster;
calculating the second clustering validity index according to the following formula:
Figure BDA0003560908820000032
in the formula, T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to their cluster center, D_j is the average distance from the data objects in Q_j to their cluster center, and K is the clustering number;
calculating the third clustering validity index according to the following formula:
Figure BDA0003560908820000033
wherein
Figure BDA0003560908820000034
in the formula, T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is the sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.
According to an implementation manner of the first aspect of the present invention, the determining an optimal feature set of the user power consumption behavior includes:
constructing a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-period difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, period-over-period trend of electricity consumption fluctuation, daily peak-valley difference and working characteristics;
and determining the optimal feature set of the user electricity utilization behavior from the user electricity utilization behavior feature set according to the maximum correlation minimum redundancy criterion.
According to a manner that can be realized by the first aspect of the present invention, the generating a user electricity consumption behavior portrait according to the optimal feature set and the target clustering partition result includes:
and analyzing the optimal feature set of different power consumption behaviors by adopting a grading method, and visually expressing the power consumption characteristics of various users through a radar map and/or visually expressing the comparison of the power consumption characteristics of the users among different classes through a histogram.
The invention provides a power consumer behavior representation system in a second aspect, comprising:
a sample set forming module, used for acquiring power consumer load data, and correcting and normalizing the power consumer load data to form a sample set;
the clustering module is used for clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;
and the portrait generation module is used for determining an optimal feature set of the power consumption behaviors of the user and generating the portrait of the power consumption behaviors of the user according to the optimal feature set and the target clustering division result.
According to a manner in which the second aspect of the present invention can be realized, the clustering module includes a clustering submodule for clustering the sample set by using a Canopy-K-means algorithm, and the clustering submodule includes:
the pre-clustering unit is used for pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;
and the re-clustering unit is used for clustering the sample set by taking the mass center of each Canopy subset as an initial clustering center and adopting a K-means algorithm.
According to a manner that can be realized by the second aspect of the present invention, the pre-clustering unit is specifically configured to:
generating a sample list according to the sample set, and determining initial distance thresholds T1 and T2 as 80% and 60%, respectively, of the average value of the samples, where T1 > T2;
randomly selecting a sample point from the sample list as a first Canopy centroid, and generating a Canopy subset for the first Canopy centroid, denoted as S0;
randomly selecting one sample point from the remaining sample points in the sample list, denoted as Q, and letting D be the distance from Q to the first Canopy centroid; if D ≤ T1, Q is regarded as a weakly marked sample point and placed in S0; if D ≤ T2, Q is regarded as a strongly marked sample point and placed in S0; if D > T1, a new Canopy subset is generated from Q, and Q is deleted from the sample list; the center position of all the strongly marked sample points in each Canopy subset is the corresponding centroid;
and repeating the third step until the number of elements in the sample list is zero, and outputting the obtained Canopy subset and the centroid thereof.
According to a manner that can be realized by the second aspect of the present invention, the clustering module includes a calculation sub-module for calculating a clustering validity index of each clustering scheme, and the calculation sub-module includes:
a first calculating unit, configured to calculate a first cluster validity indicator according to the following formula:
Figure BDA0003560908820000051
in the formula, T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th data object in the cluster to the cluster center, and N is the number of data objects in the cluster;
a second calculating unit, configured to calculate the second clustering validity index according to the following formula:
Figure BDA0003560908820000052
in the formula, T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to their cluster center, D_j is the average distance from the data objects in Q_j to their cluster center, and K is the clustering number;
a third calculating unit, configured to calculate the third clustering validity index according to the following formula:
Figure BDA0003560908820000053
wherein
Figure BDA0003560908820000054
in the formula, T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is the sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.
According to an implementable manner of the second aspect of the invention, the portrait generation module comprises a feature determination sub-module for determining an optimal feature set of the user's electricity consumption behavior, the feature determination sub-module comprising:
a construction unit, used for constructing a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-period difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, period-over-period trend of electricity consumption fluctuation, daily peak-valley difference and working characteristics;
and the characteristic screening unit is used for determining the optimal characteristic set of the user electricity utilization behavior from the user electricity utilization behavior characteristic set according to the maximum correlation minimum redundancy criterion.
According to an implementable manner of the second aspect of the present invention, the portrait generation module includes a generating sub-module for generating a portrait of the electricity consumption behavior of the user according to the optimal feature set and the target clustering division result, and the generating sub-module is specifically configured to:
and analyzing the optimal feature set of different power utilization behaviors by adopting a grading method, and visually expressing the power utilization characteristics of various users through a radar map and/or visually expressing the comparison of the power utilization characteristics of the users among different classes through a histogram.
The third aspect of the present invention provides a power consumer behavior representation apparatus, comprising:
a memory to store instructions; wherein the instructions are for implementing a power consumer behavior representation method as described in any one of the above implementable manners;
a processor to execute the instructions in the memory.
A fourth aspect of the present invention is a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements a power consumer behavior representation method as described in any one of the above-mentioned implementable manners.
According to the technical scheme, the invention has the following advantages:
the method corrects and normalizes power consumer load data to form a sample set, clusters the sample set using a Canopy-K-means algorithm, calculates the clustering effectiveness index of each clustering scheme, determines the optimal clustering number from the values of those indexes, takes the clustering division result corresponding to the optimal clustering number as the target clustering division result, further determines an optimal feature set of the users' power consumption behavior, and generates a power consumption behavior portrait of the users according to the optimal feature set and the target clustering division result; clustering the sample set with the Canopy-K-means algorithm improves the overall efficiency of the clustering algorithm and resolves the difficulty of determining the initial clustering center, and determining the optimal clustering number from the values of the clustering effectiveness indexes can effectively improve clustering precision.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of a power consumer behavior representation method according to an alternative embodiment of the present invention;
FIG. 2 is a schematic block diagram of a power consumer behavior representation system according to an alternative embodiment of the present invention.
Reference numerals:
1-a sample set forming module; 2-a clustering module; 3-portrait creation module.
Detailed Description
The embodiment of the invention provides a power consumer behavior portrait method, system and device, which are used for solving the technical problems that, in the conventional clustering algorithms used for power consumer portraits, the initial clustering center is difficult to determine and the clustering precision and efficiency are mediocre.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a power consumer behavior portrait method.
Referring to fig. 1, fig. 1 is a flowchart illustrating a power consumer behavior representation method according to an embodiment of the present invention.
The embodiment of the invention provides a power consumer behavior portrait method, which comprises the following steps:
step S1, acquiring the power consumer load data, and performing correction and normalization processing on the power consumer load data to form a sample set.
When the power consumer load data is acquired, the data can be collected by a data acquisition device installed at the user's premises. The collected load data often contains missing values, negative values and zero values. Because many clustering algorithms are sensitive to abnormal values in the raw data, abnormal data in the load data reduces the accuracy of the clustering result, degrades the clustering effect and can even produce wrong classifications. Finding and correcting abnormal data in the raw data lets the corrected data approximate, or even restore, the actual values, and is an essential step in clustering.
To avoid the influence of abnormal values and missing values on the clustering effect, the power consumer load data can be corrected by on-site load forecasting personnel based on long-term accumulated experience, or by a horizontal and vertical data comparison method.
Correcting the data by the horizontal and vertical data comparison method specifically comprises:
comparing the load at a given moment with the loads at the moments immediately before and after it, or comparing the load value at that moment with the load values at the same moment on the previous two days, and, if the deviation is greater than a certain threshold value, replacing the load value with an average value.
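The comparison against the same moment on the previous two days can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `correct_by_previous_days`, the array layout (one row per day, one column per time slot), the use of the two-day mean as the replacement value, and the handling of missing values as NaN are all assumptions made for the example.

```python
import numpy as np

def correct_by_previous_days(load, threshold):
    """Replace abnormal readings with the mean of the same slot on the previous two days.

    load: 2-D array-like, shape (days, slots_per_day); NaN marks a missing reading.
    threshold: largest deviation from the two-day mean that is still accepted.
    ASSUMPTION: the first two days have no reference and are left unchanged.
    """
    corrected = np.asarray(load, dtype=float).copy()
    for d in range(2, corrected.shape[0]):
        # Per-slot mean of the two preceding (already corrected) days.
        reference = 0.5 * (corrected[d - 1] + corrected[d - 2])
        deviation = np.abs(corrected[d] - reference)
        bad = np.isnan(corrected[d]) | (deviation > threshold)
        corrected[d, bad] = reference[bad]
    return corrected

if __name__ == "__main__":
    demo = [[10.0, 20.0], [12.0, 22.0], [11.0, 80.0], [11.5, float("nan")]]
    print(correct_by_previous_days(demo, threshold=5.0))
```

Using the already corrected rows as the reference keeps one spike from contaminating the following days; comparing against the raw neighbours instead would be an equally valid reading of the text.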
According to the embodiment of the invention, the effectiveness of the clustering result can be ensured by carrying out normalization processing on the corrected data, and the calculation complexity of the algorithm is reduced, so that the best effect of the clustering algorithm is exerted.
As an embodiment, the normalizing process is performed on the corrected data, and includes:
setting the corrected power consumer load data sequence as X_i = (x_{i,1}, x_{i,2}, …, x_{i,ρ}), and processing the data by adopting the following normalization formula:
Figure BDA0003560908820000081
in the formula, x_{i,j} is the load value of the j-th sample of sequence X_i, x'_{i,j} is the normalized value of x_{i,j}, x_{i,min} and x_{i,max} are respectively the minimum and maximum load values in sequence X_i, and ρ is the number of data points in sequence X_i.
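The formula image is not reproduced in this text. Given that only the sequence minimum and maximum appear among the symbols, it is presumably the standard min-max normalization, x'_{i,j} = (x_{i,j} - x_{i,min}) / (x_{i,max} - x_{i,min}); the sketch below rests on that assumption. The function name `min_max_normalize` and the treatment of a constant sequence (mapped to zeros) are illustrative choices, not taken from the source.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale each load sequence (one row per consumer) into [0, 1].

    ASSUMPTION: standard min-max normalization per sequence X_i.
    """
    samples = np.asarray(samples, dtype=float)
    lo = samples.min(axis=1, keepdims=True)   # x_{i,min}
    hi = samples.max(axis=1, keepdims=True)   # x_{i,max}
    # A constant sequence has zero range; divide by 1 so it maps to zeros.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (samples - lo) / span

if __name__ == "__main__":
    print(min_max_normalize([[2.0, 4.0, 6.0], [5.0, 5.0, 5.0]]))
```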
step S2, clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree.
In one implementation, the clustering the sample set using the Canopy-K-means algorithm includes:
pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;
and taking the mass center of each Canopy subset as an initial clustering center, and clustering the sample set by adopting a K-means algorithm.
The Canopy algorithm judges object similarity with a simple, computationally cheap method and is commonly used for the initial clustering of massive high-dimensional data. Unlike other clustering algorithms, it allows the resulting Canopy subsets to overlap, that is, one data object may belong to two Canopy subsets. Because its clustering precision is only moderate, its result is usually not used directly as the final clustering result but as a preprocessing step before a more accurate clustering. Canopy clustering leaves no outliers: each data object must belong to some Canopy subset, and a data object may form a Canopy subset on its own. These characteristics make the algorithm fast, so it can divide the data objects into several Canopy subsets quickly and efficiently and determine the centroid of each subset, namely the clustering center.
In one implementation, the pre-clustering the sample set by using a Canopy algorithm includes:
generating a sample list according to the sample set, and determining initial distance thresholds T1 and T2 as 80% and 60%, respectively, of the average value of the samples, where T1 > T2;
randomly selecting a sample point from the sample list as a first Canopy centroid, and generating a Canopy subset for the first Canopy centroid, denoted as S0;
randomly selecting one sample point from the remaining sample points in the sample list, denoted as Q, and letting D be the distance from Q to the first Canopy centroid; if D ≤ T1, Q is regarded as a weakly marked sample point and placed in S0; if D ≤ T2, Q is regarded as a strongly marked sample point and placed in S0; if D > T1, a new Canopy subset is generated from Q, and Q is deleted from the sample list; the center position of all the strongly marked sample points in each Canopy subset is the corresponding centroid;
and repeating the third step until the number of elements in the sample list is zero, and outputting the obtained Canopy subset and the centroid thereof.
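The pre-clustering steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `canopy_precluster`, the use of Euclidean distance, the comparison of each point against the seed points of all existing Canopy subsets, and the removal of every point from the list once it has been processed are assumptions made for illustration. The thresholds are passed in as parameters rather than derived from the sample average.

```python
import random

import numpy as np


def canopy_precluster(samples, t1, t2, seed=0):
    """Single-pass Canopy pre-clustering sketch (assumes t1 > t2)."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    samples = np.asarray(samples, dtype=float)
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # random selection order (assumption)

    canopies = []  # each canopy keeps its seed index, all members, strong members
    for q in order:  # every point leaves the list exactly once (assumption)
        placed = False
        for canopy in canopies:
            d = np.linalg.norm(samples[q] - samples[canopy["seed"]])
            if d <= t1:  # weakly marked: joins the canopy
                canopy["members"].append(q)
                placed = True
                if d <= t2:  # strongly marked: also counts toward the centroid
                    canopy["strong"].append(q)
        if not placed:  # farther than t1 from every seed: start a new canopy
            canopies.append({"seed": q, "members": [q], "strong": [q]})

    # Centroid of a canopy = center of its strongly marked points.
    centroids = np.array([samples[c["strong"]].mean(axis=0) for c in canopies])
    return canopies, centroids
```

Because a point may lie within T1 of several seeds, it can appear in more than one `members` list, which reflects the overlap between Canopy subsets described earlier.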
The K-means algorithm is a classical clustering algorithm that divides sample objects into several classes by distance calculation. It is simple in principle, simple to compute, and efficient. However, the clustering number must be preset manually, and the corresponding initial clustering centers are determined randomly. The distance from each sample object to each clustering center is calculated according to a distance formula, and each object is assigned to the closest class; the clustering centers are then recalculated, and the process repeats until the centers no longer change or the maximum number of iterations is reached, which yields the final clustering result. The mean square error is usually used as the evaluation index of the clustering result.
According to the embodiment of the invention, the Canopy algorithm is used to pre-cluster the data, and K-means clustering is then performed on the pre-clustering result, which can improve the overall calculation efficiency of the algorithm. The centroids of the Canopy subsets obtained by Canopy pre-clustering are taken as the initial clustering centers of the K-means algorithm, and the number of Canopy subsets determines the clustering number, which solves the problem that the initial clustering centers and the clustering number of the K-means algorithm are otherwise undetermined.
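As a rough illustration of how pre-computed centroids can seed K-means, the following Python sketch runs the standard assignment and update iteration starting from given centers. The function name `kmeans_from_centers`, the Euclidean distance, the stopping tolerance and the handling of empty classes are assumptions for illustration and are not taken from the embodiment.

```python
import numpy as np


def kmeans_from_centers(samples, init_centers, max_iter=100, tol=1e-8):
    """K-means sketch that starts from supplied centers instead of random ones."""
    samples = np.asarray(samples, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = np.zeros(len(samples), dtype=int)
    for _ in range(max_iter):
        # Assignment step: each sample joins the class of its nearest center.
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = samples[labels == k]
            if len(members) > 0:  # an empty class keeps its old center
                new_centers[k] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift <= tol:  # centers no longer change
            break
    return labels, centers
```

The clustering number is simply the number of rows in `init_centers`, so no separate K has to be chosen.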
In one implementation, the calculating the cluster validity indicator of each clustering scheme includes:
calculating a first cluster validity index according to the following formula:
Figure BDA0003560908820000101
where T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th intra-class data object in the cluster to the cluster center, and N is the number of intra-class data objects in the cluster;
calculating a second clustering validity index according to the following formula:
Figure BDA0003560908820000102
where T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to its cluster center, D_j is the average distance from the data objects in Q_j to its cluster center, and K is the clustering number;
calculating a third clustering validity index according to the following formula:
Figure BDA0003560908820000103
wherein,
Figure BDA0003560908820000104
where T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.
When the optimal clustering number is determined according to the value of the clustering effectiveness index of each clustering scheme, the embodiment of the invention determines the optimal clustering number by combining the first clustering effectiveness index, the second clustering effectiveness index and the third clustering effectiveness index.
Wherein the first cluster validity indicator is a measure of the distance of all data objects within a class in the cluster to the cluster center. When the clustering number is constant, the smaller the value is, the smaller the distance from each data object in the class to the clustering center is, the more concentrated the data object of each class is, the better the clustering effect is;
the larger the value of the second clustering validity index is, the better the clustering result of the clustering algorithm is considered to be;
the smaller the value of the third clustering validity index is, the better the clustering result of the clustering algorithm is.
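The exact index formulas appear only in the figures and are not reproduced here. The Python sketch below therefore uses assumed stand-ins built from the quantities defined above: a mean intra-class distance for compactness, and the average of Q_ij / (D_i + D_j) over class pairs for separation relative to compactness. The function names and both formulas are illustrative assumptions, not the indices T_QD, T_PD or T_YD themselves.

```python
import numpy as np


def compactness_index(samples, labels, centers):
    """Assumed form: mean distance of each sample to its own cluster center."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    return float(np.linalg.norm(samples - centers[labels], axis=1).mean())


def separation_ratio_index(samples, labels, centers):
    """Assumed form: average of Q_ij / (D_i + D_j) over all class pairs.

    Assumes every class is non-empty and not all of its points coincide.
    """
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    k = len(centers)
    # D_i: average distance from the members of class i to its center.
    d = np.array([
        np.linalg.norm(samples[labels == i] - centers[i], axis=1).mean()
        for i in range(k)
    ])
    ratios = []
    for i in range(k):
        for j in range(i + 1, k):
            q_ij = np.linalg.norm(centers[i] - centers[j])  # center distance
            ratios.append(q_ij / (d[i] + d[j]))
    return float(np.mean(ratios))
```

Under these assumed forms, a smaller compactness value and a larger separation ratio both indicate a better partition, which matches the direction of the first and second indices described above.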
Step S3, determining an optimal feature set of the user electricity consumption behavior, and generating a user electricity consumption behavior portrait according to the optimal feature set and the target clustering division result.
In one implementation, the determining the optimal feature set of the user power consumption behavior includes:
constructing a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, electricity consumption fluctuation ring ratio trend, daily peak-valley difference and working features;
and determining the optimal feature set of the user electricity utilization behavior from the user electricity utilization behavior feature set according to the maximum correlation minimum redundancy criterion.
In the analysis of the power consumption behavior, the power consumption characteristics derived from the power consumption curve are generally adopted to characterize the power consumption behavior of the user. The user power utilization behavior feature set aims to rapidly master power utilization features of different customer groups, so that differentiated services of the different power utilization groups are achieved. Therefore, in selecting the user electricity consumption behavior feature set, an index that most reflects the customer electricity consumption feature needs to be considered.
In the embodiment of the invention, the user electricity consumption behavior feature set is constructed on the basis of electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, electricity consumption fluctuation ring ratio trend, daily peak-valley difference and working features.
The description and attributes of each electricity usage characteristic index are shown in table 1.
Table 1:
Figure BDA0003560908820000111
Figure BDA0003560908820000121
The maximum correlation minimum redundancy criterion is a filter-type feature selection method. Its core idea is to maximize the correlation between the features and the classification variable while minimizing the redundancy among the features. This embodiment applies it to the selection of user electricity consumption features to obtain the feature set that has the strongest correlation and the lowest redundancy for characterizing user electricity consumption.
The correlation between the characteristic and the classification variable takes a mutual information value between the characteristic and the classification variable as a measurement index, and the measurement index represents the degree of uncertainty reduction of the category when the characteristic is known. In the solving process, in order to make each characteristic variable have statistical significance, variable domain discretization processing needs to be carried out on each variable, namely, the numerical sequence of each variable is converted into a probability distribution interval.
According to the embodiment of the invention, the features are first normalized, and the variable intervals are then uniformly discretized to obtain the probability distribution of each feature variable; the mutual information between each feature and the user category is then calculated.
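The normalize, discretize and mutual-information procedure can be sketched in Python as follows. The function name `mutual_information`, the default of 10 equal-width bins and the natural logarithm are assumptions for illustration; the embodiment does not specify them.

```python
import numpy as np


def mutual_information(feature, classes, n_bins=10):
    """Mutual information between one feature and the class labels (in nats)."""
    x = np.asarray(feature, dtype=float)
    span = x.max() - x.min()
    # Min-max normalization to [0, 1]; a constant feature maps to all zeros.
    x = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    # Uniform discretization into n_bins equal-width intervals.
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    _, cls = np.unique(np.asarray(classes).ravel(), return_inverse=True)

    # Joint probability table of (bin, class).
    joint = np.zeros((n_bins, cls.max() + 1))
    np.add.at(joint, (bins, cls), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal over bins
    py = joint.sum(axis=0, keepdims=True)  # marginal over classes
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())
```

A feature that fully determines the class gives the class entropy, while a feature that is independent of the class gives a value of zero.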
Specifically, the maximum correlation index D(Y, e) of the feature set Y and the user category e is set as:
Figure BDA0003560908820000131
where N_Y is the number of features included in the feature set Y, d_i is the i-th feature in the feature set Y, and U(d_i; e) is the mutual information value between d_i and the user category e.
The information redundancy between two features can be measured by indexes such as information gain, the Gini coefficient, and the correlation coefficient. In one embodiment, the information redundancy between two features is measured by the correlation coefficient:
$$\rho_{d_i d_j} = \frac{\mathrm{Cov}(d_i, d_j)}{\sigma_{d_i}\,\sigma_{d_j}}$$
where ρ_{d_i d_j} is the correlation coefficient between feature d_i and feature d_j, whose value lies in [-1, 1]; the closer its absolute value is to 1, the stronger the correlation, and the closer to 0, the weaker the correlation; Cov(d_i, d_j) is the covariance of feature d_i and feature d_j; σ_{d_i} is the standard deviation of feature d_i; and σ_{d_j} is the standard deviation of feature d_j.
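For reference, the pairwise correlation coefficients between features can be obtained in Python with NumPy's `corrcoef`. The helper name `redundancy_matrix` and the use of the absolute value as the redundancy measure are assumptions for illustration.

```python
import numpy as np


def redundancy_matrix(features):
    """Absolute Pearson correlation between every pair of feature columns.

    `features` has one row per user and one column per feature; a constant
    column has zero standard deviation and yields NaN entries.
    """
    features = np.asarray(features, dtype=float)
    return np.abs(np.corrcoef(features, rowvar=False))
```

Entry (i, j) of the returned matrix is close to 1 for strongly related features and close to 0 for unrelated ones.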
Setting a minimum redundancy index S (Y) as follows:
Figure BDA0003560908820000136
the maximum correlation minimum redundancy criterion is obtained by combining the two indexes, and the corresponding formula is as follows:
Figure BDA0003560908820000137
where I_mRMR represents the maximum correlation minimum redundancy criterion.
And solving the feature set Y meeting the maximum correlation minimum redundancy criterion to obtain the optimal feature set.
Solving for the optimal feature set can be converted into an optimization problem. Because the initial number of user electricity consumption behavior features is small, in one implementation a traversal method is used to obtain the globally optimal solution. Let f_i be the set-membership indicator function with 0-1 coding: f_i = 1 indicates that feature d_i is present in Y, and f_i = 0 indicates that feature d_i is not present in Y. To simplify the formulation, the mutual information U(d_i; e) and the correlation coefficient between features d_i and d_j are denoted by u_i and v_ij respectively, and the expression of I_mRMR is:
Figure BDA0003560908820000139
Traversing from f = (0, 0, …, 0) to f = (1, 1, …, 1) gives the vector f that maximizes I_mRMR, and decoding this vector yields the optimal feature set.
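The traversal over all 0-1 vectors f can be sketched in Python as below. Since the expression of I_mRMR is given only in a figure, the score used here is an assumed difference form: the mean of u_i over the selected features minus the mean of |v_ij| over all selected pairs, diagonal included. The function name `best_feature_subset` and this score are illustrative assumptions.

```python
from itertools import product

import numpy as np


def best_feature_subset(u, v):
    """Exhaustively search the 0-1 vector f with the highest assumed score.

    u[i]    : mutual information between feature i and the user category
    v[i][j] : correlation coefficient between features i and j
    """
    u = np.asarray(u, dtype=float)
    v = np.abs(np.asarray(v, dtype=float))
    best_f, best_score = None, -np.inf
    for f in product((0, 1), repeat=len(u)):  # (0,...,0) through (1,...,1)
        idx = [i for i, bit in enumerate(f) if bit]
        if not idx:  # the empty feature set is skipped
            continue
        relevance = u[idx].mean()
        redundancy = v[np.ix_(idx, idx)].mean()  # includes i == j (assumption)
        score = relevance - redundancy
        if score > best_score:
            best_f, best_score = f, score
    return best_f, float(best_score)
```

The search visits 2^n candidate vectors, which is practical only because the initial feature set is small, as noted above.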
In an implementation manner, the generating a user electricity consumption behavior portrait according to the optimal feature set and the target clustering and partitioning result includes:
and analyzing the optimal feature sets of the different electricity consumption behaviors by a scoring system, and visually expressing the electricity consumption characteristics of each class of users through a radar map and/or visually expressing the comparison of user electricity consumption characteristics between different classes through a histogram.
According to the embodiment of the invention, the radar map is used for visually expressing the electricity utilization characteristics of various users, and the histogram is used for visually expressing the comparison of the electricity utilization characteristics of the users among different classes, so that business personnel can more accurately and conveniently know the commonalities and individualities of the electricity utilization behaviors of the power users.
Most of the user electricity consumption behavior data are numerical data, and the data can be converted into a label convenient for business personnel to understand through a certain conversion rule. In this embodiment, a scoring system is adopted, the full score is 10, and the power consumption characteristics of each type of user are measured by the score of each label of each type of user. The score per label per user type is given by:
Figure BDA0003560908820000141
where T_{i,j} is the score of the j-th feature for the i-th class of users;
Figure BDA0003560908820000142
is the average of the j-th feature over all users belonging to the i-th class; and t_jmax and t_jmin are the maximum value and the minimum value of the j-th feature, respectively.
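The scoring formula itself is contained in a figure. Based on the quantities defined above (the class average, the feature maximum and minimum, and a full score of 10), one plausible reading is a min-max scaling of the class average, which the following Python sketch assumes. The function name `label_scores` and the exact formula are assumptions, not a reproduction of the figure.

```python
import numpy as np


def label_scores(features, labels, full_score=10.0):
    """Score each feature of each user class on a 0 to full_score scale.

    Assumed formula: full_score * (class mean - t_min) / (t_max - t_min).
    """
    features = np.asarray(features, dtype=float)  # rows: users, cols: features
    labels = np.asarray(labels)
    t_min = features.min(axis=0)
    t_max = features.max(axis=0)
    span = np.where(t_max > t_min, t_max - t_min, 1.0)  # guard constant features
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, full_score * (means - t_min) / span
```

Each row of the returned score matrix corresponds to one user class and can be plotted directly as one polygon of a radar map.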
The invention also provides a power consumer behavior representation system.
Referring to fig. 2, fig. 2 is a schematic block diagram of a power consumer behavior representation system according to an embodiment of the present invention.
The embodiment of the invention provides a power consumer behavior representation system, which comprises:
a sample set forming module 1, configured to acquire power consumer load data, and to correct and normalize the power consumer load data to form a sample set;
the clustering module 2 is used for clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;
and the portrait generation module 3 is used for determining an optimal feature set of the user electricity consumption behavior and generating a portrait of the user electricity consumption behavior according to the optimal feature set and the target clustering division result.
In one implementation, the clustering module 2 includes a clustering submodule for clustering the sample set by using a Canopy-K-means algorithm, and the clustering submodule includes:
the pre-clustering unit is used for pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;
and the re-clustering unit is used for clustering the sample set by taking the mass center of each Canopy subset as an initial clustering center and adopting a K-means algorithm.
In one implementation, the pre-clustering unit is specifically configured to:
generate a sample list according to the sample set, and determine initial distance thresholds T1 and T2 according to 80% and 60% of the average value of the samples, respectively, with T1 > T2;
randomly select a sample point from the sample list as the first Canopy centroid, and generate a Canopy subset for the first Canopy centroid, denoted S0;
randomly select one sample point from the remaining sample points in the sample list, denote it Q, and let D be the distance from Q to the first Canopy centroid; if D ≤ T1, Q is regarded as a weakly marked sample point and is placed in S0; if D ≤ T2, Q is regarded as a strongly marked sample point and is placed in S0; if D > T1, a new Canopy subset is generated from Q, and Q is deleted from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;
and repeat the third step until the number of elements in the sample list is zero, and output the obtained Canopy subsets and their centroids.
In an implementation manner, the clustering module 2 includes a calculation sub-module for calculating a cluster validity indicator of each clustering scheme, and the calculation sub-module includes:
a first calculating unit, configured to calculate a first cluster validity indicator according to the following formula:
Figure BDA0003560908820000151
where T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th intra-class data object in the cluster to the cluster center, and N is the number of intra-class data objects in the cluster;
a second calculating unit, configured to calculate a second clustering validity index according to the following formula:
Figure BDA0003560908820000152
where T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to its cluster center, D_j is the average distance from the data objects in Q_j to its cluster center, and K is the clustering number;
a third calculating unit, configured to calculate a third clustering validity index according to the following formula:
Figure BDA0003560908820000161
wherein,
Figure BDA0003560908820000162
where T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.
In an implementable manner, the representation generation module 3 comprises a feature determination sub-module for determining an optimal set of features for the user's electricity usage behaviour, the feature determination sub-module comprising:
a construction unit, configured to construct a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, electricity consumption fluctuation ring ratio trend, daily peak-valley difference and working features;
and the characteristic screening unit is used for determining the optimal characteristic set of the user electricity utilization behavior from the user electricity utilization behavior characteristic set according to the maximum correlation minimum redundancy criterion.
In one implementation, the portrait generation module 3 includes a generation sub-module configured to generate the user electricity consumption behavior portrait according to the optimal feature set and the target clustering division result, and the generation sub-module is specifically configured to:
and analyzing the optimal feature set of different power utilization behaviors by adopting a scoring system, and visually expressing the power utilization characteristics of various users through a radar map and/or visually expressing the comparison of the power utilization characteristics of the users among different classes through a histogram.
The invention also provides a power consumer behavior portrait device, which comprises:
a memory to store instructions; wherein the instructions are used for implementing the power consumer behavior representation method according to any one of the above embodiments;
a processor to execute the instructions in the memory.
The invention also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to implement the power user behavior representation method according to any one of the above embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, and the specific beneficial effects of the systems, apparatuses and modules described above may refer to the corresponding beneficial effects in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. A power consumer behavior portrait method is characterized by comprising the following steps:
acquiring power consumer load data, and correcting and normalizing the power consumer load data to form a sample set;
clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;
and determining an optimal feature set of the power utilization behavior of the user, and generating a user power utilization behavior portrait according to the optimal feature set and the target clustering division result.
2. The power consumer behavior representation method according to claim 1, wherein the clustering the sample set by using a Canopy-K-means algorithm comprises:
pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;
and taking the mass center of each Canopy subset as an initial clustering center, and clustering the sample set by adopting a K-means algorithm.
3. The power consumer behavior representation method according to claim 2, wherein the pre-clustering the sample set by using a Canopy algorithm comprises:
generating a sample list according to the sample set, and determining initial distance thresholds T1 and T2 according to 80% and 60% of the average value of the samples, respectively, with T1 > T2;
randomly selecting a sample point from the sample list as the first Canopy centroid, and generating a Canopy subset for the first Canopy centroid, denoted S0;
randomly selecting one sample point from the remaining sample points in the sample list, denoting it Q, and letting D be the distance from Q to the first Canopy centroid; if D ≤ T1, Q is regarded as a weakly marked sample point and is placed in S0; if D ≤ T2, Q is regarded as a strongly marked sample point and is placed in S0; if D > T1, a new Canopy subset is generated from Q, and Q is deleted from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;
and repeating the third step until the number of elements in the sample list is zero, and outputting the obtained Canopy subsets and their centroids.
4. The power consumer behavior representation method according to claim 1, wherein the calculating a cluster effectiveness index for each clustering scheme includes:
calculating a first cluster validity index according to the following formula:
Figure FDA0003560908810000021
where T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th intra-class data object in the cluster to the cluster center, and N is the number of intra-class data objects in the cluster;
calculating a second clustering validity index according to the following formula:
Figure FDA0003560908810000022
where T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects in Q_i to its cluster center, D_j is the average distance from the data objects in Q_j to its cluster center, and K is the clustering number;
calculating a third clustering validity index according to the following formula:
Figure FDA0003560908810000023
wherein,
Figure FDA0003560908810000024
where T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.
5. The method for representing electric power consumer behavior as claimed in claim 1, wherein the step of determining the optimal feature set of the consumer behavior comprises:
constructing a user electricity consumption behavior feature set, wherein the user electricity consumption behavior feature set comprises electricity consumption scale, electricity consumption category, electricity consumption time-interval difference, electricity consumption temperature difference, daily average load stability, daily average electricity utilization rate, electricity consumption fluctuation ring ratio trend, daily peak-valley difference and working features;
and determining the optimal feature set of the user electricity utilization behavior from the user electricity utilization behavior feature set according to the maximum correlation minimum redundancy criterion.
6. The electric power consumer behavior representation method according to claim 5, wherein the generating of the user electricity consumption behavior representation according to the optimal feature set and the target clustering and partitioning result comprises:
and analyzing the optimal feature sets of the different electricity consumption behaviors by a scoring system, and visually expressing the electricity consumption characteristics of each class of users through a radar map and/or visually expressing the comparison of user electricity consumption characteristics between different classes through a histogram.
7. A power consumer behavior representation system, comprising:
a sample set forming module, configured to acquire power consumer load data, and to correct and normalize the power consumer load data to form a sample set;
the clustering module is used for clustering the sample set by adopting a Canopy-K-means algorithm, calculating a clustering effectiveness index of each clustering scheme, determining an optimal clustering number according to the value of the clustering effectiveness index of each clustering scheme, and determining a clustering division result corresponding to the optimal clustering number as a target clustering division result, wherein the clustering effectiveness indexes comprise a first clustering effectiveness index for representing the intra-class compactness, a second clustering effectiveness index for representing the degree of the inter-class separation degree relative to the intra-class compactness and a third clustering effectiveness index for representing the degree of the intra-class compactness relative to the inter-class separation degree;
and the portrait generation module is used for determining an optimal feature set of the power consumption behaviors of the user and generating the portrait of the power consumption behaviors of the user according to the optimal feature set and the target clustering division result.
8. The power consumer behavioral representation system according to claim 7, wherein the clustering module includes a clustering submodule for clustering the sample set using a Canopy-K-means algorithm, the clustering submodule including:
the pre-clustering unit is used for pre-clustering the sample set through a Canopy algorithm to obtain a plurality of Canopy subsets and the mass center of each Canopy subset;
and the re-clustering unit is used for clustering the sample set by taking the mass center of each Canopy subset as an initial clustering center and adopting a K-means algorithm.
9. The power consumer behavior representation system of claim 8, wherein the pre-clustering unit is specifically configured to:
generate a sample list according to the sample set, and determine initial distance thresholds T1 and T2 according to 80% and 60% of the average value of the samples, respectively, with T1 > T2;
randomly select a sample point from the sample list as the first Canopy centroid, and generate a Canopy subset for the first Canopy centroid, denoted S0;
randomly select one sample point from the remaining sample points in the sample list, denote it Q, and let D be the distance from Q to the first Canopy centroid; if D ≤ T1, Q is regarded as a weakly marked sample point and is placed in S0; if D ≤ T2, Q is regarded as a strongly marked sample point and is placed in S0; if D > T1, a new Canopy subset is generated from Q, and Q is deleted from the sample list; the center position of all strongly marked sample points in each Canopy subset is the corresponding centroid;
and repeat the third step until the number of elements in the sample list is zero, and output the obtained Canopy subsets and their centroids.
10. The power consumer behavior representation system of claim 7, wherein the clustering module comprises a computation submodule for computing a cluster validity indicator for each clustering scheme, the computation submodule comprising:
a first calculating unit, configured to calculate a first cluster validity indicator according to the following formula:
Figure FDA0003560908810000041
in the formula, T_QD is the first clustering validity index, T_QD(i) is the distance from the i-th intra-class data object in the cluster to the cluster center, and N is the number of intra-class data objects in the cluster;
a second calculating unit, configured to calculate a second clustering validity index according to the following formula:
Figure FDA0003560908810000042
in the formula, T_PD is the second clustering validity index, Q_ij is the distance between the cluster centers of Q_i and Q_j, Q_i is the i-th class object set, Q_j is the j-th class object set, D_i is the average distance from the data objects of Q_i to its cluster center, D_j is the average distance from the data objects of Q_j to its cluster center, and K is the number of clusters;
a third calculating unit, configured to calculate a third clustering validity index according to the following formula:
Figure FDA0003560908810000043
wherein
Figure FDA0003560908810000051
in the formula, T_YD is the third clustering validity index, o_i and o_j are the cluster centers of the i-th class and the j-th class respectively, n is the number of samples in the sample set, x_j is sample data, n_j is the number of samples in the j-th class object set, and δ_ij is a Boolean value.
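The formulas themselves are only present as figure images and are not reproduced in this text. The quantities defined for the second index (the distance between two cluster centers, the average distance of each class's objects to its own center, and the number of clusters K) are the same ingredients used by the well-known Davies-Bouldin index, so the sketch below computes that index purely as an illustration of how such a validity measure is evaluated. Whether the claimed second index has exactly this form is an assumption; the function name `davies_bouldin` and the Euclidean distance are likewise illustrative.

```python
import math

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Assumes at least two non-empty clusters with distinct centers.
    Lower values indicate compact, well-separated clusters.
    """
    # Cluster centers: per-dimension mean of each cluster's points.
    centers = [
        tuple(sum(col) / len(pts) for col in zip(*pts)) for pts in clusters
    ]
    # Average distance from each cluster's points to its own center.
    scatter = [
        sum(math.dist(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

Evaluating an index of this kind for each candidate clustering scheme gives a single number per scheme that can be compared across schemes.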
11. The power consumer behavior representation system of claim 7, wherein the representation generation module comprises a feature determination sub-module for determining an optimal set of features for consumer electricity usage behavior, the feature determination sub-module comprising:
a construction unit, configured to construct a user power consumption behavior feature set, wherein the user power consumption behavior feature set comprises power utilization scale, power utilization category, power utilization time-interval difference, power utilization temperature difference, daily average load stability, daily average power utilization rate, power utilization fluctuation-to-fluctuation ring ratio trend, daily peak-valley difference and working characteristics;
and a feature screening unit, configured to determine the optimal feature set of the user power consumption behavior from the user power consumption behavior feature set according to the maximum-relevance minimum-redundancy criterion.
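The maximum correlation minimum redundancy criterion (commonly abbreviated mRMR) can be illustrated with a greedy selection sketch. The claim names only the criterion, so everything else here is an assumption: features are taken to be discrete, relevance and redundancy are measured by mutual information, the score is the difference "relevance minus mean redundancy", and the target is some discrete label per user (for example a cluster label). The names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr_select(features, target, k):
    """Greedily pick k feature names: high relevance to target, low redundancy.

    features: dict mapping feature name -> list of discrete values.
    """
    selected = []
    candidates = list(features)

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties resolve to the earliest candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

In the usage below, a duplicated feature is as relevant as the original but fully redundant with it, so the independent feature is chosen second.

```python
features = {"a": [0, 0, 1, 1], "a_copy": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
target = [0, 1, 2, 3]
print(mrmr_select(features, target, 2))  # ['a', 'b']
```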
12. The power consumer behavior representation system of claim 11, wherein the representation generation module comprises a generation submodule configured to generate a representation of the user power consumption behavior according to the optimal feature set and the target cluster partitioning result, and the generation submodule is specifically configured to:
analyze the optimal feature set of different power consumption behaviors using a grading system, and visually express the power consumption characteristics of each class of users through a radar chart and/or visually express the comparison of power consumption characteristics between different classes of users through a histogram.
13. A power consumer behavior representation device, comprising:
a memory, configured to store instructions, wherein the instructions are used to implement the power consumer behavior representation method according to any one of claims 1-6; and
a processor, configured to execute the instructions in the memory.
14. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the power consumer behavior representation method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210288846.4A CN114611976A (en) 2022-03-23 2022-03-23 Power consumer behavior portrait method, system and device

Publications (1)

Publication Number Publication Date
CN114611976A true CN114611976A (en) 2022-06-10

Family

ID=81865968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210288846.4A Pending CN114611976A (en) 2022-03-23 2022-03-23 Power consumer behavior portrait method, system and device

Country Status (1)

Country Link
CN (1) CN114611976A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination