CN113837778A - User complaint clustering analysis method based on improved wolf pack optimization K-means - Google Patents

Info

Publication number
CN113837778A
Authority
CN
China
Prior art keywords
wolf
data
power
information data
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111260294.8A
Other languages
Chinese (zh)
Inventor
郑健
杨威
王长春
姜丹丹
韩广忠
杨佳钰
张依娇
杨祺
夏力鹏
郭天超
杨勇
白日明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Fuxin Electric Power Supply Co
State Grid Corp of China SGCC
Original Assignee
State Grid Fuxin Electric Power Supply Co
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Fuxin Electric Power Supply Co, State Grid Corp of China SGCC filed Critical State Grid Fuxin Electric Power Supply Co
Priority to CN202111260294.8A priority Critical patent/CN113837778A/en
Publication of CN113837778A publication Critical patent/CN113837778A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/015 Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 After-sales
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The invention relates to a user complaint cluster analysis method based on improved wolf pack optimization K-means, belonging to the technical field of data analysis. The method comprises the following steps: collecting electricity consumption information data of power users; completing the user electricity consumption information data through a data preprocessing method; reducing the dimension of the electricity consumption information data based on the improved wolf pack optimization K-means clustering algorithm; selecting feature vectors of the electricity consumption information data; and judging the factor concentration of the electricity consumption information data. The strong global search capability of the wolf pack algorithm is used to find a set of center points that minimizes the clustering objective function, which reduces, to a certain extent, the heavy dependence of the original K-means clustering on its initial center points. To analyze the complaint risk clustering of power users more accurately, a wolf pack algorithm with an improved search strategy is proposed for the 3 search stages of the wolf pack algorithm, so that accurate cluster centers can be obtained and the clustering accuracy is improved.

Description

User complaint clustering analysis method based on improved wolf pack optimization K-means
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a user complaint cluster analysis method based on improved wolf pack optimization K-means.
Background
Mature information storage and data storage technologies have been introduced in the informatization of electric power enterprises in China, and the companies' customer service systems have accumulated massive, detailed business data characterized by high volume, diversity, and timeliness. In practice, a company extracts statistical tables from customer service data using traditional data processing, such as the manual service rate and the satisfaction rate, but the business rules hidden in the data are hard to discover and a mathematical model describing the business characteristics is hard to abstract. Customer service covers consultation, feedback, appeals, and complaints, and is not limited to any single scenario. Customers' appeals and complaints about services directly reflect their dissatisfaction and most urgent needs, and complaint handling can reveal possible problems in employees' business capability and in enterprise management. A large amount of business work order data has accumulated in the customer service system of the electric power company; it implies customers' business needs and service expectations and is of guiding significance for improving the enterprise's business and management.
Classifying the complaint risk level of power consumers is in essence a cluster analysis problem and can be solved with a clustering algorithm. K-means is simple and fast, handles large data sets effectively, and can classify the complaint risk level of power consumers quickly and efficiently. By combining different characteristics of the user information data, the complaint behaviors of power users can be classified. However, the traditional K-means algorithm selects the initial cluster centers at random, and this randomness can greatly affect the clustering result.
Disclosure of Invention
Aiming at the problem of grading the complaint risk of the power consumer, the invention provides a K-means clustering method based on an improved wolf colony in order to improve the clustering accuracy and stability of a K-means algorithm and solve the problem of the optimal clustering center of the algorithm.
In order to achieve the purpose, the invention adopts the technical scheme that: a user complaint cluster analysis method based on improved wolf pack optimization K-means comprises the following steps:
step S11: collecting power consumption information data of power consumers;
step S12: complementing the user power utilization information data through a data preprocessing method;
step S13: reducing the dimension of the electricity information data of the power consumer based on the clustering algorithm of the improved wolf pack optimization K-means;
step S14: selecting a power utilization information data characteristic vector of a power consumer;
step S15: judging the factor concentration of the power utilization information data of the power consumers.
Further, the clustering algorithm based on the improved wolf pack optimization K-means comprises the following steps:
step S21: initializing the wolf pack: setting the artificial wolf positions $X_i$, the number of iterations $k$, the scout wolf scale factor $\alpha$, the maximum number of wandering steps $T_{max}$, and the number of clusters $N$; calculating the fitness function of the wolf pack and selecting the current optimal solution $X_{best}$ (the lead wolf); the best $S$ artificial wolves other than the lead wolf are the scout wolves;
step S22: executing the interactive wandering behavior until the prey odor concentration $Y_i$ detected by some scout wolf $i$ is greater than the prey odor concentration $Y_{lead}$ perceived by the lead wolf, or the maximum number of wandering steps $T_{max}$ is reached;
step S23: the fierce wolves rush toward the prey according to the interactive calling behavior; if a fierce wolf perceives a prey odor concentration $Y_i > Y_{lead}$ on the way, then $Y_{lead} = Y_i$, and that wolf replaces the lead wolf and initiates the calling behavior;
step S24: updating the positions of the fierce wolves and executing the siege (attack) behavior;
step S25: updating the position of the lead wolf according to the "winner is king" lead wolf generation rule, then updating the pack according to the "survival of the fittest" pack renewal mechanism, and calculating new cluster centers from the latest positions of the improved wolf pack;
step S26: ending when the end condition is reached; otherwise, returning to step S23.
Further, in step S11, the data are normalized by min-max normalization, scaling the values to the [0,1] interval;
the data normalization formula is:
$$X^{*} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
wherein $X$ represents the electricity utilization information data of the user, and $X_{min}$ and $X_{max}$ are respectively the minimum value and the maximum value of a given item of user information data.
Further, in step S12, data with a missing rate greater than 30% are simply deleted; for data with a missing rate less than or equal to 30%, the electricity utilization information data are filled by interpolation: a polynomial function $L(x)$ is obtained from the existing data, the Lagrange interpolation polynomial being:
$$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}$$
then the point corresponding to the missing value is substituted into the interpolation polynomial to obtain an approximate value $L(x)$ of the missing value, completing the data.
Further, in step S13, before the cluster analysis, principal component analysis is used to reduce the dimension of the factors influencing power customers; principal component analysis is performed on the power consumption, voltage level, number of complaints, age, gender, total power consumption, and illegal power consumption, where, with $\lambda_1, \lambda_2, \dots, \lambda_m$ denoting the eigenvalues of the correlation coefficient matrix:
the variance contribution rate of the ith principal component is:
$$\alpha_i = \frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}$$
the cumulative variance contribution rate of the first i principal components is:
$$\beta_i = \frac{\sum_{k=1}^{i} \lambda_k}{\sum_{k=1}^{m} \lambda_k}$$
wherein the larger the variance contribution rate $\alpha_i$ of a principal component, the stronger its correlation with the sample.
Further, in step S14, feature extraction is performed on the power consumer data set: the correlation coefficient expresses the degree of correlation between two attributes, and the correlation coefficient $R_a$ is obtained as:
$$R_a = \frac{E\left[(X - E(X))(Y - E(Y))\right]}{\sigma_x \sigma_y}$$
wherein $\sigma_x$ and $\sigma_y$ are the standard deviations of $X$ and $Y$, $E(X)$ and $E(Y)$ are the expected values of $X$ and $Y$, and $R_a$ represents the degree of correlation of each item of power consumption information data; the larger $R_a$, the greater the impact on complaint risk.
Further, in step S15, the null hypothesis is rejected at a significance level of 0.05, the initial KMO test value meets and exceeds the critical value of 0.5, and the model data are suitable for the factor analysis method.
Further, in step S22, the scout wolf explores in n directions; the larger n is, the higher the optimization accuracy; to increase the interaction among scout wolves and improve the optimization capability, the search method is:
Figure BDA0003325432370000041
wherein: $y_{i,d}$ denotes the updated position of the prey,
Figure BDA0003325432370000042
denotes the optimal solution found within the range, $x_{i,d}$ denotes the original position of the prey, $\alpha_{i,d}$ is a random number in [0,1], $\beta_{i,d}$ is a random number in [-1,1], and k ≠ i ≠ j.
Further, in step S23, the wolf moves in the direction of the better cluster center point $Y_i$, the wolf position $X_i$ is updated, and the wolf at the best cluster center point is selected as the lead wolf.
Further, in step S24, as the iteration number t of the algorithm increases, the linearly varying adaptive step size is expressed as:
Figure BDA0003325432370000043
wherein:
Figure BDA0003325432370000044
represents the position of the wolf in the (k+1)th generation of the pack,
Figure BDA0003325432370000045
represents the position of the lead wolf in the kth generation of the pack,
Figure BDA0003325432370000046
represents the position of the kth-generation wolf in the d-dimensional space; $\theta$ is a factor taken as a random number in (0,1); $w$ is a random integer in {-1,1}.
Compared with the prior art, the invention has the following outstanding and beneficial technical effects: for the problem of grading power consumers, the invention provides a K-means algorithm based on improved wolf pack optimization, in which the best cluster centers of the data set are obtained by the improved wolf pack algorithm and used as the initial cluster centers of K-means. This addresses the problems of the traditional K-means algorithm, in which the choice of initial centers easily leads to a locally optimal clustering, an unstable algorithm, and reduced clustering accuracy. The wolf pack algorithm with the improved search strategy reduces the instability that random selection of the initial cluster centers brings to the clustering result and ensures clustering accuracy, thereby improving clustering stability.
Drawings
FIG. 1 is a flow chart of the improved wolf pack K-means algorithm of the present invention;
FIG. 2 is a diagram of the clustering result of the power consumers according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following description, with reference to the drawings in the embodiments of the present invention, clearly and completely describes the technical solution in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to grade the complaint risk of electricity customers and to extract the electricity consumption characteristics of different users. That is, using the collected information data, it identifies the electricity consumption characteristics of different customer groups from the personal information of each type of user and their electricity consumption in different time periods, and performs dimension reduction on the electricity consumption data set.
The invention provides a user complaint cluster analysis method based on improved wolf pack optimization K-means, which is characterized by comprising the following steps:
step S11: collecting power consumption information data of power consumers; since the classification of power consumers mainly distinguishes their typical complaint characteristics rather than simply dividing them by numerical magnitude, normalization is necessary; the data are normalized by min-max normalization, scaling the values to the [0,1] interval;
the data normalization formula is:
$$X^{*} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
wherein $X$ represents the electricity utilization information data of the user, and $X_{min}$ and $X_{max}$ are respectively the minimum value and the maximum value of a given item of user information data.
In the embodiment, part of the electricity consumption data of the electricity consumption acquisition system in the northeast region is extracted and analyzed, with an acquisition interval of 1 month. For power consumer clustering, data are prepared from the following 2 aspects:
user profile information, such as age, electricity usage category, industry category, etc.;
user electricity consumption information, such as electricity consumption data.
Step S12: the invention first preprocesses the data: missing value analysis and outlier analysis are performed on the user electricity information data to examine the regularity and the outliers of the data. Inspection shows that the collected data have too many dimensions and are hard to analyze directly, so user complaint characteristics and key indexes reflecting the essence of the data are extracted from them to serve the clustering and improve its accuracy.
Then feature vectors are selected. The electricity consumption data affect each user's complaint characteristics differently; the applicable range of each information data index is not fixed and each index has its own emphasis, so the selection of information data has a great influence on the user complaint characteristics.
The user complaint behavior characteristic analysis mainly comprises the following steps:
step S121: selectively extracting historical user electricity consumption data and user profile information data from the electricity consumption collection data source.
Step S122: performing data association, exploratory data analysis, and preprocessing on the two data sets from step S121, including exploratory analysis of missing values and outliers, data cleaning, and data transformation.
Step S123: grading the user complaint characteristics by applying the clustering algorithm to the preprocessed modeling data formed in step S122, performing characteristic analysis on each electricity customer group, and identifying the user characteristics of each grade.
Next, normalization is performed. The value ranges of the original data can differ greatly; if the original data were processed directly, features with large values could swamp features with small values, so that the small-valued data could not be analyzed effectively. Moreover, since the classification of power consumers mainly distinguishes their typical complaint characteristics rather than simply dividing them by numerical magnitude, normalization is necessary. In the experiment, the data are normalized by min-max normalization, scaling the values to the [0,1] interval.
The formula for data normalization is:
$$X^{*} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
wherein: $X$ represents the electricity utilization information data of the user; $X_{min}$ and $X_{max}$ are respectively the minimum value and the maximum value of a given item of user information data.
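The normalization step can be illustrated with a minimal Python sketch. It assumes the data are held in a NumPy array with one row per user and one column per information item; the function name `min_max_normalize`, the sample values, and the handling of constant columns are illustrative assumptions, not part of the patent text.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (information item) of X to the [0, 1] interval."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    # Assumption: a constant column carries no information and is mapped to 0.
    span[span == 0] = 1.0
    return (X - x_min) / span

# Hypothetical values: monthly consumption (kWh) and number of complaints.
data = np.array([[120.0, 1.0], [300.0, 3.0], [210.0, 0.0]])
scaled = min_max_normalize(data)
```

After scaling, a column with large raw values (consumption) and a column with small raw values (complaint counts) both lie in [0,1], so neither dominates the distance computation used in clustering.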
Then data cleaning is performed, which mainly consists of deleting invalid data from the power consumers' historical complaint data and basic user data and supplementing missing data. Rows and columns that are mostly empty, and features with a high missing rate or with numerical values of small standard deviation, are removed; when a user has only personal information data but no electricity consumption data, the record is treated as garbage data and deleted.
Finally, the user electricity utilization information data are completed by the data preprocessing method. Data with a missing rate greater than 30% are simply deleted. For data with a missing rate less than or equal to 30%, because the user information data show a certain trend, interpolation gives more accurate fill values, so the electricity information data are filled by interpolation, and a polynomial function $L(x)$ is obtained from the existing data as follows:
an (n-1)-degree polynomial is determined for the known n points:
$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$$
substituting the coordinates $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ of the n points into the polynomial function gives
Figure BDA0003325432370000071
Figure BDA0003325432370000072
The Lagrange interpolation polynomial is then obtained as:
$$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}$$
Then the point corresponding to the missing value is substituted into the interpolation polynomial to obtain an approximate value $L(x)$ of the missing value, completing the data.
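As an illustration of the completion rule described above, the following Python sketch drops a series whose missing rate exceeds 30% and otherwise fills each gap with the value of a Lagrange interpolation polynomial. The function names, the use of `NaN` to mark missing readings, and the `window` parameter (which limits the interpolation to nearby known points, since high-degree polynomials can oscillate) are assumptions made for this sketch and are not stated in the patent.

```python
import numpy as np

def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fill_missing(series, max_missing_rate=0.30, window=3):
    """Return the series with NaNs interpolated, or None if it is too sparse."""
    s = np.asarray(series, dtype=float)
    missing = np.isnan(s)
    if missing.mean() > max_missing_rate:
        return None  # missing rate above 30%: the record is simply deleted
    known = np.flatnonzero(~missing)
    filled = s.copy()
    for idx in np.flatnonzero(missing):
        # Assumption: interpolate through at most 2 * window nearest known points.
        nearest = np.sort(known[np.argsort(np.abs(known - idx))[: 2 * window]])
        filled[idx] = lagrange_value(nearest, s[nearest], idx)
    return filled

# Hypothetical monthly readings following a quadratic trend, one value missing.
readings = [1.0, 4.0, np.nan, 16.0, 25.0, 36.0]
completed = fill_missing(readings)
```

Because the sample readings lie on a quadratic curve, the interpolation polynomial reproduces the missing point up to floating-point rounding, which makes the sketch easy to check by hand.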
The main features for power consumer clustering include the types of users' complaint events, the number of complaints, user age, user address, and annual and monthly electricity consumption, as well as information such as whether the accepted complaint content is reasonable and the user's return-visit satisfaction. Because the influences of these factors on the final clustering overlap, the features need dimension reduction to improve clustering accuracy.
Step S13: reducing the dimension of the power consumption information data of the power consumer. Before the cluster analysis, principal component analysis is used to reduce the dimension of the factors influencing power customers; it is applied to the power consumption, voltage level, number of complaints, age, gender, total power consumption, and illegal power consumption. Principal component analysis retains the main information of the original data and thereby further improves clustering accuracy: the dimension reduction converts a number of mutually dependent variables into mutually independent ones, which are the principal components of the original variables. The principal components are obtained as linear combinations of the original data and are independent of one another, so they retain the characteristics of the original data while also being mutually independent.
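For original data in which m variables are observed, the original data matrix of n samples is:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
wherein $x_1, x_2, \dots, x_m$ denote the complaint characteristics of a user, and $x_i = \{x_{i1}, x_{i2}, \dots, x_{im}\}$ denotes the sample values of the ith user for the type of complaint event, the number of complaints, …, and the monthly electricity consumption.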
In view of the different value ranges of the data in each item of user information, the user electricity utilization information data are standardized to obtain the data matrix for cluster analysis.
The mean of each column $x_j$ of the sample matrix is:
$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$
the variance of the sample matrix X is:
Figure BDA0003325432370000083
the formula for data normalization is as follows:
Figure BDA0003325432370000084
normalizing the elements in the sample matrix to form a normalized matrix, the covariance matrix Y of matrix X, i.e., the correlation coefficient matrix, is given by the formula:
Figure BDA0003325432370000085
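For an orthogonal matrix $U$ there holds $U^{T} Y U = \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_m)$ and $\lambda_1 > \lambda_2 > \dots > \lambda_m$, and $\alpha_1, \alpha_2, \dots, \alpha_m$ are the eigenvectors corresponding to $\lambda_1, \lambda_2, \dots, \lambda_m$. The random variables $y_1, y_2, \dots, y_m$ are uncorrelated, and the variable $y_i$ with eigenvalue $\lambda_i$ is called the ith principal component of the sample matrix $X$.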
Wherein:
the variance contribution rate of the ith principal component is:
$$\alpha_i = \frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}$$
the cumulative variance contribution rate of the first i principal components is:
$$\beta_i = \frac{\sum_{k=1}^{i} \lambda_k}{\sum_{k=1}^{m} \lambda_k}$$
The larger the variance contribution rate $\alpha_i$ of a principal component, the stronger its correlation with the sample; the principal components are arranged in descending order of their characteristic roots. In practical application, the number of extracted principal components depends on the cumulative variance contribution rate $\beta_i$.
PCA converts the above variables into composite variables as follows:
$$\begin{aligned} F_1 &= a_{11}x_1 + a_{12}x_2 + \dots + a_{1m}x_m \\ F_2 &= a_{21}x_1 + a_{22}x_2 + \dots + a_{2m}x_m \\ &\;\;\vdots \\ F_p &= a_{p1}x_1 + a_{p2}x_2 + \dots + a_{pm}x_m \end{aligned}$$
In short:
$$F_j = a_{j1}x_1 + a_{j2}x_2 + \dots + a_{jm}x_m, \quad j = 1, 2, \dots, m$$
which satisfy the following:
1) $F_1, F_2, \dots, F_j$ are independent of each other;
2) the variances of $F_1, F_2, \dots, F_j$ decrease successively; $F_1$ is defined as the first principal component, $F_2$ as the second principal component, and so on, with $a_{k1}^2 + a_{k2}^2 + a_{k3}^2 + \dots + a_{kp}^2 = 1$, $k = 1, 2, \dots, p$, where $a_{ij}$ are the principal component coefficients.
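The principal component step can be sketched in Python as follows. The sketch standardizes the columns, takes the eigen-decomposition of the correlation coefficient matrix, and computes the variance contribution rates and cumulative contribution rates described above. The function names and the 0.85 cumulative-contribution threshold are assumptions for illustration; the patent does not state a threshold, and columns with zero variance are assumed to have been removed during data cleaning.

```python
import numpy as np

def pca_contributions(X):
    """Return standardized data, eigenvectors, and contribution rates."""
    X = np.asarray(X, dtype=float)
    # Standardize each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)        # correlation coefficient matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    alpha = eigvals / eigvals.sum()         # variance contribution rates
    beta = np.cumsum(alpha)                 # cumulative contribution rates
    return Z, eigvecs, alpha, beta

def reduce_dimension(X, threshold=0.85):
    """Keep the fewest components whose cumulative contribution reaches threshold."""
    Z, eigvecs, alpha, beta = pca_contributions(X)
    p = min(int(np.searchsorted(beta, threshold)) + 1, len(beta))
    return Z @ eigvecs[:, :p], alpha, beta
```

Each retained column of the returned score matrix is one composite variable $F_j$, a linear combination of the standardized original variables whose coefficient vector has unit length.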
As shown in fig. 1, the flow chart of the improved wolf pack K-means algorithm, the invention adopts a wolf pack algorithm with an improved search strategy, which includes: interactive wandering behavior, interactive calling behavior, and adaptive siege (attack) behavior.
The invention relates to a clustering algorithm based on improved wolf pack optimization K-means, which comprises the following steps:
step S21: initializing the wolf pack: setting the artificial wolf positions $X_i$, the number of iterations $k$, the scout wolf scale factor $\alpha$, the maximum number of wandering steps $T_{max}$, and the number of clusters $N$; calculating the fitness function of the wolf pack and selecting the current optimal solution $X_{best}$ (the lead wolf); the best $S$ artificial wolves other than the lead wolf are the scout wolves;
step S22: executing the interactive wandering behavior until the prey odor concentration $Y_i$ detected by some scout wolf $i$ is greater than the prey odor concentration $Y_{lead}$ perceived by the lead wolf, or the maximum number of wandering steps $T_{max}$ is reached;
step S23: the fierce wolves rush toward the prey according to the interactive calling behavior; if a fierce wolf perceives a prey odor concentration $Y_i > Y_{lead}$ on the way, then $Y_{lead} = Y_i$, and that wolf replaces the lead wolf and initiates the calling behavior;
step S24: updating the positions of the fierce wolves and executing the siege (attack) behavior;
step S25: updating the position of the lead wolf according to the "winner is king" lead wolf generation rule, then updating the pack according to the "survival of the fittest" pack renewal mechanism, and calculating new cluster centers from the latest positions of the improved wolf pack;
step S26: ending when the end condition (optimal position or maximum number of iterations) is reached; otherwise, returning to step S23.
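The overall flow of steps S21 to S26 can be sketched in Python under strong simplifying assumptions. The patent's own position-update formulas are given only as images, so the random perturbations used below for wandering, rushing, and siege are stand-ins rather than the claimed formulas, and the loop repeats all phases in each generation. Each wolf encodes one candidate set of cluster centers, the "prey odor concentration" is taken as the negative clustering objective (sum of squared distances), and all function names and parameter values are hypothetical.

```python
import numpy as np

def sse(X, centers):
    """Clustering objective: sum of squared distances to the nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans(X, centers, iters=50):
    """Plain K-means (Lloyd) iterations started from the given centers."""
    centers = centers.copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):  # an empty cluster keeps its old center
                centers[k] = X[labels == k].mean(axis=0)
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels

def wolf_pack_centers(X, n_clusters, n_wolves=20, n_scouts=5, t_max=10,
                      generations=30, step=0.1, n_renew=4, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    fitness = lambda w: -sse(X, w)           # "prey odor concentration"
    # S21: each wolf is a candidate set of centers sampled from the data.
    wolves = np.stack([X[rng.choice(len(X), n_clusters, replace=False)]
                       for _ in range(n_wolves)])
    scores = np.array([fitness(w) for w in wolves])
    scale = step * (X.max(axis=0) - X.min(axis=0))
    for g in range(generations):
        order = np.argsort(scores)[::-1]
        lead, scouts = order[0], order[1:1 + n_scouts]
        # S22: scouts wander up to t_max times; a scout that beats the lead takes over.
        for i in scouts:
            for _ in range(t_max):
                trial = wolves[i] + rng.uniform(-1, 1, wolves[i].shape) * scale
                f = fitness(trial)
                if f > scores[i]:
                    wolves[i], scores[i] = trial, f
                if scores[i] > scores[lead]:
                    lead = i
                    break
        # S23/S24: the other wolves rush toward the lead, then search around it
        # with a step that shrinks linearly over the generations (stand-in rule).
        shrink = 1.0 - g / generations
        for i in range(n_wolves):
            if i == lead:
                continue
            toward = wolves[i] + rng.uniform(0, 1) * (wolves[lead] - wolves[i])
            trial = toward + rng.uniform(-1, 1, toward.shape) * scale * shrink
            f = fitness(trial)
            if f > scores[i]:                # greedy acceptance
                wolves[i], scores[i] = trial, f
        # S25: the weakest wolves are regenerated ("survival of the fittest").
        for i in np.argsort(scores)[:n_renew]:
            wolves[i] = X[rng.choice(len(X), n_clusters, replace=False)]
            scores[i] = fitness(wolves[i])
    return wolves[np.argmax(scores)]         # position of the final lead wolf
```

A possible usage is `centers0 = wolf_pack_centers(X, n_clusters=3)` followed by `centers, labels = kmeans(X, centers0)`, so that the pack's best position serves as the initial cluster centers of K-means instead of a random draw.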
In the interactive wandering behavior of the method, the scout wolf explores in n directions. The larger n is, the higher the optimization accuracy, but the slower the algorithm, and the search for the optimal cluster center point easily falls into a local optimum; if n is too small, the cluster center point is inaccurate or cannot be found at all. The reason is that the scout wolves lack the necessary information exchange and cannot learn their companions' information in time, which impairs their global search ability. To increase the interaction among scout wolves and improve the optimization capability, the search method is:
Figure BDA0003325432370000101
wherein: $y_{i,d}$ denotes the updated position of the prey,
Figure BDA0003325432370000102
denotes the optimal solution found within the range, $x_{i,d}$ denotes the original position of the prey, $\alpha_{i,d}$ is a random number in [0,1], $\beta_{i,d}$ is a random number in [-1,1], and k ≠ i ≠ j. The first half of the formula strengthens the local optimization capability of the wolf pack and the second half strengthens its global search capability, so the two are well balanced; the formula reflects the leading role of the lead wolf and keeps information flowing closely among the wolves, so the cluster center points found are more accurate.
In the interactive calling behavior of the method, the fierce wolf keeps rushing until $d_i < d_{near}$. The basic calling behavior lets the fierce wolves explore the search space comprehensively, but the algorithm is overly complex and easily falls into a locally optimal cluster center point. The invention therefore adopts a calling strategy in which the fierce wolf can besiege the cluster center point found after rushing only once. In swarm algorithms, communication within the swarm is an important part of the algorithm: the wolf moves in the direction of the better cluster center point $Y_i$, the wolf position $X_i$ is updated, and the wolf at the best cluster center point is selected as the lead wolf.
In the adaptive siege (attack) behavior of the method, the siege requires the wolves to have strong local optimization capability. The behavior is random and uncertain. As the algorithm evolves, the current optimal solution comes closer to the global optimum, and the wolves' exploitation capability should become stronger so that the algorithm converges quickly to the global optimum; adding an adjustment mechanism to the algorithm is therefore a promising direction of improvement. To give the siege behavior adaptive adjustment capability, the random step length λ is replaced by an adaptive step length that varies linearly as the iteration number t of the algorithm increases, with the formula:
Figure BDA0003325432370000111
wherein:
Figure BDA0003325432370000112
represents the position of the wolf in the (k+1)th generation of the pack,
Figure BDA0003325432370000113
represents the position of the lead wolf in the kth generation of the pack,
Figure BDA0003325432370000114
represents the position of the kth-generation wolf in the d-dimensional space; $\theta$ is a factor taken as a random number in (0,1); $w$ is a random integer in {-1,1}. $\theta$ is taken in (0,1) to prevent $w(1 - \theta t / t_{max})$ from approaching zero in the later iterations of the algorithm, which would leave the optimization result unchanged; $w$ ensures that the search range is not limited to
Figure BDA0003325432370000115
so that the area near $x_{i,d}$ can be searched more comprehensively. If, after the siege behavior, the prey odor concentration perceived by the artificial wolf is greater than that perceived at its original position, the position of the artificial wolf is updated; otherwise, the position of the artificial wolf is unchanged.
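The adaptive factor $w(1 - \theta t / t_{max})$ and the greedy position update described above can be sketched in Python. Only the factor and the accept-if-better rule come from the text; the displacement term `base_step * |x_lead - x|` is borrowed from the siege update of the standard wolf pack algorithm and is an assumption here, because the patent's full formula is available only as an image. The function names and `base_step` are hypothetical.

```python
import numpy as np

def adaptive_factor(t, t_max, rng):
    """Signed step factor w * (1 - theta * t / t_max)."""
    theta = rng.uniform(0.0, 1.0)    # random factor; NumPy samples from [0, 1)
    w = rng.choice([-1, 1])          # random sign, so both directions are searched
    return w * (1.0 - theta * t / t_max)

def siege_move(x, x_lead, t, t_max, fitness, rng, base_step=0.1):
    """One greedy siege move of a wolf at position x around the lead wolf."""
    factor = adaptive_factor(t, t_max, rng)
    # Assumed displacement form (standard wolf pack siege update).
    trial = x + factor * base_step * np.abs(x_lead - x)
    # Keep the new position only if the perceived odor concentration improves.
    return trial if fitness(trial) > fitness(x) else x
```

Because $\theta$ is below 1 and $t \le t_{max}$, the magnitude of the factor stays strictly positive even in the last iteration, which is the property the text attributes to choosing $\theta$ in (0,1).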
With this method, complaining users can be effectively classified by grade. The method reduces the number of variables used in clustering while retaining the important information contained in the original variables, which simplifies practical application and operation and improves clustering precision. As shown in fig. 2, the present invention can divide power consumers into high-risk consumers, low-risk consumers, general-risk consumers, good consumers, and bad consumers, which helps the power company classify its customers. On this basis, corresponding risk prevention strategies can be formulated for particular users by combining credit grades and similar information, and an early warning platform can be constructed, helping power companies guard against arrears of electricity charges and electricity theft and bringing significant economic benefits.
Step S14: selecting feature vectors of the electricity consumption information data of the power consumers. Feature extraction is performed on the power consumer data set: a correlation coefficient is used to express the degree of attribute correlation between two variables, giving the correlation coefficient R_a:
Figure BDA0003325432370000116
where σ_x and σ_y are the variances of X and Y, E(X) and E(Y) are the expected values of X and Y, and R_a represents the degree of correlation corresponding to each item of electricity consumption information data; the larger the correlation coefficient R_a, the greater the impact on complaint risk.
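As an illustration of this feature-selection step, the sketch below computes a standard Pearson correlation coefficient between each candidate attribute and a complaint-risk target and ranks the attributes. It is not the formula from the figure, which is not recoverable here; the Pearson form (covariance divided by the product of standard deviations), the function names, the sample attribute names, and ranking by absolute value are all assumptions of this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (std(X) * std(Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


def rank_features(features, target):
    """Return (name, R_a) pairs, strongest correlation first.

    ASSUMPTION: ranking by |R_a| is a choice of this sketch; the source
    only states that a larger R_a means a greater impact on complaint risk.
    """
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical sample data: four consumers, two candidate attributes.
sample = {
    "complaint_count": [1, 2, 3, 4],
    "age": [30, 25, 41, 33],
}
risk = [1, 2, 3, 5]
ranking = rank_features(sample, risk)
```

Attributes at the top of the ranking would be the natural candidates for the feature vector; a constant attribute has zero standard deviation and would need to be filtered out before calling pearson.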
Step S15: judging the factor concentration of the electricity consumption information data of the power consumers. In step S15, the null hypothesis is rejected at a significance level of 0.05, which indicates that the initial data has internal correlation and information redundancy and is suitable for data processing and statistical analysis by factor analysis. A KMO test value that meets or exceeds the critical value of 0.5 means that factor analysis is appropriate and the model data conforms to the factor analysis method. If the total variation of the initial data is preserved in full, the effect of factor concentration is lost. The main problem of factor analysis is therefore how to maintain the explanatory power of the initial data while concentrating the factors. In selecting the number of factors, factor concentration and information retention need to be balanced.
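The suitability check against the 0.5 critical value can be sketched with the Kaiser-Meyer-Olkin (KMO) measure, which compares squared correlations with squared partial correlations. The sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, the partial correlations are taken from the inverse of the correlation matrix, and the significance test at the 0.05 level is not implemented.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(columns):
    """Overall KMO: sum r_ij^2 / (sum r_ij^2 + sum partial_ij^2), i != j."""
    r = correlation_matrix(columns)
    inv = invert(r)
    p = len(r)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                sum_r2 += r[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


def suitable_for_factor_analysis(columns, critical_value=0.5):
    """Apply the 0.5 critical value stated in the text."""
    return kmo(columns) >= critical_value
```

With only two variables the partial correlation equals the ordinary correlation, so the measure is exactly 0.5; the check becomes informative once three or more attributes are included.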
The above embodiments are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby, so: all equivalent changes made according to the structure, shape and principle of the invention are covered by the protection scope of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Claims (9)

1. A user complaint cluster analysis method based on improved wolf pack optimization K-means is characterized by comprising the following steps:
step S11: collecting power consumption information data of power consumers;
step S12: completing the electricity consumption information data of the users through a data preprocessing method;
step S13: reducing the dimensionality of the electricity consumption information data of the power consumers based on the clustering algorithm of the improved wolf pack optimization K-means;
step S14: selecting feature vectors of the electricity consumption information data of the power consumers;
step S15: judging the factor concentration of the electricity consumption information data of the power consumers.
2. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, wherein the improved wolf pack optimization K-means-based clustering algorithm comprises the following steps:
step S21: initializing the wolf pack, setting the artificial wolf positions X_i, the iteration number k, the scout wolf scale factor α, the maximum number of wandering steps T_max, and the number of clusters N; calculating the fitness function of the wolf pack and selecting the current optimal solution X_best; the best S artificial wolves other than the head wolf are the scout wolves;
step S22: executing the interactive wandering behavior until the prey odor concentration Y_i detected by some scout wolf i is greater than the prey odor concentration Y_lead perceived by the head wolf, or the maximum number of wandering steps T_max is reached;
step S23: the fierce wolves rush toward the prey according to the interactive calling behavior; if a fierce wolf perceives a prey odor concentration Y_i > Y_lead on the way, then Y_lead = Y_i, and that wolf replaces the head wolf and initiates the calling behavior;
step S24: updating the positions of the fierce wolves and executing the containment behavior;
step S25: updating the position of the head wolf according to the 'winner is king' head wolf generation rule, then updating the pack according to the 'survival of the strong' wolf pack update mechanism, and calculating new cluster centers from the latest positions of the improved wolf pack;
step S26: ending when the termination condition is reached; otherwise, returning to step S23.
3. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, characterized in that: in step S11, the data is normalized using min-max normalization, which maps the values to the [0,1] interval;
the data normalization formula is:
Figure FDA0003325432360000011
wherein X represents the electricity consumption information data of the user, and X_min and X_max are respectively the minimum value and the maximum value of a given item of user information data.
4. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, characterized in that: in step S12, data with a missing rate greater than 30% is simply deleted; for data with a missing rate less than or equal to 30%, the electricity consumption information data is filled in by interpolation, a polynomial function L(x) being obtained from the existing data, wherein the Lagrange interpolation polynomial is:
Figure FDA0003325432360000021
and then, substituting the point corresponding to the missing value into the interpolation polynomial to obtain an approximate value L (x) of the missing value, and completing the data.
5. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, characterized in that: in step S13, before cluster analysis is performed, principal component analysis is used to reduce the dimensionality of the factors that influence power customers, the principal component analysis being performed on the power consumption, the voltage class, the number of complaints, the age, the gender, the total power consumption, and the illegal power consumption, where:
the variance contribution rate of the ith principal component is:
Figure FDA0003325432360000022
the cumulative variance contribution of the first i principal components is:
Figure FDA0003325432360000023
wherein the larger the variance contribution rate α_i of a principal component, the stronger its correlation with the sample.
6. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, wherein in step S14, feature extraction is performed on the power consumer data set: a correlation coefficient is used to express the degree of attribute correlation between two variables, giving the correlation coefficient R_a:
Figure FDA0003325432360000024
where σ_x and σ_y are the variances of X and Y, E(X) and E(Y) are the expected values of X and Y, and R_a represents the degree of correlation corresponding to each item of electricity consumption information data; the larger the correlation coefficient R_a, the greater the impact on complaint risk; in step S15, the null hypothesis is rejected at a significance level of 0.05, the initial KMO test value meets or exceeds the critical value of 0.5, and the model data conforms to the factor analysis method.
7. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 2, wherein in step S22, the scout wolves explore in n directions, and the larger n is, the higher the optimizing accuracy; in order to increase the interaction between scout wolves and improve the optimizing capability, the search method is:
Figure FDA0003325432360000031
wherein: y_i,d indicates the updated position of the prey,
Figure FDA0003325432360000032
represents the optimal solution found within the range, x_i,d indicates the original position of the prey, α_i,d is a random number in [0,1], β_i,d is a random number in [-1,1], and k ≠ i ≠ j.
8. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 2, wherein in step S23: each wolf moves in the direction of the better clustering center point Y_i and updates its position X_i, and the wolf at the best clustering center point position is selected as the head wolf.
9. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 2, wherein in step S24, the adaptive step length, which changes linearly as the iteration number t of the algorithm increases, is expressed as:
Figure FDA0003325432360000033
wherein:
Figure FDA0003325432360000034
represents the position of the wolf in generation k+1,
Figure FDA0003325432360000035
represents the position of the wolf in generation k,
Figure FDA0003325432360000036
is the position of the k-th generation pack wolf in the d-dimensional space; θ is a factor taken as a random number in (0,1); and w is a random integer in {-1, 1}.
CN202111260294.8A 2021-10-28 2021-10-28 User complaint clustering analysis method based on improved wolf pack optimization K-means Pending CN113837778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111260294.8A CN113837778A (en) 2021-10-28 2021-10-28 User complaint clustering analysis method based on improved wolf pack optimization K-means

Publications (1)

Publication Number Publication Date
CN113837778A true CN113837778A (en) 2021-12-24

Family

ID=78966172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111260294.8A Pending CN113837778A (en) 2021-10-28 2021-10-28 User complaint clustering analysis method based on improved wolf pack optimization K-means

Country Status (1)

Country Link
CN (1) CN113837778A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554241A (en) * 2021-09-02 2021-10-26 国网山东省电力公司泰安供电公司 User layering method and prediction method based on user electricity complaint behaviors
CN114897451A (en) * 2022-07-13 2022-08-12 南昌工程学院 Double-layer clustering correction method and device considering key features of demand response user

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102625344A (en) * 2012-03-13 2012-08-01 重庆信科设计有限公司 Model and method for evaluating user experience quality of mobile terminal
CN107507076A (en) * 2017-09-26 2017-12-22 贵州电网有限责任公司 The method of the composite rating of power customer based on data mining
WO2019214133A1 (en) * 2018-05-08 2019-11-14 华南理工大学 Method for automatically categorizing large-scale customer complaint data
AU2020101080A4 (en) * 2020-06-23 2020-07-23 A, Clementking DR Customer Retention System for Retail Enterprises using Multi-phase Clustering Data Mining Technique
CN111626543A (en) * 2020-04-03 2020-09-04 国网浙江杭州市富阳区供电有限公司 Method and device for processing power related data
CN112257778A (en) * 2020-10-22 2021-01-22 国网浙江省电力有限公司台州供电公司 Two-stage refined clustering method based on user electricity consumption behavior
CN112464059A (en) * 2020-12-08 2021-03-09 深圳供电局有限公司 Power distribution network user classification method and device, computer equipment and storage medium
CN112507231A (en) * 2020-12-17 2021-03-16 辽宁工程技术大学 GWO-FCM-based personalized recommendation method
CN113239503A (en) * 2021-05-10 2021-08-10 上海电气工程设计有限公司 New energy output scene analysis method and system based on improved k-means clustering algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
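Sun Jie: "Research on grey wolf algorithm optimization and its application to collaborative filtering recommendation", China Master's Theses Full-text Database, Information Science and Technology, no. 7, pages 2 - 59 *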

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554241A (en) * 2021-09-02 2021-10-26 国网山东省电力公司泰安供电公司 User layering method and prediction method based on user electricity complaint behaviors
CN113554241B (en) * 2021-09-02 2024-04-26 国网山东省电力公司泰安供电公司 User layering method and prediction method based on user electricity complaint behaviors
CN114897451A (en) * 2022-07-13 2022-08-12 南昌工程学院 Double-layer clustering correction method and device considering key features of demand response user
CN114897451B (en) * 2022-07-13 2022-09-13 南昌工程学院 Double-layer clustering correction method and device considering key features of demand response user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination