CN113837778A

CN113837778A - User complaint clustering analysis method based on improved wolf pack optimization K-means

Info

Publication number: CN113837778A
Application number: CN202111260294.8A
Authority: CN
Inventors: 郑健; 杨威; 王长春; 姜丹丹; 韩广忠; 杨佳钰; 张依娇; 杨祺; 夏力鹏; 郭天超; 杨勇; 白日明
Original assignee: State Grid Fuxin Electric Power Supply Co; State Grid Corp of China SGCC
Current assignee: State Grid Fuxin Electric Power Supply Co; State Grid Corp of China SGCC
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2021-12-24

Abstract

The invention relates to a user complaint cluster analysis method based on improved wolf pack optimization K-means and belongs to the technical field of data analysis. The method comprises the following steps: collecting electricity consumption information data of power users; completing the user electricity consumption information data through a data preprocessing method; reducing the dimension of the electricity consumption information data based on an improved wolf pack optimization K-means clustering algorithm; selecting feature vectors of the electricity consumption information data; and judging the factor concentration of the electricity consumption information data. The method uses the strong global search capability of the wolf pack algorithm to find a group of center points that minimizes the clustering objective function value, which weakens, to a certain extent, the excessive dependence of the original K-means clustering on the initial center points. To analyze the complaint-risk clustering of power users more accurately, a wolf pack algorithm with an improved search strategy is applied in the three search stages of the wolf pack algorithm, so that accurate cluster centers can be obtained and the clustering accuracy is improved.

Description

User complaint clustering analysis method based on improved wolf pack optimization K-means

Technical Field

The invention belongs to the technical field of data analysis, and particularly relates to a user complaint cluster analysis method based on improved wolf pack optimization K-means.

Background

Mature information and data storage technologies have been introduced in the informatization of electric power enterprises in China, and the customer service systems of these companies have accumulated massive, detailed business data characterized by high volume, diversity, and real-time availability. In practice, a company extracts statistical tables from customer service data using traditional data processing, such as the manual service rate and the satisfaction rate, but the business rules hidden in the data are difficult to discover, and a mathematical model describing the business characteristics is difficult to abstract. Customer service involves consultation, feedback, appeals, and complaints, and is not limited to fixed scenarios. Customers' appeals and complaints about services directly reflect their dissatisfaction and their most urgent needs, and complaint handling can reveal possible problems in the professional ability of employees and in enterprise management. The large volume of business work-order data accumulated in the customer service system of an electric power company contains customers' business needs and service expectations, and therefore has guiding significance for improving the business and management of the enterprise.

The classification of the complaint risk level of power users is in essence a cluster analysis problem and can be solved with a clustering algorithm. K-means is simple and fast and handles large data sets effectively, so it can classify the complaint risk level of power users quickly and efficiently. By combining different characteristics of the user information data, the complaint behaviors of power users can be classified. However, the traditional K-means algorithm selects the initial cluster centers randomly, and this randomness can greatly influence the clustering result.

Disclosure of Invention

To address the grading of power users' complaint risk, the invention provides a K-means clustering method based on an improved wolf pack algorithm, which improves the clustering accuracy and stability of the K-means algorithm and solves the problem of finding its optimal cluster centers.

In order to achieve the purpose, the invention adopts the technical scheme that: a user complaint cluster analysis method based on improved wolf pack optimization K-means comprises the following steps:

step S11: collecting power consumption information data of power consumers;

step S12: complementing the user power utilization information data through a data preprocessing method;

step S13: reducing the dimension of the electricity information data of the power consumer based on the clustering algorithm of the improved wolf pack optimization K-means;

step S14: selecting a power utilization information data characteristic vector of a power consumer;

step S15: judging the factor concentration of the power consumption information data of the power consumers.

Further, the clustering algorithm based on the improved wolf pack optimization K-means comprises the following steps:

step S21: initializing the wolf pack: setting the artificial wolf positions X_i, the iteration number k, the scout wolf scale factor α, the maximum number of wandering steps T_max, and the number of clusters N; calculating the fitness function of the wolf pack and selecting the current optimal solution X_best; the best S artificial wolves other than the head wolf are the scout wolves;

step S22: executing the interactive wandering behavior until the prey odor concentration Y_i detected by some scout wolf i is greater than the prey odor concentration Y_lead sensed by the head wolf, or the maximum number of wandering steps T_max is reached;

step S23: the fierce wolves rush toward the prey according to the interactive calling behavior; if a fierce wolf senses a prey odor concentration Y_i > Y_lead on the way, then Y_lead = Y_i, and that wolf replaces the head wolf and initiates the calling behavior;

step S24: updating the positions of the fierce wolves and executing the siege behavior;

step S25: updating the position of the head wolf according to the "winner is king" head wolf generation rule, then updating the pack according to the "survival of the strongest" pack update mechanism, and calculating new cluster centers from the latest positions of the improved wolf pack;

step S26: ending when the end condition is reached; otherwise, returning to step S23.

Further, in step S11, the data are normalized by min-max normalization, which scales the values to the [0,1] interval;

the data normalization formula is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

wherein X represents the electricity consumption information data of the user, X^* is the normalized value, and X_min and X_max are respectively the minimum and maximum values of a given item of user information data.

Further, in step S12, data with a missing rate greater than 30% are simply deleted; for data with a missing rate less than or equal to 30%, the electricity consumption information data are filled by interpolation: a polynomial function L(x) is obtained from the n known data points (x_i, y_i), the Lagrange interpolation polynomial being:

$$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

then the point corresponding to the missing value is substituted into the interpolation polynomial to obtain an approximate value L(x) of the missing value, which completes the data.

Further, in step S13, before the cluster analysis, principal component analysis is used to reduce the dimension of the factors influencing power customers; principal component analysis is performed on the electricity consumption, voltage level, number of complaints, age, gender, total electricity consumption, and illegal electricity use, wherein:

the variance contribution rate of the i-th principal component is:

$$\alpha_i = \frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}$$

the cumulative variance contribution rate of the first i principal components is:

$$\beta_i = \frac{\sum_{k=1}^{i} \lambda_k}{\sum_{k=1}^{m} \lambda_k}$$

wherein λ_k is the k-th largest eigenvalue of the correlation coefficient matrix of the m variables, and the larger the variance contribution rate α_i of a principal component, the stronger its correlation with the sample.

Further, in step S14, feature extraction is performed on the power user data set: the correlation coefficient expresses the degree of correlation between two attributes, and the correlation coefficient R_a can be obtained:

wherein σ_x and σ_y are the variances of X and Y, E(X) and E(Y) are the expected values of X and Y, and R_a represents the degree of correlation of each item of electricity consumption information data; the larger the correlation coefficient R_a, the greater the impact on complaint risk. In step S15, the null hypothesis is rejected at a significance level of 0.05, the initial KMO test value meets and exceeds the critical value of 0.5, and the model data are suitable for the factor analysis method.

Further, in step S22, the scout wolves explore in n directions; the larger n is, the higher the optimization accuracy. To increase the interaction among scout wolves and improve the optimization capability, the search method is:

wherein: y_i,d indicates the updated position of the prey,

represents the optimal solution found within the range, x_i,d indicates the original position of the prey, α_i,d is a random number in [0,1], β_i,d is a random number in [-1,1], and k ≠ i ≠ j.

Further, in step S23: the wolves advance in the direction of the better cluster center point Y_i, the wolf positions X_i are updated, and the wolf at the best cluster center point position is selected as the head wolf.

Further, in step S24, as the iteration number t of the algorithm increases, the linearly changing adaptive step size is expressed as:

wherein:

represents the position of the wolf in the (k+1)-th generation pack,

represents the position of the head wolf in the k-th generation pack,

represents the position of the k-th generation wolf in the d-dimensional space; θ is a factor taken as a random number in (0,1); and w is a random integer in {-1,1}.

Compared with the prior art, the invention has the following outstanding and beneficial technical effects. For the problem of grading power users, the invention provides a K-means algorithm based on improved wolf pack optimization: the improved wolf pack algorithm finds the best cluster centers of the data set, which are then used as the initial cluster centers of K-means. This addresses the problems of the traditional K-means algorithm, in which the choice of initial centers easily leads to a locally optimal clustering, an unstable algorithm, and reduced clustering accuracy. The wolf pack algorithm with the improved search strategy reduces the instability that the random selection of initial cluster centers introduces into the clustering result and ensures clustering accuracy, thereby improving clustering stability.

Drawings

FIG. 1 is a flow chart of the improved wolf pack K-means algorithm of the present invention;

FIG. 2 is a diagram of the clustering result of the power consumers according to the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the following description, with reference to the drawings in the embodiments of the present invention, clearly and completely describes the technical solution in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The invention aims to grade the complaint risk of electricity customers and to extract the electricity consumption characteristics of different users. That is, using the collected information data, it identifies the electricity consumption characteristics of different customer groups from the personal information of each type of user and their electricity consumption in different time periods, and it performs dimension reduction on the electricity consumption data set.

The invention provides a user complaint cluster analysis method based on improved wolf pack optimization K-means, which is characterized by comprising the following steps:

step S11: collecting power consumption information data of power consumers. Because the classification of power consumers mainly distinguishes their typical complaint characteristics rather than simply dividing them by numerical magnitude, normalization is necessary; the data are normalized by min-max normalization, which scales the values to the [0,1] interval;

the data normalization formula is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

wherein X is the original value, X^* is the normalized value, and X_min and X_max are the minimum and maximum values of the corresponding item of user information data.

In this embodiment, part of the electricity consumption data from the electricity consumption acquisition system of the northeast region is extracted and analyzed, with an acquisition interval of 1 month. The data for power consumer clustering are prepared from the following 2 aspects:

user profile information, such as age, electricity usage category, industry category, etc.;

the user electricity consumption information, such as electricity consumption data.

Step S12: the invention provides an advanced data preprocessing method. Missing value analysis and outlier analysis are performed on the user electricity consumption information data to examine the regularities and anomalies of the data. Inspection of the data shows that the dimension of the collected data is too large for easy analysis, so the user complaint characteristics and key indexes reflecting the essence of the data are extracted to serve the purpose of clustering and improve clustering accuracy.

Feature vectors are then selected. The electricity consumption data affect the complaint characteristics of each user differently, the application range of each information data index is not fixed, and each index has its own emphasis, so the selection of information data has a great influence on the user complaint characteristics.

The user complaint behavior characteristic analysis mainly comprises the following steps:

step S121: selectively extracting historical user electricity consumption data and user profile information data from the electricity consumption collection data source.

Step S122: performing data association, exploratory data analysis, and preprocessing on the two data sets from step S121, including exploratory analysis of missing values and outliers, data cleaning, and data transformation.

Step S123: grading the user complaint characteristics by applying the clustering algorithm to the preprocessed modeling data formed in step S122, performing characteristic analysis on each electricity customer group, and identifying the user characteristics of each grade.

Normalization is then performed. The value ranges of the original data may differ greatly; if the original data were processed directly, features with large values could swamp features with small values, so that the small-valued data could not be analyzed effectively. Moreover, because the classification of power consumers mainly distinguishes their typical complaint characteristics rather than simply dividing them by numerical magnitude, normalization is necessary. In the experiment, the data are normalized by min-max normalization, which scales the values to the [0,1] interval.

The formula for data normalization is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

wherein: X represents the electricity consumption information data of the user; X^* is the normalized value; X_min and X_max are respectively the minimum and maximum values of a given item of user information data.

Data cleaning is then performed, which mainly consists of deleting invalid data from the historical complaint data and basic data of the power consumers and filling in missing data. Rows and columns that are mostly empty, and features with a high missing rate or a small standard deviation of values, are removed. When a user has only personal information data but no electricity consumption data, the record is treated as junk data and is removed.

Finally, the user electricity consumption information data are completed through the data preprocessing method. Data with a missing rate greater than 30% are simply deleted. For data with a missing rate less than or equal to 30%, because the user information data exhibit certain trend characteristics, interpolation yields more accurate fill values, so the electricity consumption information data are filled by interpolation, and a polynomial function L(x) is obtained from the existing data as follows:

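calculating a polynomial of degree n-1 through the n known points:

$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$$

substituting the coordinates of the n points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) into the polynomial function gives:

$$\begin{cases} y_1 = a_0 + a_1 x_1 + a_2 x_1^2 + \dots + a_{n-1} x_1^{n-1} \\ y_2 = a_0 + a_1 x_2 + a_2 x_2^2 + \dots + a_{n-1} x_2^{n-1} \\ \quad\vdots \\ y_n = a_0 + a_1 x_n + a_2 x_n^2 + \dots + a_{n-1} x_n^{n-1} \end{cases}$$

The resulting Lagrange interpolation polynomial is:

$$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$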
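The missing-value rule and the Lagrange fill can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are marked as NaN, the sample index is used as the x coordinate, and only a few nearest known points on each side are used as interpolation nodes (the `window` parameter is hypothetical, chosen to keep the polynomial degree low). The function names are illustrative.

```python
import numpy as np

def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fill_missing(series, max_missing_rate=0.30, window=3):
    """Return a completed copy of the series, or None if too much is missing."""
    y = np.asarray(series, dtype=float)
    missing = np.isnan(y)
    if missing.mean() > max_missing_rate:
        return None  # missing rate above 30%: the data are deleted
    known = np.flatnonzero(~missing)
    filled = y.copy()
    for idx in np.flatnonzero(missing):
        # Assumption: interpolate from up to `window` known points per side.
        before = known[known < idx][-window:]
        after = known[known > idx][:window]
        nodes = np.concatenate([before, after])
        filled[idx] = lagrange_value(nodes, y[nodes], idx)
    return filled

# Hypothetical series with one gap; the known values follow (i + 1)^2.
completed = fill_missing([1.0, 4.0, np.nan, 16.0, 25.0, 36.0])
```

Restricting the nodes to a local window is a design choice of this sketch: a single high-degree polynomial through all known points can oscillate strongly near the ends of the series.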

The main features for electricity user clustering include the type of complaint event, the number of complaints, the user's age, the user's address, the annual electricity consumption, and the monthly electricity consumption, as well as information such as whether the accepted complaint content is reasonable and the user's return-visit satisfaction. However, the influences of these factors on the final clustering result overlap, so the features need dimension reduction to improve clustering accuracy.

Step S13: reducing the dimension of the power consumption information data of the power consumer. Before the cluster analysis, principal component analysis is used to reduce the dimension of the factors influencing power customers; it is performed on the electricity consumption, voltage level, number of complaints, age, gender, total electricity consumption, and illegal electricity use. Principal component analysis retains the main information of the original data, which further improves clustering accuracy. The dimension reduction converts several mutually dependent data sets into mutually independent ones, which are the principal components of the original data sets. The principal components are linear combinations of the original data and are independent of one another, so they preserve the data characteristics of the original data while ensuring their own independence.

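With m variables observed, the original data matrix of n samples is:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$

wherein x_i = {x_i1, x_i2, …, x_im} represents the complaint characteristics of the i-th user, that is, the sample values of the m variables x_1, x_2, …, x_m: the user's complaint event type, number of complaints, …, and monthly electricity consumption.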

Because the maximum values of the data differ among the items of user information, the user electricity consumption information data are standardized to obtain the data matrix for cluster analysis.

The mean of each column x_j of the sample matrix is:

the variance of the sample matrix X is:

the formula for data standardization is as follows:

After the elements of the sample matrix are standardized to form the standardized matrix, the covariance matrix Y of matrix X, i.e., the correlation coefficient matrix, is given by the formula:

For an orthogonal matrix U, U^T Y U = Λ, where Λ = diag(λ_1, λ_2, …, λ_m) with λ_1 > λ_2 > … > λ_m, and α_1, α_2, …, α_m are the eigenvectors corresponding to λ_1, λ_2, …, λ_m. The random variables y_1, y_2, …, y_m are uncorrelated, and y_i, associated with the eigenvalue λ_i, is called the i-th principal component of the sample matrix X.

Wherein:

the variance contribution rate of the i-th principal component is:

$$\alpha_i = \frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}$$

the cumulative variance contribution rate of the first i principal components is:

$$\beta_i = \frac{\sum_{k=1}^{i} \lambda_k}{\sum_{k=1}^{m} \lambda_k}$$

The larger the variance contribution rate α_i of a principal component, the stronger its correlation with the sample; the principal components are arranged in descending order of their characteristic roots. In practical applications, the number of extracted principal components depends on the cumulative variance contribution rate β_i.

PCA converts the above variables into composite variables as follows:

$$\begin{aligned} F_1 &= a_{11}x_1 + a_{12}x_2 + \dots + a_{1m}x_m \\ F_2 &= a_{21}x_1 + a_{22}x_2 + \dots + a_{2m}x_m \\ &\;\;\vdots \\ F_p &= a_{p1}x_1 + a_{p2}x_2 + \dots + a_{pm}x_m \end{aligned}$$

In compact form:

$$F_j = a_{j1}x_1 + a_{j2}x_2 + \dots + a_{jm}x_m, \quad j = 1, 2, \dots, m$$

which satisfy the following:

1) F_1, F_2, …, F_j are mutually independent;

2) the variances of F_1, F_2, …, F_j decrease successively; F_1 is defined as the first principal component, F_2 as the second principal component, and so on, with a_k1^2 + a_k2^2 + a_k3^2 + ··· + a_kp^2 = 1, k = 1, 2, …, p, where a_ij are the principal component coefficients.
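The dimension reduction can be sketched with an eigendecomposition of the correlation matrix. This is a minimal illustration, not the exact procedure of the embodiment: the sample standard deviation (ddof=1) and the 0.85 cumulative-contribution threshold are assumptions, and the function name `pca_by_contribution` is illustrative.

```python
import numpy as np

def pca_by_contribution(X, threshold=0.85):
    """Keep the leading principal components whose cumulative variance
    contribution rate first reaches `threshold` (an assumed cut-off)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column (assumption: sample standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)          # correlation coefficient matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]         # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    alpha = eigvals / eigvals.sum()           # variance contribution rates
    beta = np.cumsum(alpha)                   # cumulative contribution rates
    p = min(int(np.searchsorted(beta, threshold)) + 1, len(eigvals))
    scores = Z @ eigvecs[:, :p]               # principal component scores
    return scores, alpha, beta

# Hypothetical data: two strongly related features plus one independent one.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),
                  2 * base + 0.1 * rng.normal(size=(200, 1)),
                  rng.normal(size=(200, 1))])
scores, alpha, beta = pca_by_contribution(data)
```

The columns of `scores` are the composite variables F_1, …, F_p; they are uncorrelated and ordered by decreasing variance, which matches the two properties listed above.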

FIG. 1 shows the flow chart of the improved wolf pack K-means algorithm. The invention adopts a wolf pack algorithm with an improved search strategy, which comprises interactive wandering behavior, interactive calling behavior, and adaptive attack behavior.

The invention relates to a clustering algorithm based on improved wolf pack optimization K-means, which comprises the following steps:

step S22: performing the interactive wandering behavior until the prey odor concentration Y_i detected by some scout wolf i is greater than the prey odor concentration Y_lead sensed by the head wolf, or the maximum number of wandering steps T_max is reached;

step S26: ending when the end condition (optimal position or maximum number of iterations) is reached; otherwise, returning to step S23.
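How the wolf pack stage feeds K-means can be illustrated with a minimal sketch. It assumes that each artificial wolf encodes a candidate set of N cluster centers and that its prey odor concentration is the negative of the clustering objective (the sum of squared distances from each sample to its nearest center); both are illustrative assumptions, as are the function names. The centers found by the wolf pack are then refined by ordinary K-means iterations.

```python
import numpy as np

def assign(X, centers):
    """Label each sample with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def clustering_objective(X, centers):
    """Sum of squared distances from each sample to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def kmeans_from_centers(X, init_centers, max_iter=100):
    """Standard K-means iterations started from the given centers."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for c in range(len(centers)):
            members = X[labels == c]
            if len(members) > 0:      # an empty cluster keeps its old center
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

# Hypothetical data and hypothetical centers returned by the wolf pack search.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
seed_centers = np.array([[1.0, 1.0], [9.0, 9.0]])
centers, labels = kmeans_from_centers(X, seed_centers)
```

Because K-means only descends from its starting point, the quality of `seed_centers` determines which local optimum it reaches; this is the dependence that the global wolf pack search is meant to weaken.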

In the interactive wandering behavior of the method, the scout wolves explore in n directions. The larger n is, the higher the optimization accuracy, but the optimization speed of the algorithm decreases and the search for the optimal cluster center point easily falls into a local optimum; if n is too small, the cluster center points become inaccurate or cannot be found at all. The reason is that the scout wolves lack the necessary information exchange and cannot learn their companions' information in time, which impairs the global search ability of the wolf pack. To increase the interaction among scout wolves and improve the optimization capability, the search method is:

wherein: y_i,d indicates the updated position of the prey,

represents the optimal solution found within the range, x_i,d indicates the original position of the prey, α_i,d is a random number in [0,1], β_i,d is a random number in [-1,1], and k ≠ i ≠ j. The first half of the formula enhances the local optimization capability of the wolf pack and the second half enhances its global search capability, so the two capabilities are well balanced; this reflects the leadership of the head wolf, maintains close information exchange within the pack, and makes the cluster center points found more accurate.
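One plausible reading of the interactive wandering update can be sketched as follows. The update form used here, y = x_i + α(x_best − x_i) + β(x_k − x_j), is an assumption chosen to match the symbol definitions above (a pull toward the best solution as the first half, an exchange term between two companions k and j as the second half); it is not taken from the search formula itself. The greedy acceptance rule and the function name are likewise assumptions.

```python
import numpy as np

def interactive_wander(positions, i, best, fitness, rng):
    """One assumed interactive wandering move for scout wolf i."""
    n, dim = positions.shape
    # Pick two distinct companions k and j, both different from i.
    others = [idx for idx in range(n) if idx != i]
    k, j = rng.choice(others, size=2, replace=False)
    alpha = rng.uniform(0.0, 1.0, size=dim)    # random numbers in [0, 1)
    beta = rng.uniform(-1.0, 1.0, size=dim)    # random numbers in [-1, 1)
    candidate = (positions[i]
                 + alpha * (best - positions[i])             # toward the best
                 + beta * (positions[k] - positions[j]))     # companion exchange
    # Assumption: keep the move only if the perceived odor concentration rises.
    if fitness(candidate) > fitness(positions[i]):
        return candidate
    return positions[i].copy()

# Hypothetical pack of 6 wolves in 2 dimensions; higher fitness is better.
rng = np.random.default_rng(0)
pack = rng.uniform(-5.0, 5.0, size=(6, 2))
fitness = lambda p: -float(np.sum(p ** 2))
best = max(pack, key=fitness)
moved = interactive_wander(pack, 0, best, fitness, rng)
```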

In the interactive calling behavior of the method: in the basic calling behavior, the fierce wolves keep rushing until d_i < d_near, which allows the wolves to explore the search space comprehensively, but the algorithm is overly complex and easily falls into a locally optimal cluster center point. The invention therefore adopts a calling strategy in which the fierce wolves can "besiege" the found cluster center point once. In swarm algorithms, communication within the group is an important part of the algorithm: the wolves advance in the direction of the better cluster center point Y_i, the wolf positions X_i are updated, and the wolf at the best cluster center point position is selected as the head wolf.

In the adaptive attack behavior of the method, the attack behavior requires the wolves to have strong local optimization capability, and it involves randomness and uncertainty. As the algorithm evolves, the current optimal solution moves closer to the global optimal solution and the wolves need stronger exploitation capability so that the algorithm converges quickly to the global optimum; adding an adjustment mechanism to the algorithm is therefore a good direction for improvement. To give the attack behavior an adaptive adjustment capability, the random step length λ is changed into an adaptive step length that varies linearly as the iteration number t of the algorithm increases, with the formula:

wherein:

represents the position of the wolf in the (k+1)-th generation pack,

represents the position of the head wolf in the k-th generation pack,

represents the position of the k-th generation wolf in the d-dimensional space; θ is a factor taken as a random number in (0,1); and w is a random integer in {-1,1}. θ is taken in (0,1) to prevent w(1 - θt/t_max) from approaching zero in the later stage of the iteration, which would leave the optimization result unchanged; w ensures that the search range is not restricted, so that the region near x_id can be searched more comprehensively. If, after the attack behavior is carried out, the prey odor concentration perceived by the artificial wolf is greater than that perceived at its original position, the position of the artificial wolf is updated; otherwise, its position remains unchanged.
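The adaptive attack step can be sketched as follows. The factor w(1 − θ·t/t_max) and the accept-only-if-better rule come from the description; the rest of the update, x + factor · step · |lead − x|, follows the besieging move of the standard wolf pack algorithm and is an assumption here, as are the `step` parameter and the function name.

```python
import numpy as np

def adaptive_attack(x, lead, t, t_max, step, fitness, rng):
    """One attack move with the linearly decaying adaptive factor."""
    theta = rng.uniform(0.0, 1.0)     # random factor from the unit interval
    w = rng.choice([-1, 1])           # random direction, -1 or 1
    factor = w * (1.0 - theta * t / t_max)
    # Assumed update form, borrowed from the standard besieging behavior.
    candidate = x + factor * step * np.abs(lead - x)
    # Update only if the perceived prey odor concentration increases.
    return candidate if fitness(candidate) > fitness(x) else x

# Hypothetical setup: the head wolf sits at the optimum of a simple fitness.
rng = np.random.default_rng(0)
fitness = lambda p: -float(np.sum(p ** 2))
lead = np.zeros(2)
wolf = np.array([2.0, 2.0])
history = [fitness(wolf)]
for t in range(1, 51):
    wolf = adaptive_attack(wolf, lead, t, 50, 0.5, fitness, rng)
    history.append(fitness(wolf))
```

Because θ stays below 1 and t never exceeds t_max, the magnitude of the factor remains positive throughout, so the wolf keeps probing the neighborhood of its position even in late iterations.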

With this method, complaining users can be graded effectively. The method not only reduces the number of variables used in clustering but also retains the important information contained in the original variables, which simplifies practical application and operation and improves clustering precision. As shown in FIG. 2, the invention divides power consumers into high-risk consumers, low-risk consumers, general-risk consumers, good consumers, and bad consumers. This classification helps the power company classify its customers. On this basis, corresponding risk prevention strategies can be formulated for certain users in combination with credit ratings and the like, and an early warning platform can be built, which helps the power company guard against arrears in electricity charges and electricity theft and brings substantial economic benefits.

Step S14: selecting the feature vectors of the power consumption information data of the power consumers. Feature extraction is performed on the power user data set: the correlation coefficient expresses the degree of correlation between two attributes, and the correlation coefficient R_a can be obtained:

wherein σ_x and σ_y are the variances of X and Y, E(X) and E(Y) are the expected values of X and Y, and R_a represents the degree of correlation of each item of electricity consumption information data; the larger the correlation coefficient R_a, the greater the impact on complaint risk.
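The feature selection by correlation can be sketched as follows. The sketch assumes that R_a is the Pearson correlation coefficient, R_a = E[(X − E(X))(Y − E(Y))] / (σ_x σ_y), with σ_x and σ_y taken as standard deviations, and that a complaint-risk indicator such as the number of complaints serves as the target Y; both are assumptions, and the function name is illustrative.

```python
import numpy as np

def correlation_with_target(features, target):
    """Pearson correlation of each feature column with the target."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    fx = features - features.mean(axis=0)      # X - E(X), column by column
    ty = target - target.mean()                # Y - E(Y)
    cov = (fx * ty[:, None]).mean(axis=0)      # E[(X - E(X))(Y - E(Y))]
    # Population standard deviations, consistent with the mean used above.
    # Note: a constant column has zero deviation and yields a division by zero.
    return cov / (features.std(axis=0) * target.std())

# Hypothetical features (two columns) and hypothetical complaint counts.
complaints = np.array([1.0, 2.0, 3.0, 4.0])
features = np.column_stack([2.0 * complaints + 1.0,
                            np.array([5.0, 3.0, 4.0, 1.0])])
r = correlation_with_target(features, complaints)
ranking = np.argsort(-np.abs(r))   # features ordered by strength of correlation
```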

Step S15: judging the factor concentration of the power consumption information data of the power consumers. The null hypothesis is rejected at a significance level of 0.05, which indicates that the initial data have internal correlation and information redundancy and are suitable for data processing and statistical analysis by factor analysis. A KMO test value that meets and exceeds the critical value of 0.5 means that factor analysis is appropriate and the model data are suitable for the factor analysis method. If the total variation of the initial data is retained, however, the effect of factor concentration is lost. The main problem of factor analysis is therefore how to maintain the explanatory power of the initial data while concentrating the factors: in selecting the number of factors, factor concentration must be balanced against information retention.
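The KMO check can be sketched with the standard Kaiser-Meyer-Olkin measure, which compares squared correlations with squared partial correlations. The description gives only the 0.5 threshold, so the computation below is the textbook definition rather than a procedure stated in the embodiment; the function name and the synthetic data are illustrative.

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)      # mask of off-diagonal entries
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return float(r2 / (r2 + p2))

# Hypothetical data: five indicators driven by one common factor plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
indicators = common + 0.5 * rng.normal(size=(500, 5))
value = kmo(indicators)
suitable = value > 0.5    # threshold taken from the description
```

A value close to 1 means the correlations are largely explained by shared factors, while a value at or below 0.5 suggests the variables share too little common variance for factor analysis.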

The above embodiments are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby, so: all equivalent changes made according to the structure, shape and principle of the invention are covered by the protection scope of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Claims

1. A user complaint cluster analysis method based on improved wolf pack optimization K-means is characterized by comprising the following steps:

step S11: collecting power consumption information data of power consumers;

2. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, wherein the improved wolf pack optimization K-means-based clustering algorithm comprises the following steps:

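3. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, characterized in that: in step S11, the data are normalized by min-max normalization, which scales the values to the [0,1] interval;

the data normalization formula is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

wherein X represents the electricity consumption information data of the user, X^* is the normalized value, and X_min and X_max are respectively the minimum and maximum values of a given item of user information data.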

4. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, characterized in that: in step S12, data with a missing rate greater than 30% are simply deleted; for data with a missing rate less than or equal to 30%, the electricity consumption information data are filled by interpolation, and a polynomial function L(x) is obtained from the n known data points (x_i, y_i), the Lagrange interpolation polynomial being:

$$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

5. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, characterized in that: in step S13, before the cluster analysis, principal component analysis is used to reduce the dimension of the factors influencing power customers, and principal component analysis is performed on the electricity consumption, voltage level, number of complaints, age, gender, total electricity consumption, and illegal electricity use, wherein:

the variance contribution rate of the i-th principal component is:

$$\alpha_i = \frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}$$

the cumulative variance contribution rate of the first i principal components is:

$$\beta_i = \frac{\sum_{k=1}^{i} \lambda_k}{\sum_{k=1}^{m} \lambda_k}$$

wherein λ_k is the k-th largest eigenvalue of the correlation coefficient matrix of the m variables.

6. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 1, wherein in step S14, feature extraction is performed on the power user data set: the correlation coefficient expresses the degree of correlation between two attributes, and the correlation coefficient R_a can be obtained:

wherein σ_x and σ_y are the variances of X and Y, E(X) and E(Y) are the expected values of X and Y, and R_a represents the degree of correlation of each item of electricity consumption information data; the larger the correlation coefficient R_a, the greater the impact on complaint risk; in step S15, the null hypothesis is rejected at a significance level of 0.05, the initial KMO test value meets and exceeds the critical value of 0.5, and the model data are suitable for the factor analysis method.

7. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 2, wherein in step S22, the scout wolves explore in n directions; the larger n is, the higher the optimization accuracy; to increase the interaction among scout wolves and improve the optimization capability, the search method is:

wherein: y_i,d indicates the updated position of the prey,

8. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 2, wherein in step S23: the wolves advance in the direction of the better cluster center point Y_i, the wolf positions X_i are updated, and the wolf at the best cluster center point position is selected as the head wolf.

9. The improved wolf pack optimization K-means-based user complaint cluster analysis method as claimed in claim 2, wherein in step S24, as the iteration number t of the algorithm increases, the linearly changing adaptive step size is expressed as:

wherein:

represents the position of the wolf in the (k+1)-th generation pack,

represents the position of the head wolf in the k-th generation pack,

represents the position of the k-th generation wolf in the d-dimensional space; θ is a factor taken as a random number in (0,1); and w is a random integer in {-1,1}.