CN111339294B

CN111339294B - Customer data classification method and device and electronic equipment

Info

Publication number: CN111339294B
Application number: CN202010086453.6A
Authority: CN
Inventors: 井玉欣; 陈永林; 陈甜甜
Original assignee: Puxin Hengye Technology Development Beijing Co ltd
Current assignee: Puxin Hengye Technology Development Beijing Co ltd
Priority date: 2020-02-11
Filing date: 2020-02-11
Publication date: 2023-07-25
Anticipated expiration: 2040-02-11
Also published as: CN111339294A

Abstract

The invention provides a customer data classification method, a customer data classification device, and electronic equipment. The method comprises: acquiring customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns; determining the attribute type of each attribute column, wherein the attribute type is categorical or numerical; converting the attribute values of the categorical data attributes in the customer data into numerical attribute values; and performing a clustering operation on the customer data to obtain a clustering result representing customer segmentation. Because the categorical data attributes are converted into numerical data attributes before clustering, categorical and numerical attributes are considered uniformly in the clustering operation, and the classification effect is improved.

Description

Customer data classification method and device and electronic equipment

Technical Field

The application relates to the technical fields of marketing and artificial intelligence, and in particular to a customer data classification method, a customer data classification device, and electronic equipment.

Background

Customer segmentation is essential for enterprises: it helps them understand their customers better, reduces costs effectively, and brings considerable benefits. Customer segmentation is a foundational undertaking; for customer relationship management in particular, accurate segmentation greatly improves marketing efficiency, and it also supports general marketing tactics, production operations, and even enterprise strategy.

At present, customer segmentation mainly starts from distinguishing the different needs and attribute characteristics of customers: the whole market is divided, according to certain criteria, into several sub-markets that require different products and different marketing mixes; target markets are selected on that basis; and the corresponding marketing activities are then designed.

The object of customer segmentation is customer data, which typically includes basic data, behavioral data, and so on. In terms of data type, it may contain both categorical data attributes (discrete) and numerical data attributes (continuous). Categorical attributes include the customer's gender, occupation, place of residence, and the like, whose value range is discrete and finite; numerical attributes include the customer's income, login duration, consumption amount, and the like, whose values lie in continuous numeric intervals.

At present, when customer data are classified, the data types (categorical versus numerical attributes) are not distinguished. Because categorical attributes have pronounced features, the clustering algorithm tends to be biased toward them, so the final classification result focuses too heavily on the categorical attributes, neglects the numerical attributes, and performs poorly.

Disclosure of Invention

In view of this, the present application provides a customer data classification method, a device, and electronic equipment that convert categorical data attributes into numerical data attributes before performing the clustering operation, so that categorical and numerical attributes are weighed in a balanced way during clustering and the classification effect is improved.

To achieve the above object, the present invention provides the following technical solutions:

A method of classifying customer data, comprising:

acquiring customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns;

determining the attribute type of each attribute column, wherein the attribute type is categorical or numerical;

converting the attribute values of the categorical data attributes in the customer data into numerical attribute values;

and performing a clustering operation on the customer data to obtain a clustering result representing customer segmentation.

Optionally, determining the attribute type of each attribute column includes:

judging whether the value type of the attribute values in an attribute column is a continuous data type;

if the data type is not continuous, determining that the attribute column is a categorical data attribute;

if the data type is continuous, counting the number of distinct attribute values in the attribute column and calculating the ratio of that number to the total number of attribute values;

judging whether the ratio is greater than a set threshold;

if the ratio is greater than the set threshold, determining that the attribute column is a numerical data attribute;

and if the ratio is not greater than the set threshold, determining that the attribute column is a categorical data attribute.

Optionally, converting the attribute values of the categorical data attributes in the customer data into numerical attribute values includes:

performing the following operations for each categorical data attribute column in the customer data:

grouping the customer data by the distinct values of the categorical attribute column to obtain a plurality of groups in one-to-one correspondence with the attribute values;

determining, from the numerical data attributes of the customer data, the target numerical data attribute that best matches the categorical data attribute;

for each group: calculating the average value of the target numerical data attribute within the group and taking that average as the numerical value corresponding to the group's attribute value, thereby converting the categorical attribute value into a numerical attribute value.

Optionally, determining, from the numerical data attributes of the customer data, the target numerical data attribute that best matches the categorical data attribute includes:

for each numerical data attribute of the customer data, calculating the intra-group variance of each group and summing the intra-group variances to obtain the variance sum corresponding to that numerical data attribute;

sorting the variance sums of the numerical data attributes;

and determining the numerical data attribute with the smallest variance sum as the target numerical data attribute that best matches the categorical data attribute.

Optionally, before the clustering operation is performed on the customer data, outliers in the customer data are removed using an isolation forest algorithm.
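As a minimal sketch of the optional outlier-removal step, assuming scikit-learn's `IsolationForest` and a numeric feature matrix: the patent specifies neither a library nor a contamination rate, so `remove_outliers` and `contamination=0.02` are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def remove_outliers(X, contamination=0.02, random_state=0):
    """Return the rows of X that an isolation forest labels as inliers."""
    # contamination (expected share of outliers) is an assumed value, not from the patent
    forest = IsolationForest(contamination=contamination, random_state=random_state)
    labels = forest.fit_predict(X)  # +1 = inlier, -1 = outlier
    return X[labels == 1]

# Hypothetical data: 50 normalized records plus one extreme record
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.5, 0.05, size=(50, 2)), [[5.0, 5.0]]])
clean = remove_outliers(X)
```

The isolation forest isolates points with random splits; records that are isolated after few splits receive low scores and are discarded before clustering.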

Optionally, performing the clustering operation on the customer data to obtain a clustering result representing customer segmentation includes:

pre-clustering the customer data with a hierarchical clustering algorithm, and stopping the pre-clustering when the number of output micro-clusters reaches a preset number;

calculating the center points of the preset number of micro-clusters;

selecting K of the preset number of center points as the initial center points of the K-means algorithm;

and performing a second clustering operation based on the K initial center points to obtain a clustering result representing customer segmentation.

Optionally, selecting K of the preset number of center points as the initial center points of the K-means algorithm includes:

randomly selecting one of the preset number of center points as the first initial center point and adding it to a set S;

for each remaining center point, calculating its nearest distance to the set S, and adding to the set S the remaining center point whose nearest distance is largest;

repeating the above step until the set S contains K center points.
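The two-stage clustering above can be sketched in Python as follows. This is an illustration under assumptions: scikit-learn's `AgglomerativeClustering` (default Ward linkage) stands in for the hierarchical pre-clustering, and `KMeans` with an explicit array of initial centers performs the second stage. The helper names `farthest_point_init` and `two_stage_cluster` and the parameter `n_micro` (the preset number of micro-clusters, which must be at least K) are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def farthest_point_init(centers, k, seed=0):
    """Select k rows of `centers`: a random first point for set S, then
    repeatedly the center whose nearest distance to S is largest."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(centers)))]
    # distance from every center to its nearest member of S
    nearest = np.linalg.norm(centers - centers[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(nearest))  # members of S have distance 0, so they are not re-picked
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(centers - centers[nxt], axis=1))
    return centers[chosen]

def two_stage_cluster(X, n_micro, k, seed=0):
    """Hierarchical pre-clustering into n_micro micro-clusters, then K-means."""
    micro = AgglomerativeClustering(n_clusters=n_micro).fit_predict(X)
    centers = np.array([X[micro == c].mean(axis=0) for c in range(n_micro)])
    init = farthest_point_init(centers, k, seed)
    # n_init=1 because the initial centers are given explicitly
    return KMeans(n_clusters=k, init=init, n_init=1).fit_predict(X)

# Hypothetical data: three well-separated groups of 30 records each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2)) for m in (0.0, 3.0, 6.0)])
labels = two_stage_cluster(X, n_micro=10, k=3)
```

Choosing initial centers from micro-cluster centers rather than raw records means a single outlying record cannot become an initial center, while the farthest-point rule spreads the K seeds across the data.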

Optionally, after the customer data comprising a plurality of customer records are acquired, the method further includes:

performing a data cleansing operation on the customer data set, the data cleansing operation including missing-value filling, outlier handling, and duplicate removal;

after the attribute type of each attribute column is determined, the method further comprises:

performing a decorrelation operation on the categorical data attributes and deleting highly correlated categorical data attributes;

performing a decorrelation operation on the numerical data attributes, deleting highly correlated numerical data attributes, and normalizing the attribute values of the remaining numerical data attributes.
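A minimal sketch of the decorrelation step for numerical attributes, assuming pandas, Pearson correlation as the correlation measure, and a threshold of 0.9. The patent names neither the measure nor the threshold, and categorical attributes would need a different association measure; the column names below are hypothetical.

```python
import pandas as pd

def drop_correlated(df, threshold=0.9):
    """Keep columns left to right, skipping any whose absolute Pearson
    correlation with an already-kept column exceeds `threshold`."""
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return df[kept]

# Hypothetical numerical attributes; login_minutes duplicates login_duration
df = pd.DataFrame({
    "login_duration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "login_minutes": [60.0, 120.0, 180.0, 240.0, 300.0],
    "investment": [5.0, 1.0, 4.0, 2.0, 3.0],
})
reduced = drop_correlated(df)  # keeps login_duration and investment
```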

A customer data classification device, comprising:

an acquisition unit, configured to acquire customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns;

a determining unit, configured to determine the attribute type of each attribute column, wherein the attribute type is categorical or numerical;

a conversion unit, configured to convert the attribute values of the categorical data attributes in the customer data into numerical attribute values;

and a clustering unit, configured to perform a clustering operation on the customer data to obtain a clustering result representing customer segmentation.

An electronic device, comprising:

a memory, configured to store customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns;

a processor, configured to determine the attribute type of each attribute column, wherein the attribute type is categorical or numerical; convert the attribute values of the categorical data attributes in the customer data into numerical attribute values; and perform a clustering operation on the customer data to obtain a clustering result representing customer segmentation.

Through the above technical means, the following beneficial effects can be achieved:

After the customer data are obtained, the invention determines the attribute type of each attribute in the customer data and converts the attribute values of the categorical data attributes into numerical attribute values, so that all attributes of the customer data become numerical. A clustering operation is then performed on this basis to obtain a clustering result representing customer segmentation.

Because the invention converts categorical data attributes into numerical data attributes before clustering, categorical and numerical attributes are considered uniformly in the clustering operation, and the classification effect is improved.

Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a customer data classification method according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a customer data classification device according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

The method is applied to an electronic device, which may be an enterprise server, a cloud server, or another device used to classify customer data; the application architecture of the customer data classification method is not limited.

The invention provides a customer data classification method applied to an electronic device. Referring to FIG. 1, the method comprises the following steps:

Step S101: acquire customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns.

Depending on where the customer data are stored in a given application scenario, the electronic device may obtain them from an external database or from its own memory.

Table 1 shows an example of an enterprise's customer data. It contains 10 customer records with 4 attribute columns: residence, industry, login duration, and investment amount.

TABLE 1

Step S102: determine the attribute type of each attribute column, wherein the attribute type is categorical or numerical.

Among the attribute columns, some are categorical and some are numerical. To facilitate automated processing, the present invention provides a way to determine the attribute type of an attribute automatically.

Step S1: judge whether the value type of the attribute values in an attribute column is a continuous data type. If not, go to step S2; if so, go to step S3.

A continuous data type may be an integer or floating-point type; if the attribute values of an attribute are neither integers nor floating-point numbers, the attribute can be determined to be a categorical data attribute.

Step S2: if the data type is not continuous, determine that the attribute column is a categorical data attribute.

Step S3: if the data type is continuous, count the number of distinct attribute values in the attribute column and calculate the ratio of that number to the total number of attribute values.

For example, for the investment amount in Table 1, there are 9 distinct attribute values out of 10 in total, so the ratio is 0.9.

For the residence attribute in Table 1, there are 2 distinct attribute values out of 10 in total, so the ratio is 0.2.

Step S4: judge whether the ratio is greater than a set threshold; if so, go to step S5; otherwise, go to step S6.

Step S5: if the ratio is greater than the set threshold, determine that the attribute column is a numerical data attribute.

Step S6: if the ratio is not greater than the set threshold, determine that the attribute column is a categorical data attribute.

The threshold is typically set empirically. If an attribute is categorical, it usually takes only a limited number of values, so the ratio of distinct values to total values is small; if an attribute is numerical, its values may all differ, so the ratio is close to 1.
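Steps S1 to S6 can be sketched for a single pandas column as follows. The threshold of 0.5 and the function name `attribute_type` are assumptions, and the sample columns are hypothetical values chosen only to reproduce the ratios quoted above (9/10 for the investment amount, 2/10 for a two-valued column).

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

def attribute_type(column: pd.Series, threshold: float = 0.5) -> str:
    """Classify one attribute column as 'categorical' or 'numerical'."""
    # S1/S2: values that are neither integer nor floating point are categorical
    if not (is_integer_dtype(column) or is_float_dtype(column)):
        return "categorical"
    # S3: ratio of distinct attribute values to total attribute values
    ratio = column.nunique() / len(column)
    # S4-S6: numerical only when the ratio exceeds the (assumed) threshold
    return "numerical" if ratio > threshold else "categorical"

investment = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 90])  # ratio 9/10
region_code = pd.Series([1, 2, 1, 1, 2, 1, 2, 2, 1, 1])           # ratio 2/10
residence = pd.Series(["Beijing"] * 6 + ["Shanghai"] * 4)          # strings
```

Note that a string-valued column such as `residence` is settled by step S1 alone; the ratio test only decides between the two types for integer or floating-point columns such as the numerically coded `region_code`.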

Step S103: convert the attribute values of the categorical data attributes in the customer data into numerical attribute values.

Optionally, to facilitate subsequent processing, the attribute values of the numerical data attributes may be normalized so that they share a common scale, which keeps the algorithm's results reliable. A normalization method such as min-max scaling or the Z-score method may be used.
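The two normalization options named above can be sketched with NumPy as follows. The function names are illustrative; min-max scaling assumes the column is not constant, and this z-score uses the population standard deviation.

```python
import numpy as np

def min_max(x):
    """Min-max scaling to [0, 1]; assumes x is not constant."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

scaled = min_max([2, 4, 6])  # [0.0, 0.5, 1.0]
```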

Table 2 shows the customer data of Table 1 after normalization.

TABLE 2

Step S1: group the customer data by the distinct values of the categorical attribute column to obtain a plurality of groups in one-to-one correspondence with the attribute values.

For example, grouping by residence divides the customer data into two groups: a Beijing group and a Shanghai group.

Step S2: from the numerical data attributes of the customer data, determine the target numerical data attribute that best matches the categorical data attribute.

Because there are several numerical data attributes and each categorical attribute is converted according to only one of them, the numerical attribute that best matches the categorical attribute is selected as the target so that the conversion result is as good as possible.

Specifically, this comprises the following steps:

S21: for each numerical data attribute of the customer data, calculate the intra-group variance of each group and sum these variances to obtain the variance sum for that numerical attribute;

S22: sort the variance sums of the numerical data attributes;

S23: determine the numerical data attribute with the smallest variance sum as the target numerical data attribute that best matches the categorical data attribute.

The numerical attribute with the smallest variance sum is the one most closely associated with the categorical attribute, since its values fluctuate least within each group. Consequently, the values obtained by converting with this best-matching target attribute are relatively stable.

Step S3: for each group, calculate the average value of the target numerical data attribute within the group and take it as the numerical value corresponding to the group's attribute value, thereby converting the categorical value into a numerical value.

For ease of understanding, the process of converting a categorical data attribute into a numerical data attribute is explained in detail below on the basis of Table 2.

First, the following definitions are given.

Assume that customer data D contains k customer records, each consisting of p columns of categorical data attributes and q columns of normalized numerical data attributes, where:

$|D|$: the total number of customer records, i.e., k (10 in the example of Table 2).

$D^i$: the i-th customer record, $1 \le i \le k$. Example: $D^3$ denotes customer record 3 in Table 2.

$X_i$: the i-th categorical data attribute column, $1 \le i \le p$. Example: $X_2$ is the "industry" attribute.

$IX_i$: the set of attribute values that occur in categorical attribute $X_i$. Example: $IX_2 = \{\text{government}, \text{IT}, \text{finance}\}$.

$N_i$: the i-th numerical data attribute column, $1 \le i \le q$. Example: $N_1$ is the "login duration" attribute.

$d.X_i$: the value of the i-th categorical attribute $X_i$ in customer record d, where $d \in D$ and $1 \le i \le p$. Example: $d = D^3$.

$d.N_i$: the value of the i-th numerical attribute $N_i$ in customer record d, where $d \in D$ and $1 \le i \le q$. Example: $d = D^3$.

$D(X_i = t)$: the set of customer records whose categorical attribute $X_i$ takes the value t, where $t \in IX_i$.

Example: for t = "Shanghai", $D(X_1 = t)$ is the set of the 4 customer records whose residence is Shanghai.

$|D(X_i = t)|$: the number of customer records whose attribute $X_i$ takes the value t.

Example: for t = "Shanghai", $|D(X_1 = t)| = 4$.

$V(t \mid X_i)$: the numerical value obtained by converting attribute value t of categorical attribute $X_i$; when the reference numerical attribute $N_j$ is stated explicitly, it is written $V(t \mid X_i, N_j)$.

$SS(X_i, N_j)$: the sum of the intra-group variances of the j-th numerical attribute $N_j$ when the records are grouped by the i-th categorical attribute $X_i$, $1 \le i \le p$, $1 \le j \le q$.

The following describes the specific conversion process:

a) Traverse all categorical data attributes in customer data D and, for each categorical attribute $X_i$ (for example $X_1$, the "residence" attribute), perform the following operations:

i. Group all customer records in D by the values of categorical attribute $X_i$, denoted $D(X_i = t_1), D(X_i = t_2), \ldots, D(X_i = t_m)$, where $t_1, \ldots, t_m \in IX_i$.

Example: the records in Table 2 fall into 2 groups, $D(X_1 = \text{Beijing})$ and $D(X_1 = \text{Shanghai})$.

ii. Traverse the numerical data attributes of the customer records. For each numerical attribute $N_j$, group the records by the values of $X_i$ and compute the sum of the intra-group variances of $N_j$, $SS(X_i, N_j)$.

For example, taking numerical attribute $N_1$, first calculate the average value of $N_1$ within each group:

$$E(N_j, D^*) = \frac{1}{|D^*|} \sum_{d \in D^*} d.N_j, \qquad D^* = D(X_i = t_m),\ t_m \in IX_i$$

where $d.N_j$ is the value of $N_j$ in customer record d.

Example: the average login duration of the records in $D(X_1 = \text{Beijing})$ is

$E(N_1, D(X_1 = \text{Beijing})) = (0.23 + 0.77 + 0.19 + 0.23 + 0.9 + 1)/6 = 0.55333$

and the average login duration of the records in $D(X_1 = \text{Shanghai})$ is

$E(N_1, D(X_1 = \text{Shanghai})) = (0 + 0.09 + 0.52 + 0.27)/4 = 0.22$

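Next, calculate the intra-group variance of each group, i.e., the variance computed from the attribute values in the group and the group's average value, which reflects how much the attribute values within the group differ, and sum the variances over all groups:

$$\mathrm{Var}(N_j, D^*) = \frac{1}{|D^*|} \sum_{d \in D^*} \left(d.N_j - E(N_j, D^*)\right)^2, \qquad SS(X_i, N_j) = \sum_{t_m \in IX_i} \mathrm{Var}\left(N_j, D(X_i = t_m)\right)$$

where $D^* = D(X_i = t_m)$, $t_m \in IX_i$, and $d.N_j$ is the value of $N_j$ in customer record d. In the example, the intra-group variances of $N_1$ are 0.11796 for the Beijing group and 0.03945 for the Shanghai group, so the sum of intra-group variances is $SS(X_1, N_1) = 0.11796 + 0.03945 = 0.15741$.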
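The worked example can be checked numerically. This sketch takes the normalized login-duration values of the two residence groups used above and relies on `np.var`, whose default `ddof=0` divides by the group size; that is the variance definition that reproduces 0.11796 and 0.03945.

```python
import numpy as np

# Normalized login-duration values (N_1) of the two residence groups in the example
beijing = np.array([0.23, 0.77, 0.19, 0.23, 0.9, 1.0])
shanghai = np.array([0.0, 0.09, 0.52, 0.27])

# np.var defaults to ddof=0, i.e. it divides by the group size |D*|
var_bj = np.var(beijing)    # about 0.11796
var_sh = np.var(shanghai)   # about 0.03945
ss = var_bj + var_sh        # SS(X_1, N_1), about 0.15741
```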

Following this procedure, compute in turn the sum of intra-group variances for every numerical data attribute under the grouping of step i: $SS(X_i, N_1), SS(X_i, N_2), \ldots, SS(X_i, N_q)$.

Among these sums, select the smallest, denoted $SS(X_i, N_{\min})$. The corresponding numerical attribute $N_{\min}$ is the best reference attribute for converting $X_i$, i.e., the target numerical data attribute.

Use numerical attribute $N_{\min}$ to convert categorical attribute $X_i$, turning each of its attribute values into a numerical value. That is, for a given attribute value $t_m \in IX_i$, take the corresponding group $D(X_i = t_m)$ from step i, whose values of attribute $N_{\min}$ form the set $\{d.N_{\min} \mid d \in D(X_i = t_m)\}$. The average of this set, $E(N_{\min}, D(X_i = t_m))$, is used as the numerical value of $t_m$, which realizes the conversion from categorical to numerical type:

$$V(t_m \mid X_i, N_{\min}) = E(N_{\min}, D^*) = \frac{1}{|D^*|} \sum_{d \in D^*} d.N_{\min}, \qquad D^* = D(X_i = t_m),\ t_m \in IX_i$$

With this formula, categorical attribute $X_i$ is converted into a numerical attribute; thus "Beijing" and "Shanghai" in the residence attribute become continuous values. Example: $V(\text{Beijing} \mid X_1, N_1) = E(N_1, D(X_1 = \text{Beijing})) = 0.553$ and $V(\text{Shanghai} \mid X_1, N_1) = E(N_1, D(X_1 = \text{Shanghai})) = 0.22$.

Repeat process a) for every categorical data attribute: find its best reference numerical attribute, convert its attribute values based on that attribute, and continue until all categorical data attributes have been converted into numerical attribute values.
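Process a) as a whole can be sketched with pandas as below, assuming the categorical and numerical column names are already known from step S102. `convert_categorical` is a hypothetical name, and the `investment` column in the usage example is made-up data; the residence and login-duration values are those of the worked example, so the sketch reproduces 0.553 and 0.22.

```python
import pandas as pd

def convert_categorical(df, categorical, numerical):
    """Replace each categorical column X_i by V(t | X_i, N_min): the group mean
    of the numerical column with the smallest sum of intra-group variances."""
    out = df.copy()
    for x in categorical:
        groups = df.groupby(x)
        # SS(X_i, N_j): sum over groups of the population variance (ddof=0)
        ss = {n: groups[n].var(ddof=0).sum() for n in numerical}
        n_min = min(ss, key=ss.get)   # target numerical attribute N_min
        means = groups[n_min].mean()  # E(N_min, D(X_i = t)) for every value t
        out[x] = df[x].map(means)     # t -> V(t | X_i, N_min)
    return out

df = pd.DataFrame({
    "residence": ["Beijing"] * 6 + ["Shanghai"] * 4,
    "login_duration": [0.23, 0.77, 0.19, 0.23, 0.9, 1.0, 0.0, 0.09, 0.52, 0.27],
    "investment": [0.1, 0.9, 0.3, 0.7, 0.5, 0.2, 0.8, 0.4, 0.6, 0.0],  # hypothetical
})
converted = convert_categorical(df, ["residence"], ["login_duration", "investment"])
```

Here `login_duration` has the smaller variance sum (about 0.157 versus about 0.167 for the made-up `investment` column), so it is chosen as the reference attribute.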

Step S104: perform a clustering operation on the customer data to obtain a clustering result representing customer segmentation.

The customer data may be clustered with a hierarchical clustering algorithm, the K-means algorithm, or another clustering method to obtain a clustering result representing customer segments.

Alternatively, because each clustering algorithm has its own strengths and weaknesses and a single clustering pass does not perform well in the prior art, the present invention provides a preferred two-stage clustering scheme that combines hierarchical clustering and K-means to balance computational cost against clustering quality:

Step S1: pre-cluster the customer data with a hierarchical clustering algorithm, and stop pre-clustering when the number of output micro-clusters reaches a preset number.

Step S2: calculate the center points of the preset number of micro-clusters.

Step S3: select K of the preset number of center points as the initial center points of the K-means algorithm: randomly select one center point as the first initial center point and add it to a set S; then, for each remaining center point, calculate its nearest distance to the set S and add the center point with the largest nearest distance to S; repeat until S contains K center points.

Step S4: perform a second clustering operation based on the K initial center points to obtain a clustering result representing customer segmentation.

The merging rule of hierarchical clustering is easy to define and requires no parameters, so hierarchical clustering is used for pre-clustering in the initial stage; this reduces the amount of data for subsequent clustering and gives a preliminary view of the structure of the data.

Because hierarchical clustering is computationally expensive and tends to form chain-like clusters, a stopping condition is set: pre-clustering stops when the number of micro-clusters reaches the preset number, and K-means is then used for the subsequent clustering.

K-means clustering is efficient and fast, but its result depends on where the initial centers are placed. Here the initial points are selected from the hierarchical clustering result so that they are neither too far apart nor too densely packed, which keeps the clustering from getting stuck in a local optimum and makes the result more stable and effective.

Optionally, the number of clusters in the output of the clustering operation is denoted K, and the value of K may be determined in either of the following ways:

the user specifies the number of clusters; after clustering, the result is observed and evaluated manually, and if it is unsatisfactory, the number of clusters is adjusted and clustering is repeated until a reasonable result is obtained;

or the optimal number of clusters K is evaluated automatically through an iterative clustering process: K takes successive values within a certain range, clustering is performed for each value, the silhouette coefficient (Silhouette Coefficient) of each output is calculated, and the K with the largest silhouette coefficient is selected as the optimal value.
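The automatic choice of K can be sketched with scikit-learn's `silhouette_score`. For brevity this sketch runs plain `KMeans` for each candidate K rather than the two-stage scheme; the candidate range, the helper name `best_k`, and the sample data are assumptions. The silhouette coefficient is only defined for at least 2 clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(X, k_values, seed=0):
    """Cluster once per candidate K; return the K with the largest silhouette."""
    scores = {}
    for k in k_values:  # every k must satisfy 2 <= k <= len(X) - 1
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

# Hypothetical data: three well-separated groups of 30 records each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2)) for m in (0.0, 3.0, 6.0)])
k, scores = best_k(X, range(2, 6))
```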

Optionally, after the customer data comprising a plurality of customer records are obtained in step S101, the method further includes:

performing data cleansing operations on the customer data set, including missing-value filling, outlier handling, and duplicate removal;

performing a decorrelation operation on the categorical data attributes and deleting highly correlated categorical attributes; and performing a decorrelation operation on the numerical data attributes and deleting highly correlated numerical attributes. This removes redundant attributes and thus reduces the amount of subsequent computation.


Referring to FIG. 2, the present invention provides a customer data classification device, comprising:

an acquisition unit 21, configured to acquire customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns;

a determining unit 22, configured to determine the attribute type of each attribute column, wherein the attribute type is categorical or numerical;

a conversion unit 23, configured to convert the attribute values of the categorical data attributes in the customer data into numerical attribute values;

and a clustering unit 24, configured to perform a clustering operation on the customer data to obtain a clustering result representing customer segmentation.

For the specific implementation of the customer data classification device, refer to the specific implementation of the customer data classification method; details are not repeated here.

Referring to FIG. 3, the present invention provides an electronic device, comprising:

a memory 31, configured to store customer data comprising a plurality of customer records, wherein the customer data includes a plurality of attribute columns;

a processor 32, configured to determine the attribute type of each attribute column, wherein the attribute type is categorical or numerical; convert the attribute values of the categorical data attributes in the customer data into numerical attribute values; and perform a clustering operation on the customer data to obtain a clustering result representing customer segmentation.

For the specific implementation of the processor, refer to the specific implementation of the customer data classification method; details are not repeated here.


The functions described in the method of this embodiment, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computing device readable storage medium. Based on such understanding, a portion of the embodiments of the present application that contributes to the prior art or a portion of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts, the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of classifying customer data, comprising:

the determining of the attribute type of each column of attributes comprises the following steps:

judging whether the ratio is greater than a set threshold;

if the ratio is not greater than the set threshold, determining that the column attribute is a classified data attribute;

2. The method of claim 1, wherein converting the attribute values corresponding to the classified data attributes in the customer data into attribute values corresponding to numerical data attributes comprises:

3. The method of claim 2, wherein determining, from among the numerical data attributes of the customer data, a target numerical data attribute that best matches the classified data attribute comprises:
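Claims 2 and 3 convert a classified attribute by first finding the numerical attribute that best matches it, but neither the matching criterion nor the conversion rule is reproduced in this text. The Python sketch below is illustrative only and assumes both: it scores each numerical attribute by its correlation ratio (eta squared, the share of that attribute's variance explained by the categories) and then encodes each category as the mean of the best-scoring attribute within that category. The function names and sample columns are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def category_means(labels: Sequence[str], numbers: Sequence[float]) -> Dict[str, float]:
    """Mean of a numerical attribute within each category of a classified attribute."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for label, x in zip(labels, numbers):
        groups[label].append(x)
    return {label: fmean(xs) for label, xs in groups.items()}


def correlation_ratio(labels: Sequence[str], numbers: Sequence[float]) -> float:
    """Eta squared: between-category sum of squares divided by total sum of squares."""
    overall = fmean(numbers)
    total = sum((x - overall) ** 2 for x in numbers)
    if total == 0:
        return 0.0  # a constant attribute explains nothing
    means = category_means(labels, numbers)
    between = sum((means[label] - overall) ** 2 for label in labels)
    return between / total


def convert_categorical(
    labels: Sequence[str], numeric_columns: Dict[str, Sequence[float]]
) -> Tuple[str, List[float]]:
    """Assumed rule, not taken from the patent: the best-matching numerical
    attribute is the one with the highest eta squared, and each label is
    replaced by that attribute's mean within the label's category."""
    target = max(numeric_columns, key=lambda name: correlation_ratio(labels, numeric_columns[name]))
    means = category_means(labels, numeric_columns[target])
    return target, [means[label] for label in labels]


levels = ["gold", "silver", "gold", "bronze", "silver", "bronze"]
numeric = {
    "annual_spend": [900.0, 500.0, 1100.0, 100.0, 700.0, 300.0],
    "age": [30.0, 45.0, 50.0, 35.0, 40.0, 25.0],
}
print(convert_categorical(levels, numeric))
# ('annual_spend', [1000.0, 600.0, 1000.0, 200.0, 600.0, 200.0])
```

Here `annual_spend` explains about 91% of its variance through the levels versus 40% for `age`, so it is chosen as the target attribute under this assumed criterion.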

4. The method of claim 1, further comprising removing outliers from the customer data using an isolation forest algorithm before performing the clustering operation on the customer data.
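Claim 4 removes outliers with an isolation forest before clustering. The following is a minimal from-scratch sketch of the standard isolation forest scoring scheme in pure Python, not the patent's own implementation; the tree count, subsample size, random seed, and score threshold are assumptions, since the text gives no parameters. Records that are isolated after only a few random splits get an anomaly score near 1 and are dropped.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]
EULER_GAMMA = 0.5772156649


def avg_path(n: int) -> float:
    """c(n): average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points share this value on the chosen attribute; stop here
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(point: Point, tree, depth: int = 0) -> float:
    """Depth at which the point lands, plus c(size) for unresolved leaf points."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(point, left if point[dim] < split else right, depth + 1)


def isolation_scores(data: List[Point], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score s = 2 ** (-mean path length / c(m)); values near 1 suggest outliers."""
    rng = random.Random(seed)
    m = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(m)) if m > 1 else 1
    trees = [build_tree(rng.sample(data, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path(m)
    scores = []
    for p in data:
        mean_h = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def remove_outliers(data: List[Point], threshold: float, n_trees: int = 100,
                    sample_size: int = 256, seed: int = 0) -> List[Point]:
    """Keep records whose anomaly score does not exceed the (assumed) threshold."""
    scores = isolation_scores(data, n_trees, sample_size, seed)
    return [p for p, s in zip(data, scores) if s <= threshold]


cluster = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)]
data = cluster + [(50.0, 50.0)]
kept = remove_outliers(data, threshold=0.8)
print(len(kept), (50.0, 50.0) in kept)
```

The distant record at (50.0, 50.0) is usually separated by the very first split, so its score is far higher than that of any grid point. For production use, scikit-learn's `sklearn.ensemble.IsolationForest` is a maintained implementation of the same technique.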

5. The method of claim 4, wherein performing a clustering operation on the customer data to obtain a clustering result representing customer segmentation comprises:

calculating the center points of a preset number of micro clusters;

6. The method of claim 5, wherein determining K center points from among a preset number of center points as initial center points of a K-means algorithm comprises:

repeating the above steps until the set S contains K center points.
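Claims 5 and 6 seed K-means with K of the micro-cluster center points, adding centers to a set S until it holds K of them, but the selection rule itself is not reproduced in this text. The sketch below assumes the micro-cluster centers are already computed and uses a farthest-first rule (start from the center farthest from the overall mean, then repeatedly add the candidate whose nearest member of S is farthest away) before running standard Lloyd-style K-means. The rule and all function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def select_initial_centers(micro_centers: List[Point], k: int) -> List[Point]:
    """Assumed farthest-first rule for building the set S of K initial centers."""
    if not 0 < k <= len(micro_centers):
        raise ValueError("k must be between 1 and the number of micro-cluster centers")
    overall = mean_point(micro_centers)
    s = [max(micro_centers, key=lambda c: dist2(c, overall))]
    while len(s) < k:
        # Add the candidate whose nearest member of S is farthest away.
        s.append(max(micro_centers, key=lambda c: min(dist2(c, m) for m in s)))
    return s


def kmeans(points: List[Point], initial_centers: List[Point], max_iter: int = 100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: mean of each cluster; keep the old center if a cluster is empty.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


micro = [(0.5, 0.5), (0.0, 0.0), (10.5, 10.5), (11.0, 10.5)]
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
init = select_initial_centers(micro, 2)  # [(0.0, 0.0), (11.0, 10.5)]
centers, labels = kmeans(points, init)
print(centers, labels)  # [(0.5, 0.5), (10.5, 10.5)] [0, 0, 0, 0, 1, 1, 1, 1]
```

The micro-cluster centers in the example are hand-picked stand-ins; the claims compute them from the customer data by a step not shown here. Seeding from well-separated centers is a common way to make K-means less sensitive to a poor random start.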

7. The method of claim 1, wherein, after obtaining the customer data containing the plurality of customer records, the method further comprises:

8. A customer data classifying apparatus, comprising:

judging whether the data type of the attribute values corresponding to a column of attributes is a continuous data type; if the data type is not continuous, determining that the column attribute is a classified data attribute; if the data type is continuous, counting the number of distinct attribute values in the column of attributes, and calculating the ratio of the number of distinct attribute values to the total number of attribute values; judging whether the ratio is greater than a set threshold; if the ratio is greater than the set threshold, determining that the column attribute is a numerical data attribute; if the ratio is not greater than the set threshold, determining that the column attribute is a classified data attribute;
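The attribute-type rule in claims 8 and 9 is stated completely and can be sketched directly. The Python below assumes that a "continuous data type" means every value in the column is an int or float (booleans excluded); the text gives no value for the set threshold, so the caller must supply one, and the 0.5 used in the example is purely illustrative.

```python
from typing import Sequence

# Hypothetical labels for the two attribute types named in the claims.
CATEGORICAL = "categorical"  # "classified data attribute"
NUMERICAL = "numerical"      # "numerical data attribute"


def is_continuous_type(values: Sequence[object]) -> bool:
    """Assumption: a column is of a continuous data type when it is non-empty
    and every value is an int or float (bool is excluded)."""
    return bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def attribute_type(values: Sequence[object], threshold: float) -> str:
    """Classify one column of attribute values following the claimed rule."""
    if not is_continuous_type(values):
        return CATEGORICAL
    # Ratio of distinct attribute values to the total number of attribute values.
    ratio = len(set(values)) / len(values)
    return NUMERICAL if ratio > threshold else CATEGORICAL


print(attribute_type([23, 35, 41, 29, 52, 35, 61, 47], threshold=0.5))   # numerical (7 of 8 distinct)
print(attribute_type([1, 2, 1, 3, 2, 1, 3, 2], threshold=0.5))           # categorical (3 of 8 distinct)
print(attribute_type(["Beijing", "Shanghai", "Beijing"], threshold=0.5))  # categorical (not continuous)
```

The second column shows why the ratio test matters: integer-coded levels pass the continuous-type check but repeat too often to be treated as numerical.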

9. An electronic device, comprising:

a processor for determining the attribute type of each column of attributes, wherein determining the attribute type of each column of attributes comprises: judging whether the data type of the attribute values corresponding to a column of attributes is a continuous data type; if the data type is not continuous, determining that the column attribute is a classified data attribute; if the data type is continuous, counting the number of distinct attribute values in the column of attributes, and calculating the ratio of the number of distinct attribute values to the total number of attribute values; judging whether the ratio is greater than a set threshold; if the ratio is greater than the set threshold, determining that the column attribute is a numerical data attribute; if the ratio is not greater than the set threshold, determining that the column attribute is a classified data attribute;

wherein the attribute type is a classification type or a numerical value type; converting attribute values corresponding to the classified data attributes in the client data into attribute values corresponding to the numerical data attributes; and executing clustering operation on the client data to obtain a clustering result for representing client subdivision.