CN113837319A - Clustering-based customer classification method, device, equipment and storage medium

Info

Publication number: CN113837319A
Application number: CN202111234801.0A
Authority: CN
Inventors: 蒋国青
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-10-22
Filing date: 2021-10-22
Publication date: 2021-12-24
Anticipated expiration: 2041-10-22
Also published as: CN113837319B

Abstract

The application relates to the technical field of artificial intelligence and discloses a clustering-based customer classification method, device, equipment and storage medium. The method comprises: dividing the customer data by using a preset clustering algorithm and a preset number of clusters to obtain a plurality of first cluster sets; calculating, for each first cluster set, a high quantile score and a low quantile score for each evaluation dimension by using a mapping table of evaluation dimensions to high and low quantile proportions; dividing the customer data in each first cluster set according to the high quantile scores and low quantile scores corresponding to that first cluster set to obtain a second cluster set and an unclassified customer data set; performing centroid calculation on each second cluster set to obtain a target centroid; and dividing the customer data in each unclassified customer data set into the second cluster sets according to the target centroids to obtain a customer classification result. The centroid is thus adjusted again for each class obtained by the preset clustering algorithm, so that forward changes in the overall data can be accurately identified.

Description

Clustering-based customer classification method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for classifying clients based on clustering.

Background

In the prior art, unsupervised clustering results obtained with the K-means (kmeans) clustering algorithm are used as the calibration values of samples, the samples are then used to train a supervised clustering model, and the trained supervised clustering model is used to classify customers. Because the preset clustering algorithm is unsupervised, its calculation principle may differ from that of the supervised clustering model, so the trained supervised clustering model cannot reproduce the output of the unsupervised algorithm with 100% accuracy, which reduces its prediction accuracy. In addition, the label range of each category of the trained supervised clustering model is fixed and cannot be adjusted dynamically as the data to be predicted changes, which further reduces its prediction accuracy.

Disclosure of Invention

The present application provides a clustering-based customer classification method, device, equipment and storage medium, which aim to solve the following technical problems of the prior art. When an unsupervised clustering result obtained with a preset clustering algorithm is input into a supervised clustering model as the calibration value of a sample for training, the calculation principles of the unsupervised algorithm and of the supervised model may differ, so the trained supervised clustering model cannot reproduce the output of the unsupervised algorithm with 100% accuracy, which reduces its prediction accuracy. Moreover, the label range of each category of the trained supervised clustering model is fixed and cannot be adjusted dynamically as the data to be predicted changes, which further reduces its prediction accuracy.

In order to achieve the above object, the present application provides a method for classifying customers based on clustering, the method comprising:

acquiring a plurality of customer data;

clustering and dividing each customer data by adopting a preset clustering algorithm and a preset clustering number to obtain a plurality of first clustering sets;

respectively carrying out high quantile point score calculation and low quantile point score calculation on each evaluation dimension on the first cluster set by adopting the obtained evaluation dimension and high and low quantile point ratio mapping table;

dividing the customer data in the first clustering set according to the scores of the high quantile points and the scores of the low quantile points corresponding to the first clustering set to obtain a second clustering set and an unclassified customer data set;

carrying out centroid calculation on the second cluster set to obtain a target centroid;

and according to the target centroids, dividing the customer data in the unclassified customer data sets into the second clustering sets to obtain customer classification results.

Further, the step of respectively performing high quantile score calculation and low quantile score calculation for each evaluation dimension on the first cluster set by using the obtained mapping table of evaluation dimensions to high and low quantile proportions includes:

randomly acquiring one first cluster set as a cluster set to be analyzed;

arbitrarily acquiring one evaluation dimension as an evaluation dimension to be analyzed;

according to the evaluation dimension to be analyzed, performing positive sequence sorting on the cluster set to be analyzed to obtain a sorted cluster set;

acquiring a target high quantile point proportion and a target low quantile point proportion corresponding to the evaluation dimension to be analyzed from the evaluation dimension and high and low quantile point proportion mapping table;

performing high quantile score calculation on the evaluation dimension to be analyzed according to the target high quantile proportion and the cluster set to be analyzed;

performing low quantile score calculation on the evaluation dimension to be analyzed according to the target low quantile proportion and the cluster set to be analyzed;

repeatedly executing the step of arbitrarily acquiring one evaluation dimension as the evaluation dimension to be analyzed until all evaluation dimensions have been acquired;

and repeatedly executing the step of randomly acquiring one first cluster set as a cluster set to be analyzed until the acquisition of the first cluster set is completed.

Further, the step of dividing the customer data in the first cluster set according to the scores of the high quantile and the scores of the low quantile corresponding to the first cluster set to obtain a second cluster set and an uncategorized customer data set includes:

randomly acquiring one first cluster set as a cluster set to be processed;

randomly acquiring one piece of customer data from the cluster set to be processed as customer data to be analyzed;

according to the scores of the high quantile points and the scores of the low quantile points corresponding to the cluster set to be processed, dividing the customer data to be analyzed into the second cluster set or the unclassified customer data set corresponding to the cluster set to be processed;

repeatedly executing the step of randomly acquiring one piece of customer data from the cluster set to be processed as customer data to be analyzed until the customer data acquisition in the cluster set to be processed is completed;

and repeatedly executing the step of randomly acquiring one first cluster set as the cluster set to be processed until the acquisition of the first cluster set is completed.

Further, the step of dividing the customer data to be analyzed into the second cluster set or the uncategorized customer data set corresponding to the cluster set to be processed according to the high quantile score and the low quantile score corresponding to the cluster set to be processed includes:

randomly acquiring one evaluation dimension as an evaluation dimension to be processed;

obtaining a score corresponding to the evaluation dimension to be processed from the customer data to be analyzed as a target score;

taking the low quantile score of the cluster set to be processed for the evaluation dimension to be processed as the starting point of a score range, and taking the high quantile score of the cluster set to be processed for the evaluation dimension to be processed as the end point of the score range;

when the target score is located in the score range, repeatedly executing the step of randomly acquiring one evaluation dimension as the evaluation dimension to be processed until the acquisition of the evaluation dimension is completed or the target score is not located in the score range;

when the target score is not within the score range, dividing the customer data to be analyzed into the unclassified customer data set corresponding to the cluster set to be processed;

and when the target score is within the score range, dividing the customer data to be analyzed into the second cluster set corresponding to the cluster set to be processed.

Further, the step of dividing the customer data in each of the uncategorized customer data sets into each of the second cluster sets according to each of the target centroids to obtain a customer classification result includes:

randomly acquiring one piece of customer data from each uncategorized customer data set as customer data to be categorized;

randomly acquiring one target centroid as a centroid to be calculated;

respectively carrying out distance calculation of each evaluation dimension on the customer data to be classified and the centroid to be calculated to obtain a plurality of first distances;

carrying out weighted summation on the first distances to obtain a second distance corresponding to the centroid to be calculated;

repeatedly executing the step of randomly acquiring one target centroid as the centroid to be calculated until the acquisition of the target centroid is completed;

classifying the customer data to be classified into one of the second clustering sets according to the second distances;

repeatedly executing the step of randomly acquiring one piece of customer data from each unclassified customer data set as customer data to be classified until all customer data in each unclassified customer data set have been acquired;

and taking each second clustering set as the customer classification result.

Further, the step of classifying the customer data to be classified into one of the second clustering sets according to the second distances includes:

finding out the second distance with the minimum value from the second distances as a target distance;

and classifying the customer data to be classified into the second clustering set corresponding to the target centroid corresponding to the target distance.

Further, the step of performing centroid calculation on the second cluster set to obtain a target centroid includes:

randomly acquiring one second cluster set as a cluster set to be evaluated;

randomly acquiring one evaluation dimension as an evaluation dimension to be evaluated;

performing weighted calculation on the evaluation dimension to be evaluated according to the cluster set to be evaluated to obtain a single-dimension weighted value;

repeatedly executing the step of randomly acquiring one evaluation dimension as an evaluation dimension to be evaluated until the acquisition of the evaluation dimension is completed;

taking each single-dimensional weighted value as the target centroid corresponding to the cluster set to be evaluated;

and repeatedly executing the step of randomly acquiring one second cluster set as the cluster set to be evaluated until the acquisition of the second cluster set is completed.

The present application further provides a customer classification device based on clustering, the device comprising:

the client data acquisition module is used for acquiring a plurality of client data;

the first clustering module is used for clustering and dividing the client data by adopting a preset clustering algorithm and a preset clustering number to obtain a plurality of first clustering sets;

the quantile point score calculation module is used for respectively performing high quantile point score calculation and low quantile point score calculation of each evaluation dimension on the first cluster set by adopting the obtained evaluation dimension and high and low quantile point ratio mapping table;

the second clustering module is used for dividing the customer data in the first clustering set according to the scores of the high quantile points and the scores of the low quantile points corresponding to the first clustering set to obtain a second clustering set and an unclassified customer data set;

the target centroid calculation module is used for carrying out centroid calculation on the second clustering set to obtain a target centroid;

and the customer classification result determining module is used for dividing the customer data in each unclassified customer data set into each second clustering set according to each target centroid to obtain a customer classification result.

The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.

The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.

In the method, the customer data are first clustered by using a preset clustering algorithm and a preset number of clusters to obtain a plurality of first cluster sets. Next, the obtained mapping table of evaluation dimensions to high and low quantile proportions is used to calculate, for each first cluster set, a high quantile score and a low quantile score for each evaluation dimension, and the customer data in each first cluster set are divided according to the high quantile scores and low quantile scores corresponding to that first cluster set to obtain a second cluster set and an unclassified customer data set. Centroid calculation is then performed on each second cluster set to obtain a target centroid, and finally the customer data in each unclassified customer data set are divided into the second cluster sets according to the target centroids to obtain a customer classification result. The centroid is thus adjusted again for each class obtained by the preset clustering algorithm, so that forward changes in the overall data can be accurately identified; the unsupervised clustering result obtained by the preset clustering algorithm no longer needs to be input into a supervised clustering model as the calibration value of a sample for training in order to classify customers, and the accuracy of customer classification is improved.

Drawings

FIG. 1 is a schematic flow chart illustrating a method for classifying customers based on clustering according to an embodiment of the present application;

FIG. 2 is a block diagram illustrating an exemplary configuration of a clustering-based customer classification apparatus according to an embodiment of the present disclosure;

FIG. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to fig. 1, an embodiment of the present application provides a method for classifying customers based on clustering, where the method includes:

S1: acquiring a plurality of customer data;

S2: clustering and dividing each customer data by adopting a preset clustering algorithm and a preset clustering number to obtain a plurality of first clustering sets;

S3: respectively carrying out high quantile point score calculation and low quantile point score calculation on each evaluation dimension on the first cluster set by adopting the obtained evaluation dimension and high and low quantile point ratio mapping table;

S4: dividing the customer data in the first clustering set according to the scores of the high quantile points and the scores of the low quantile points corresponding to the first clustering set to obtain a second clustering set and an unclassified customer data set;

S5: carrying out centroid calculation on the second cluster set to obtain a target centroid;

S6: and according to the target centroids, dividing the customer data in the unclassified customer data sets into the second clustering sets to obtain customer classification results.

In this embodiment, the customer data are first clustered by using a preset clustering algorithm and a preset number of clusters to obtain a plurality of first cluster sets. Next, the obtained mapping table of evaluation dimensions to high and low quantile proportions is used to calculate, for each first cluster set, a high quantile score and a low quantile score for each evaluation dimension, and the customer data in each first cluster set are divided according to the high quantile scores and low quantile scores corresponding to that first cluster set to obtain a second cluster set and an unclassified customer data set. Centroid calculation is then performed on each second cluster set to obtain a target centroid, and finally the customer data in each unclassified customer data set are divided into the second cluster sets according to the target centroids to obtain a customer classification result. The centroid is thus adjusted again for each class obtained by the preset clustering algorithm, so that forward changes in the overall data can be accurately identified; the unsupervised clustering result obtained by the preset clustering algorithm no longer needs to be input into a supervised clustering model as the calibration value of a sample for training in order to classify customers, and the accuracy of customer classification is improved.

For S1, a plurality of customer data input by the user may be obtained, a plurality of customer data may be obtained from a database, or a plurality of customer data may be obtained from a third-party application system.

The customer data includes a customer identifier and an evaluation dimension dataset. The customer identifier may be a customer name, a customer ID, or other data that uniquely identifies a customer. The evaluation dimension dataset includes evaluation dimensions and scores, where each evaluation dimension corresponds to one score. An evaluation dimension is a perspective from which the customer is evaluated. For example, when the application is applied to the insurance industry, the evaluation dimensions include, but are not limited to, existing contribution, annual income, loyalty, and premium gap.

It is understood that the present application may also be applied to other fields, such as the digital medical field, and is not limited thereto.

It is understood that the customer data may also include other data, such as, but not limited to, customer age, home address.
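The record layout described above can be pictured with a small data structure. The following is only an illustrative sketch: the class name `CustomerData`, its field names, and the numeric scores are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CustomerData:
    """One customer record: an identifier plus one score per evaluation dimension."""
    customer_id: str            # hypothetical field name for the customer identifier
    scores: Dict[str, float]    # evaluation dimension -> score
    extra: Dict[str, str] = field(default_factory=dict)  # optional other data, e.g. age


# Dimension names follow the insurance example; the scores are made up.
customer = CustomerData(
    customer_id="C001",
    scores={
        "existing_contribution": 72.0,
        "annual_income": 55.0,
        "loyalty": 80.0,
        "premium_gap": 40.0,
    },
    extra={"age": "35"},
)
```

Keeping the scores in a mapping keyed by evaluation dimension makes the one-score-per-dimension relationship explicit.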

For step S2, a preset clustering algorithm is adopted to cluster each piece of customer data into cluster sets with the same number as the preset cluster number, and each cluster set is used as a first cluster set. That is, the number of first cluster sets is the same as the preset number of clusters.

Each of said customer data is divided into only one of the first cluster sets, i.e. the customer identities of the customer data in different first cluster sets are different.

Optionally, the preset clustering algorithm is a Kmeans clustering algorithm. It can be understood that the preset clustering algorithm may also be other clustering algorithms, which are not described herein.

The implementation method for clustering and dividing each client data into the cluster sets with the same number as the preset cluster number by using a Kmeans clustering algorithm is not repeated herein.

Optionally, a preset script is executed, a preset clustering algorithm and a preset clustering number are adopted, clustering division is performed on each piece of customer data in a Hive (a data warehouse tool based on Hadoop) database, and each cluster set obtained through division is used as one first cluster set. Hive is a data warehouse tool based on a big data platform, clustering can be performed based on more data, the data processing speed is high, and the clustering speed is improved.
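As an illustration of S2, the following minimal sketch runs a plain K-means (Lloyd's algorithm) over score vectors and groups the customers into as many first cluster sets as the preset number of clusters. It is not the application's implementation: the function name `kmeans_labels`, the seeding of centroids with the first k points, and the sample points are all assumptions chosen to keep the example deterministic, and the Hive-based execution mentioned above is not modeled.

```python
import math
from typing import List

Vector = List[float]


def kmeans_labels(points: List[Vector], k: int, max_iterations: int = 100) -> List[int]:
    """Plain Lloyd's K-means; returns one cluster index in range(k) per point."""
    # Assumption: the first k points seed the centroids so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not move either
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Each vector holds one customer's scores over two evaluation dimensions (made-up values).
points = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [9.0, 9.0], [9.0, 8.0], [8.0, 9.0]]
preset_cluster_number = 2
labels = kmeans_labels(points, preset_cluster_number)
first_cluster_sets = [
    [p for p, lab in zip(points, labels) if lab == c]
    for c in range(preset_cluster_number)
]
```

Because every point receives exactly one label, each customer lands in exactly one first cluster set, and the number of sets equals the preset number of clusters.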

For S3, the mapping table of evaluation dimensions to high and low quantile proportions may be obtained from user input, from a database, or from a third-party application system.

The mapping table of evaluation dimensions to high and low quantile proportions includes evaluation dimensions, high quantile proportions, and low quantile proportions. It can be understood that the high quantile proportions of the evaluation dimensions may be all the same, partially the same, or all different; likewise, the low quantile proportions of the evaluation dimensions may be all the same, partially the same, or all different.

Optionally, the high quantile ratio of each evaluation dimension may be all the same, and the low quantile ratio of each evaluation dimension may be all the same, where the high quantile ratio is set to 90%, and the low quantile ratio is set to 10%.

Optionally, the high quantile proportion of each evaluation dimension may be all the same, and the low quantile proportion of each evaluation dimension may be all the same, where the high quantile proportion is set to 95%, and the low quantile proportion is set to 5%.
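One possible in-memory form of the mapping table is sketched below. The names `QuantileProportions`, `quantile_table`, and `validate` are hypothetical; the dimension names follow the insurance example, and the proportions reuse the 90%/10% and 95%/5% settings mentioned above, mixed across dimensions purely for illustration.

```python
from typing import Dict, NamedTuple


class QuantileProportions(NamedTuple):
    high: float  # e.g. 0.90 for the 90% quantile
    low: float   # e.g. 0.10 for the 10% quantile


# Hypothetical table: evaluation dimension -> (high, low) quantile proportions.
quantile_table: Dict[str, QuantileProportions] = {
    "existing_contribution": QuantileProportions(high=0.90, low=0.10),
    "annual_income": QuantileProportions(high=0.90, low=0.10),
    "loyalty": QuantileProportions(high=0.95, low=0.05),
    "premium_gap": QuantileProportions(high=0.95, low=0.05),
}


def validate(table: Dict[str, QuantileProportions]) -> None:
    """Reject entries whose proportions are out of range or inverted."""
    for dimension, p in table.items():
        if not 0.0 <= p.low < p.high <= 1.0:
            raise ValueError(f"invalid proportions for {dimension}: {p}")


validate(quantile_table)
target = quantile_table["loyalty"]  # look up the proportions for one dimension
```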

Within each first cluster set, the high quantile score is calculated according to the high quantile proportion of each evaluation dimension, and the low quantile score is calculated according to the low quantile proportion of each evaluation dimension. That is, for each first cluster set, the number of high quantile scores equals the number of evaluation dimensions, and the number of low quantile scores equals the number of evaluation dimensions.

For example, when the application is applied to the insurance industry and the evaluation dimensions include existing contribution, annual income, loyalty, and premium gap, the number of high quantile scores corresponding to each first cluster set is 4 (existing contribution, annual income, loyalty, and premium gap each correspond to one high quantile score), and the number of low quantile scores corresponding to each first cluster set is 4 (existing contribution, annual income, loyalty, and premium gap each correspond to one low quantile score), which is not specifically limited in this example.

For example, if the high quantile proportion of the evaluation dimension "existing contribution" is 90%, the scores corresponding to existing contribution in the first cluster set are sorted in positive order (from low to high), and the score corresponding to existing contribution in the customer data ranked at 90% (90% of the total number of customer data in the first cluster set), counting from low to high, is used as the high quantile score, which is not specifically limited in this example.

For example, if sorting the scores corresponding to existing contribution in the first cluster set in positive order (from low to high) yields 100 pieces of customer data, the score corresponding to existing contribution in the customer data ranked 90th, counting from low to high, is used as the high quantile score, which is not specifically limited in this example.

For example, the score corresponding to existing contribution in the customer data ranked at the high quantile proportion is used as the high quantile score; when no customer data is ranked exactly at the high quantile proportion, the score corresponding to existing contribution in the customer data whose rank is below the high quantile proportion and closest to it is used as the high quantile score.

For example, the score corresponding to existing contribution in the customer data ranked at the low quantile proportion is used as the low quantile score; when no customer data is ranked exactly at the low quantile proportion, the score corresponding to existing contribution in the customer data whose rank is above the low quantile proportion and closest to it is used as the low quantile score.
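The ranking rule in these examples can be sketched for a single evaluation dimension as follows. This is a hedged illustration, not the application's code: the function name `quantile_score` and its `fallback` parameter are assumptions, the rank is taken as proportion times the number of records (1-based), and clamping the rank to a valid position for very small clusters is an added safeguard that the text does not mention.

```python
import math
from typing import List


def quantile_score(scores: List[float], proportion: float, fallback: str) -> float:
    """Score at the given rank proportion after ascending sort.

    fallback="below": if no record sits exactly at the rank, use the nearest lower rank.
    fallback="above": if no record sits exactly at the rank, use the nearest higher rank.
    """
    if not scores:
        raise ValueError("cluster has no scores")
    ordered = sorted(scores)  # positive order, from low to high
    n = len(ordered)
    # 1-based rank position; rounding absorbs float noise such as 0.1 * 100.
    position = round(proportion * n, 9)
    rank = math.floor(position) if fallback == "below" else math.ceil(position)
    rank = min(max(rank, 1), n)  # assumption: clamp to a valid rank
    return ordered[rank - 1]


# 100 records with scores 1..100: the 90th-ranked score is the high quantile score.
hundred = [float(i) for i in range(1, 101)]
high = quantile_score(hundred, 0.90, fallback="below")
low = quantile_score(hundred, 0.10, fallback="above")

# 5 records: rank 4.5 does not exist, so the high quantile falls back to rank 4,
# and rank 0.5 does not exist, so the low quantile falls back to rank 1.
five = [10.0, 20.0, 30.0, 40.0, 50.0]
high_small = quantile_score(five, 0.90, fallback="below")
low_small = quantile_score(five, 0.10, fallback="above")
```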

For S4, the customer data whose score in every evaluation dimension lies between the low quantile score and the high quantile score of that evaluation dimension are extracted from the first cluster set; the extracted customer data form the second cluster set corresponding to the first cluster set, and the remaining customer data in the first cluster set form the unclassified customer data set corresponding to the first cluster set.
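A minimal sketch of the S4 split is given below. The function name `split_cluster` and the sample values are hypothetical, and treating the score range as inclusive at both ends is an assumption, since the text only says "between".

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # evaluation dimension -> score


def split_cluster(
    cluster: List[Scores], low: Scores, high: Scores
) -> Tuple[List[Scores], List[Scores]]:
    """Split one first cluster set into (second cluster set, unclassified set)."""
    second: List[Scores] = []
    unclassified: List[Scores] = []
    for customer in cluster:
        # Assumption: bounds are inclusive. all() stops at the first dimension
        # that falls outside its range.
        in_range = all(low[d] <= customer[d] <= high[d] for d in low)
        (second if in_range else unclassified).append(customer)
    return second, unclassified


cluster = [
    {"loyalty": 50.0, "annual_income": 60.0},  # inside both ranges
    {"loyalty": 95.0, "annual_income": 60.0},  # loyalty above its high quantile score
    {"loyalty": 50.0, "annual_income": 5.0},   # annual_income below its low quantile score
]
low_scores = {"loyalty": 10.0, "annual_income": 10.0}
high_scores = {"loyalty": 90.0, "annual_income": 90.0}
second_set, unclassified_set = split_cluster(cluster, low_scores, high_scores)
```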

For S5, a weighted calculation is performed on the scores of the second cluster set for each evaluation dimension, and the calculated weighted values together are taken as the target centroid corresponding to the second cluster set.
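The text does not state which weights the S5 calculation uses. The sketch below therefore assumes a weighted average per evaluation dimension with optional per-customer weights that default to equal weights (a plain mean); the function name `target_centroid` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Scores = Dict[str, float]  # evaluation dimension -> score


def target_centroid(
    cluster: List[Scores],
    dimensions: Sequence[str],
    weights: Optional[List[float]] = None,
) -> Scores:
    """One weighted value per evaluation dimension, taken together as the centroid."""
    if not cluster:
        raise ValueError("second cluster set is empty")
    if weights is None:
        weights = [1.0] * len(cluster)  # assumption: equal weights, i.e. the mean
    total = sum(weights)
    return {
        d: sum(w * customer[d] for w, customer in zip(weights, cluster)) / total
        for d in dimensions
    }


second_set = [
    {"loyalty": 10.0, "annual_income": 20.0},
    {"loyalty": 30.0, "annual_income": 40.0},
]
centroid = target_centroid(second_set, ["loyalty", "annual_income"])
weighted = target_centroid(second_set, ["loyalty"], weights=[3.0, 1.0])
```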

For S6, each piece of customer data in each unclassified customer data set is divided into the second cluster set whose target centroid is closest to it; after all customer data in the unclassified customer data sets have been divided, each second cluster set is taken as the customer classification result.
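The S6 assignment can be sketched as follows, using the per-dimension distances, weighted summation, and minimum selection outlined in the disclosure. Using the absolute difference as the per-dimension distance, the dimension weights, and the names `second_distance` and `assign_unclassified` are assumptions for illustration; the centroids are not recomputed after each assignment because the text does not call for it.

```python
from typing import Dict, List

Scores = Dict[str, float]  # evaluation dimension -> score


def second_distance(customer: Scores, centroid: Scores, dim_weights: Scores) -> float:
    """Weighted sum of per-dimension distances (assumed: absolute differences)."""
    return sum(dim_weights[d] * abs(customer[d] - centroid[d]) for d in centroid)


def assign_unclassified(
    unclassified: List[Scores],
    second_sets: List[List[Scores]],
    centroids: List[Scores],
    dim_weights: Scores,
) -> List[List[Scores]]:
    """Return copies of the second cluster sets with the unclassified customers added."""
    result = [list(s) for s in second_sets]
    for customer in unclassified:
        distances = [second_distance(customer, c, dim_weights) for c in centroids]
        target = distances.index(min(distances))  # centroid with the smallest distance
        result[target].append(customer)
    return result


second_sets = [[{"a": 10.0, "b": 10.0}], [{"a": 90.0, "b": 90.0}]]
centroids = [{"a": 10.0, "b": 10.0}, {"a": 90.0, "b": 90.0}]
unclassified = [{"a": 20.0, "b": 0.0}, {"a": 80.0, "b": 100.0}]
classification = assign_unclassified(
    unclassified, second_sets, centroids, dim_weights={"a": 1.0, "b": 1.0}
)
```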

In an embodiment, the step of respectively performing high quantile score calculation and low quantile score calculation for each evaluation dimension on the first cluster set by using the obtained mapping table of evaluation dimensions to high and low quantile proportions includes:

S31: randomly acquiring one first cluster set as a cluster set to be analyzed;

S32: arbitrarily acquiring one evaluation dimension as an evaluation dimension to be analyzed;

S33: according to the evaluation dimension to be analyzed, performing positive sequence sorting on the cluster set to be analyzed to obtain a sorted cluster set;

S34: acquiring a target high quantile point proportion and a target low quantile point proportion corresponding to the evaluation dimension to be analyzed from the evaluation dimension and high and low quantile point proportion mapping table;

S35: performing high quantile score calculation on the evaluation dimension to be analyzed according to the target high quantile proportion and the cluster set to be analyzed;

S36: performing low quantile score calculation on the evaluation dimension to be analyzed according to the target low quantile proportion and the cluster set to be analyzed;

S37: repeatedly executing the step of arbitrarily acquiring one evaluation dimension as the evaluation dimension to be analyzed until all evaluation dimensions have been acquired;

S38: and repeatedly executing the step of randomly acquiring one first cluster set as a cluster set to be analyzed until the acquisition of the first cluster set is completed.

In this embodiment, the obtained mapping table of evaluation dimensions to high and low quantile proportions is used to calculate the high quantile score and the low quantile score of each evaluation dimension for the first cluster set, which provides a basis for subsequently re-adjusting the centroid of each class obtained by the Kmeans clustering.

For S31, one of the first cluster sets is arbitrarily obtained, and the obtained first cluster set is used as a cluster set to be analyzed.

For S32, one of the evaluation dimensions is arbitrarily acquired, and the acquired evaluation dimension is taken as an evaluation dimension to be analyzed.

For S33, sorting each piece of customer data in the cluster set to be analyzed in the evaluation dimension to be analyzed in a positive order (from low to high), and using the sorted cluster set to be analyzed as a sorted cluster set.

For S34, the evaluation dimension to be analyzed is looked up in the mapping table of evaluation dimensions to high and low quantile proportions; the high quantile proportion corresponding to the matching entry is taken as the target high quantile proportion, and the low quantile proportion corresponding to the matching entry is taken as the target low quantile proportion.

For S35, among the scores in the cluster set to be analyzed that correspond to the evaluation dimension to be analyzed, the score ranked at the target high quantile proportion is taken as the target high quantile score, and the target high quantile score is used as the high quantile score of the cluster set to be analyzed for the evaluation dimension to be analyzed.

It is understood that when no score is ranked exactly at the target high quantile proportion, the score whose rank is less than the target high quantile proportion and closest to it is taken as the target high quantile score.

For S36, among the scores in the cluster set to be analyzed that correspond to the evaluation dimension to be analyzed, the score ranked at the target low quantile proportion is taken as the target low quantile score, and the target low quantile score is used as the low quantile score of the cluster set to be analyzed for the evaluation dimension to be analyzed.

It is understood that when no score is ranked exactly at the target low quantile proportion, the score whose rank is less than the target low quantile proportion and closest to it is taken as the target low quantile score.

For S37, steps S32 to S37 are repeatedly performed until all evaluation dimensions have been acquired. When the acquisition of the evaluation dimensions is completed, the calculation of each high quantile score and each low quantile score corresponding to the cluster set to be analyzed is completed.

For S38, steps S31 through S38 are repeatedly performed until all first cluster sets have been acquired. When the acquisition of the first cluster sets is completed, the calculation of each high quantile score and each low quantile score corresponding to each first cluster set is completed.
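The nested iteration of S31 to S38 can be sketched as two loops, one over the first cluster sets and one over the evaluation dimensions. The names `rank_score` and `quantile_scores_per_cluster` are hypothetical, and the table is assumed to map each dimension to a (high, low) pair of proportions. The fallback for a missing low quantile rank is left as a flag (`low_round_up`) that defaults to the "less than and closest" wording of this embodiment and can be set to take the next higher rank instead.

```python
import math
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # evaluation dimension -> score


def rank_score(ordered: List[float], proportion: float, round_up: bool) -> float:
    """Score at rank proportion * n (1-based) in an ascending list."""
    position = round(proportion * len(ordered), 9)  # absorb float noise
    rank = math.ceil(position) if round_up else math.floor(position)
    return ordered[min(max(rank, 1), len(ordered)) - 1]  # assumption: clamp the rank


def quantile_scores_per_cluster(
    first_cluster_sets: List[List[Scores]],
    table: Dict[str, Tuple[float, float]],  # dimension -> (high, low) proportions
    low_round_up: bool = False,             # assumption: fallback direction is configurable
) -> List[Dict[str, Tuple[float, float]]]:
    """For every cluster and dimension, return (low quantile score, high quantile score)."""
    result = []
    for cluster in first_cluster_sets:                      # S31, repeated per S38
        per_dimension = {}
        for dimension, (high_p, low_p) in table.items():    # S32 and S34, repeated per S37
            ordered = sorted(c[dimension] for c in cluster)  # S33: ascending sort
            high = rank_score(ordered, high_p, round_up=False)       # S35
            low = rank_score(ordered, low_p, round_up=low_round_up)  # S36
            per_dimension[dimension] = (low, high)
        result.append(per_dimension)
    return result


cluster_a = [{"loyalty": float(i), "annual_income": float(10 * i)} for i in range(1, 11)]
cluster_b = [{"loyalty": float(i + 50), "annual_income": float(i)} for i in range(1, 11)]
table = {"loyalty": (0.9, 0.1), "annual_income": (0.9, 0.1)}
ranges = quantile_scores_per_cluster([cluster_a, cluster_b], table)
```

Each first cluster set ends up with one low and one high quantile score per evaluation dimension, matching the counts described for S3.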

In an embodiment, the step of dividing the customer data in the first cluster set according to the scores of the high quantile and the scores of the low quantile corresponding to the first cluster set to obtain a second cluster set and an uncategorized customer data set includes:

S41: randomly acquiring one first cluster set as a cluster set to be processed;

S42: randomly acquiring one piece of customer data from the cluster set to be processed as customer data to be analyzed;

S43: according to the scores of the high quantile points and the scores of the low quantile points corresponding to the cluster set to be processed, dividing the customer data to be analyzed into the second cluster set or the unclassified customer data set corresponding to the cluster set to be processed;

S44: repeatedly executing the step of randomly acquiring one piece of customer data from the cluster set to be processed as customer data to be analyzed until the customer data acquisition in the cluster set to be processed is completed;

S45: and repeatedly executing the step of randomly acquiring one first cluster set as the cluster set to be processed until the acquisition of the first cluster set is completed.

In this embodiment, screening is performed using the value range of each evaluation dimension of each cluster, so that customer data that cannot simply be assigned by those value ranges is identified, namely customer data that does not belong to a given cluster but was assigned to it, or that belongs to a given cluster but was assigned to another cluster.

For S41, one of the first cluster sets is arbitrarily obtained, and the obtained first cluster set is used as a cluster set to be processed.

For S42, one piece of the customer data is arbitrarily acquired from the cluster set to be processed, and the acquired customer data is taken as customer data to be analyzed.

For S43, the customer data to be analyzed is divided into the second cluster set when the score of every evaluation dimension lies between the low quantile score and the high quantile score corresponding to the cluster set to be processed, or into the uncategorized customer data set when the score of any evaluation dimension lies outside that range.

For S44, the steps S42 to S44 are repeatedly executed until the customer data acquisition in the cluster set to be processed is completed. When the acquisition of the customer data in the to-be-processed cluster set is completed, it means that the division of each customer data in the to-be-processed cluster set has been completed.

For S45, steps S41 through S45 are repeatedly performed until the acquisition of the first cluster set is completed. When the acquisition of the first cluster set is completed, it means that the division of each of the customer data in each of the first cluster sets has been completed.
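The partition described for S41 to S45 can be sketched as below. It is a minimal illustration under stated assumptions: the range is treated as inclusive at both ends (the text does not say), customer data is a dictionary of scores, and `bounds` maps each evaluation dimension to its (low quantile score, high quantile score) pair. The function names are hypothetical.

```python
def split_cluster(cluster, bounds):
    """Split one first cluster set into (second_cluster_set, uncategorized_set)."""
    kept, uncategorized = [], []
    for customer in cluster:
        # A customer stays only if every dimension is inside its range.
        in_range = all(
            low <= customer[dim] <= high for dim, (low, high) in bounds.items()
        )
        (kept if in_range else uncategorized).append(customer)
    return kept, uncategorized


def split_all(first_clusters, bounds_per_cluster):
    """Apply split_cluster to every first cluster set (the outer loop of S45)."""
    second_sets, uncategorized_sets = [], []
    for cluster, bounds in zip(first_clusters, bounds_per_cluster):
        kept, rest = split_cluster(cluster, bounds)
        second_sets.append(kept)
        uncategorized_sets.append(rest)
    return second_sets, uncategorized_sets


first = [[{"annual_income": 5}, {"annual_income": 50}], [{"annual_income": 100}]]
bounds = [{"annual_income": (0, 10)}, {"annual_income": (90, 110)}]
print(split_all(first, bounds))
```

Each first cluster set yields exactly one second cluster set and one uncategorized set, so the two returned lists stay aligned by index.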

In an embodiment, the step of dividing the customer data to be analyzed into the second cluster set or the unclassified customer data set corresponding to the cluster set to be processed according to each high quantile score and each low quantile score corresponding to the cluster set to be processed includes:

S431: randomly acquiring one evaluation dimension as an evaluation dimension to be processed;

S432: obtaining a score corresponding to the evaluation dimension to be processed from the customer data to be analyzed as a target score;

S433: taking the score of the cluster set to be processed at the low score point corresponding to the evaluation dimension to be processed as a starting point of a score range, and taking the score of the cluster set to be processed at the high score point corresponding to the evaluation dimension to be processed as an end point of the score range;

S434: when the target score is located in the score range, repeatedly executing the step of randomly acquiring one evaluation dimension as the evaluation dimension to be processed until the acquisition of the evaluation dimension is completed or the target score is not located in the score range;

S435: when the target score is not within the score range, dividing the customer data to be analyzed into the unclassified customer data set corresponding to the cluster set to be processed;

S436: and when the target score is within the score range, dividing the customer data to be analyzed into the second cluster set corresponding to the cluster set to be processed.

According to the high quantile score and the low quantile score corresponding to the cluster set to be processed, the client data to be analyzed are divided into the second cluster set or the unclassified client data set corresponding to the cluster set to be processed, so that the client data belonging to a certain cluster and divided into other clusters are found out, and the accuracy of client classification is improved.

For S431, arbitrarily acquiring one of the evaluation dimensions, and taking the acquired evaluation dimension as an evaluation dimension to be processed.

For step S432, a score corresponding to the evaluation dimension to be processed is obtained from the customer data to be analyzed, and the obtained score is taken as a target score.

For S433, the score of the cluster set to be processed at the low score point corresponding to the evaluation dimension to be processed is used as a starting point of a score range, and the score of the cluster set to be processed at the high score point corresponding to the evaluation dimension to be processed is used as an ending point of the score range, so that the score range of the cluster set to be processed at the evaluation dimension to be processed is obtained.

For S434, when the target score is within the score range, which means that the customer data to be analyzed meets the requirements of the current cluster set, the steps S431 to S434 are repeatedly executed until the acquisition of the evaluation dimension is completed or the target score is not within the score range. When the acquisition of the evaluation dimension is completed, it means that the evaluation of whether the evaluation dimension of the customer data to be analyzed is within the score range has been completed.

For S435, when the target score is not within the score range, the score of the customer data to be analyzed in at least one evaluation dimension lies outside the score range, so the customer data to be analyzed does not meet the requirements of the cluster set to which it currently belongs and is divided into the uncategorized customer data set corresponding to the cluster set to be processed.

For S436, when the target score is within the score range after all evaluation dimensions have been acquired, the scores of all evaluation dimensions of the customer data to be analyzed are within their score ranges, so the customer data to be analyzed meets the requirements of the cluster set to which it currently belongs and is divided into the second cluster set corresponding to the cluster set to be processed.
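The per-dimension loop of S431 to S436, including the early exit as soon as one score falls outside its range, can be sketched as follows. This is an illustrative sketch only: the inclusive range, the dictionary layout and the names `first_out_of_range_dimension` and `destination` are assumptions, as is the `age` dimension.

```python
def first_out_of_range_dimension(customer, bounds):
    """Return the first dimension whose score is outside its range, or None."""
    for dim, (low, high) in bounds.items():  # S431: take the next dimension
        target = customer[dim]               # S432: target score
        # S433: the range runs from the low to the high quantile score.
        if not (low <= target <= high):
            return dim                       # S435: stop at the first failure
    return None                              # S436: every dimension was in range


def destination(customer, bounds):
    """Name the set the customer is divided into."""
    if first_out_of_range_dimension(customer, bounds) is None:
        return "second_cluster_set"
    return "uncategorized_set"


bounds = {"annual_income": (10, 20), "age": (30, 40)}
print(destination({"annual_income": 15, "age": 35}, bounds))
print(destination({"annual_income": 15, "age": 50}, bounds))
```

Returning the failing dimension instead of a bare boolean is a design choice of this sketch; it makes it easy to inspect why a customer was set aside.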

In an embodiment, the step of dividing the customer data in each of the uncategorized customer data sets into each of the second clustering sets according to each of the target centroids to obtain a customer classification result includes:

S61: randomly acquiring one piece of customer data from each uncategorized customer data set as customer data to be categorized;

S62: randomly acquiring one target centroid as a centroid to be calculated;

S63: respectively carrying out distance calculation of each evaluation dimension on the customer data to be categorized and the centroid to be calculated to obtain a plurality of first distances;

S64: carrying out weighted summation on the first distances to obtain a second distance corresponding to the centroid to be calculated;

S65: repeatedly executing the step of randomly acquiring one target centroid as the centroid to be calculated until the acquisition of the target centroid is completed;

S66: dividing the customer data to be categorized into one of the second cluster sets according to the second distances;

S67: repeatedly executing the step of randomly acquiring one piece of customer data from each uncategorized customer data set as customer data to be categorized until the customer data in each uncategorized customer data set is acquired;

S68: and taking each second cluster set as the customer classification result.

According to this embodiment, the customer data in the uncategorized customer data sets is divided again according to the target centroids, so that the accuracy of the second cluster sets is improved, and the accuracy of the customer classification result is improved.

For S61, one of the customer data is arbitrarily acquired as customer data to be categorized from each of the uncategorized customer data sets.

For S62, one of the target centroids is arbitrarily acquired, and the acquired target centroid is taken as a centroid to be calculated.

For S63, a distance calculation for each of the evaluation dimensions is performed on the customer data to be categorized and the centroid to be calculated, respectively, to obtain a plurality of first distances. That is, the number of the first distances is the same as the number of the evaluation dimensions. For example, the distance between the score corresponding to the annual income (an evaluation dimension) of the customer data to be categorized and the numerical value corresponding to the annual income of the centroid to be calculated is one first distance.

For S64, weighted summation is performed on each first distance, and the data obtained by weighted summation is used as a second distance between the customer data to be classified and the centroid to be calculated.

For S65, steps S62 to S65 are repeatedly performed until the acquisition of the target centroid is completed. That is, the number of second distances calculated per round is the same as the number of the target centroids.

For S66, finding a minimum value from each of the second distances, and classifying the customer data to be classified into the second cluster set corresponding to the found minimum value.

For S67, the steps S61 to S67 are repeatedly executed until the acquisition of the customer data in each of the uncategorized customer data sets is completed. When the obtaining of the customer data in each of the uncategorized customer data sets is completed, the repartitioning of each of the customer data in each of the uncategorized customer data sets is completed.

For S68, each of the second cluster sets obtained after the repartitioning of the customer data in each of the uncategorized customer data sets is completed is taken as the customer classification result.
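Steps S61 to S68 can be sketched as below. The text does not specify the per-dimension distance or how the weights are chosen, so this sketch assumes an absolute difference as the first distance and a caller-supplied weight per evaluation dimension. The target centroids are held fixed while customers are reassigned, and the function names are hypothetical.

```python
def second_distance(customer, centroid, weights):
    """Weighted sum of the per-dimension first distances (S63 and S64)."""
    total = 0.0
    for dim, weight in weights.items():
        first = abs(customer[dim] - centroid[dim])  # assumed first distance
        total += weight * first
    return total


def reassign(uncategorized_sets, second_sets, centroids, weights):
    """Divide every uncategorized customer into the nearest second cluster set."""
    result = [list(s) for s in second_sets]  # leave the inputs untouched
    for uncategorized in uncategorized_sets:
        for customer in uncategorized:                    # S61 / S67
            distances = [
                second_distance(customer, c, weights)     # S62 to S65
                for c in centroids
            ]
            nearest = min(range(len(centroids)), key=distances.__getitem__)
            result[nearest].append(customer)              # S66
    return result                                         # S68


second = [[{"annual_income": 10}], [{"annual_income": 100}]]
uncategorized = [[{"annual_income": 30}], [{"annual_income": 90}]]
centroids = [{"annual_income": 10.0}, {"annual_income": 100.0}]
print(reassign(uncategorized, second, centroids, {"annual_income": 1.0}))
```

When two centroids are equally close, `min` keeps the one with the lower index; the text does not state a tie-breaking rule, so this is an arbitrary choice of the sketch.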

In an embodiment, the step of classifying the customer data to be classified into one of the second clustering sets according to the second distances includes:

S661: finding out the second distance with the minimum value from the second distances as a target distance;

S662: and classifying the customer data to be classified into the second clustering set corresponding to the target centroid corresponding to the target distance.

According to the embodiment, the client data to be classified is classified according to the minimum value in the second distances, so that the accuracy of the second clustering set is improved, and the accuracy of the client classification result is improved.

For S661, the second distance with the smallest value is found from the respective second distances, and the found second distance is taken as the target distance.

For S662, the customer data to be categorized is categorized into the second cluster set corresponding to the target centroid corresponding to the target distance, so that the customer data to be categorized is reclassified, and the accuracy of the second cluster set is improved.
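The selection rule of S661 and S662 can be written compactly as follows. The symbols are introduced here for illustration and do not appear in the original text: x is the customer data to be categorized, c_j is the j-th target centroid, K is the number of evaluation dimensions, w_k is the weight of dimension k, and d_1 is the per-dimension first distance, whose exact form the text leaves open.

```latex
d_2(x, c_j) = \sum_{k=1}^{K} w_k \, d_1\!\left(x_k, c_{j,k}\right),
\qquad
j^{*} = \arg\min_{j \in \{1, \dots, J\}} d_2(x, c_j)
```

Here d_2(x, c_{j*}) is the target distance, and the customer data is divided into the second cluster set whose target centroid is c_{j*}.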

In an embodiment, the step of performing centroid calculation on the second cluster set to obtain a target centroid includes:

S51: randomly acquiring one second cluster set as a cluster set to be evaluated;

S52: randomly acquiring one evaluation dimension as an evaluation dimension to be evaluated;

S53: performing weighted calculation on the evaluation dimension to be evaluated according to the cluster set to be evaluated to obtain a single-dimension weighted value;

S54: repeatedly executing the step of randomly acquiring one evaluation dimension as an evaluation dimension to be evaluated until the acquisition of the evaluation dimension is completed;

S55: taking each single-dimensional weighted value as the target centroid corresponding to the cluster set to be evaluated;

S56: and repeatedly executing the step of randomly acquiring one second cluster set as the cluster set to be evaluated until the acquisition of the second cluster set is completed.

The embodiment performs the centroid calculation on the second cluster set, realizes the centroid adjustment of the clusters, and provides a basis for re-partitioning the customer data which belongs to a certain cluster and is partitioned into other clusters based on the adjusted centroid.

For S51, arbitrarily obtaining one of the second cluster sets, and taking the obtained second cluster set as a cluster set to be evaluated.

For S52, one of the evaluation dimensions is arbitrarily acquired, and the acquired evaluation dimension is taken as an evaluation dimension to be evaluated.

For S53, a weighted calculation is performed on the scores in the cluster set to be evaluated that correspond to the evaluation dimension to be evaluated, and the data obtained through the weighted calculation is taken as a single-dimension weighted value.

For S54, steps S52 to S54 are repeatedly performed until the acquisition of the evaluation dimension is completed. When the acquisition of the evaluation dimension is completed, it means that the calculation of the single-dimension weighted value of each evaluation dimension of the cluster set to be evaluated is completed.

For S55, each of the single-dimensional weighted values is used as the target centroid corresponding to the cluster set to be evaluated, thereby providing a basis for evaluating the distance of customer data from the target centroid based on a single evaluation dimension.

For S56, steps S51 through S56 are repeatedly performed until the acquisition of the second cluster set is completed. When the acquisition of the second cluster sets is completed, it means that the calculation of the respective corresponding target centroids of the respective second cluster sets is completed.
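The centroid calculation of S51 to S56 can be sketched as below. The text only says that a "weighted calculation" is performed per evaluation dimension, so this sketch assumes a weighted mean with one weight per piece of customer data, falling back to uniform weights (a plain mean) when none are given. The function name and parameters are hypothetical.

```python
def target_centroid(cluster, dimensions, weights=None):
    """Return {dimension: single-dimension weighted value} for one second cluster set."""
    if not cluster:
        raise ValueError("cannot compute the centroid of an empty cluster set")
    if weights is None:
        weights = [1.0] * len(cluster)  # assumed default: uniform weights
    total_weight = sum(weights)
    centroid = {}
    for dim in dimensions:  # S52 to S54: one weighted value per dimension
        weighted_sum = sum(w * customer[dim] for w, customer in zip(weights, cluster))
        centroid[dim] = weighted_sum / total_weight  # S53
    return centroid  # S55: the weighted values together form the target centroid


cluster = [{"annual_income": 1, "age": 10}, {"annual_income": 3, "age": 30}]
print(target_centroid(cluster, ["annual_income", "age"]))
print(target_centroid(cluster, ["annual_income", "age"], weights=[1.0, 3.0]))
```

Calling `target_centroid` once per second cluster set corresponds to the outer loop of S56.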

Referring to fig. 2, the present application further proposes a customer classification apparatus based on clustering, the apparatus comprising:

a client data acquisition module 100 for acquiring a plurality of client data;

the first clustering module 200 is configured to perform clustering division on each piece of client data by using a preset clustering algorithm and a preset clustering number to obtain a plurality of first cluster sets;

the quantile point score calculation module 300 is configured to perform high quantile point score calculation and low quantile point score calculation for each evaluation dimension on the first cluster set by using the obtained evaluation dimension and high and low quantile point ratio mapping table;

a second clustering module 400, configured to divide the client data in the first clustering set according to the scores of the high quantile points and the scores of the low quantile points corresponding to the first clustering set, so as to obtain a second clustering set and an unclassified client data set;

a target centroid calculation module 500, configured to perform centroid calculation on the second cluster set to obtain a target centroid;

a customer classification result determining module 600, configured to divide the customer data in each of the uncategorized customer data sets into each of the second clustering sets according to each of the target centroids, so as to obtain a customer classification result.
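The data flow between modules 300 to 600 can be sketched end to end as follows, taking the first cluster sets produced by module 200 as input. This is a compact illustration under several assumptions not fixed by the text: ascending ranking for the quantile lookup, inclusive score ranges, an unweighted mean as the centroid, and a weighted sum of absolute differences as the distance. The names `quantile` and `classify` are hypothetical.

```python
import math


def quantile(scores, p):
    """Score at the largest rank r with r / n <= p (ascending order, assumed)."""
    ordered = sorted(scores)
    rank = min(len(ordered), max(1, math.floor(p * len(ordered) + 1e-9)))
    return ordered[rank - 1]


def classify(first_clusters, ratio_table, weights):
    second_sets, leftovers = [], []
    for cluster in first_clusters:
        # Module 300: low and high quantile scores per evaluation dimension.
        bounds = {}
        for dim, (low_p, high_p) in ratio_table.items():
            scores = [c[dim] for c in cluster]
            bounds[dim] = (quantile(scores, low_p), quantile(scores, high_p))
        # Module 400: keep in-range customers, set the others aside.
        kept = []
        for c in cluster:
            if all(lo <= c[d] <= hi for d, (lo, hi) in bounds.items()):
                kept.append(c)
            else:
                leftovers.append(c)
        if not kept:
            raise ValueError("a second cluster set is empty; no centroid can be computed")
        second_sets.append(kept)
    # Module 500: one target centroid per second cluster set.
    centroids = [
        {d: sum(c[d] for c in s) / len(s) for d in ratio_table} for s in second_sets
    ]
    # Module 600: divide each leftover customer into the nearest second cluster set.
    for c in leftovers:
        dist = [
            sum(weights[d] * abs(c[d] - cen[d]) for d in ratio_table)
            for cen in centroids
        ]
        second_sets[dist.index(min(dist))].append(c)
    return second_sets


cluster_a = [{"annual_income": v} for v in [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]]
cluster_b = [{"annual_income": v} for v in range(41, 51)]
result = classify([cluster_a, cluster_b], {"annual_income": (0.1, 0.9)}, {"annual_income": 1.0})
print([len(s) for s in result])
```

In this toy input the customer with an annual income of 40 starts in the first cluster set but lies outside its quantile range, and ends up in the second cluster set whose centroid is closer.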

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the data involved in the clustering-based customer classification method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a clustering-based customer classification method.
The customer classification method based on clustering comprises the following steps: acquiring a plurality of customer data; clustering and dividing each customer data by adopting a preset clustering algorithm and a preset clustering number to obtain a plurality of first clustering sets; respectively carrying out high quantile point score calculation and low quantile point score calculation on each evaluation dimension on the first cluster set by adopting the obtained evaluation dimension and high and low quantile point ratio mapping table; dividing the customer data in the first clustering set according to the scores of the high quantile points and the scores of the low quantile points corresponding to the first clustering set to obtain a second clustering set and an unclassified customer data set; carrying out centroid calculation on the second cluster set to obtain a target centroid; and according to the target centroids, dividing the customer data in the unclassified customer data sets into the second clustering sets to obtain customer classification results.

An embodiment of the present application further provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing a method for cluster-based customer classification, comprising the steps of: acquiring a plurality of customer data; clustering and dividing each customer data by adopting a preset clustering algorithm and a preset clustering number to obtain a plurality of first clustering sets; respectively carrying out high quantile point score calculation and low quantile point score calculation on each evaluation dimension on the first cluster set by adopting the obtained evaluation dimension and high and low quantile point ratio mapping table; dividing the customer data in the first clustering set according to the scores of the high quantile points and the scores of the low quantile points corresponding to the first clustering set to obtain a second clustering set and an unclassified customer data set; carrying out centroid calculation on the second cluster set to obtain a target centroid; and according to the target centroids, dividing the customer data in the unclassified customer data sets into the second clustering sets to obtain customer classification results.

In the clustering-based customer classification method executed above, each piece of customer data is first clustered using a preset clustering algorithm and a preset clustering number to obtain a plurality of first cluster sets. Next, the high quantile score and the low quantile score of each evaluation dimension are calculated for each first cluster set using the obtained evaluation dimension and high and low quantile ratio mapping table, and the customer data in each first cluster set is divided according to the corresponding high quantile scores and low quantile scores to obtain a second cluster set and an uncategorized customer data set. Centroid calculation is then performed on each second cluster set to obtain a target centroid, and finally the customer data in each uncategorized customer data set is divided into the second cluster sets according to the target centroids to obtain the customer classification result. The centroid is thus adjusted again for each class obtained by the preset clustering algorithm, so that forward changes in the data as a whole can be identified accurately. The unsupervised clustering result obtained by the preset clustering algorithm does not need to be used as the calibration value of samples and input into a supervised clustering model for training in order to classify customers, and the accuracy of customer classification is improved.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process modification made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of protection of the present application.

Claims

1. A method for clustering based customer classification, the method comprising:

acquiring a plurality of customer data;

2. The method for classifying clients based on clusters according to claim 1, wherein the step of performing score calculation of high score and score calculation of low score for each evaluation dimension for the first cluster set using the obtained ratio mapping table of evaluation dimension to high score and low score comprises:

randomly acquiring one first cluster set as a cluster set to be analyzed;

3. The method according to claim 1, wherein the step of classifying the customer data in the first cluster set according to the scores of the high quantile and the scores of the low quantile corresponding to the first cluster set to obtain a second cluster set and an unclassified customer data set comprises:

randomly acquiring one first cluster set as a cluster set to be processed;

4. The method according to claim 3, wherein the step of classifying the customer data to be analyzed into the second cluster set or the unclassified customer data set corresponding to the cluster set to be processed according to the score of each high quantile and the score of each low quantile corresponding to the cluster set to be processed comprises:

5. The method of claim 1, wherein said step of classifying said customer data in each of said uncategorized customer data sets into each of said second cluster sets according to each of said target centroids to obtain a customer classification result comprises:

randomly acquiring one target centroid as a centroid to be calculated;

and taking each second clustering set as the client classification result.

6. The cluster-based customer categorization method of claim 5 wherein the step of categorizing the customer data to be categorized into one of the respective second clusters according to the respective second distances comprises:

7. The method for classifying customers based on clusters according to claim 1, wherein the step of performing centroid calculation on the second cluster set to obtain a target centroid comprises:

randomly acquiring one second cluster set as a cluster set to be evaluated;

8. An apparatus for clustering based customer classification, the apparatus comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.