CN107194815B

CN107194815B - Client segmentation method and system

Info

Publication number: CN107194815B
Application number: CN201611005111.7A
Authority: CN
Inventors: 马向东; 吴海波; 冯雨旸
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2016-11-15
Filing date: 2016-11-15
Publication date: 2018-06-22
Anticipated expiration: 2036-11-15
Also published as: WO2018090643A1; CN107194815A

Abstract

The invention discloses a client segmentation method and system. The method includes: obtaining the information of all clients; screening preset information fields from the information of each client; establishing a density-based clustering algorithm model and calculating the local density of each client from the screened information fields; and dividing all clients into different categories according to the calculated local densities. Clients can thereby be classified accurately and comprehensively, which provides an effective reference for product promotion.

Description

Client segmentation method and system

Technical field

The present invention relates to the technical field of data processing, and more particularly to a client segmentation method and system.

Background art

In the insurance industry, insured clients usually need to be classified and counted so that sales staff can devise a different marketing strategy for each client category. However, existing classification methods still divide clients directly by data such as age, coverage amount, and premium. Such methods use few evaluation criteria, produce results of limited accuracy, and cannot mine the deeper information inside the data, so they cannot give sales staff an effective reference for product promotion.

Summary of the invention

In view of this, the purpose of the present invention is to provide a client segmentation method and system that solve the problem of how to classify clients accurately and comprehensively.

To achieve the above purpose, the present invention provides a client segmentation method comprising the steps of:

obtaining the information of all clients;

screening preset information fields from the information of each client;

establishing a density-based clustering algorithm model and calculating the local density of each client from the screened information fields; and

dividing all clients into different categories according to the calculated local densities.

Preferably, the preset information fields include the client's region, the nature of the client's employer, the insurance type and liability of products the client previously purchased, the coverage amount, the premium, and claims information, and the content of each information field corresponds to a numerical value.

Preferably, the step of establishing a density-based clustering algorithm model and calculating the local density of each client from the screened information fields specifically includes:

evaluating the distance between two clients according to the Euclidean distance formula;

setting a threshold d_c for distinguishing client similarity; and

calculating the local density of each client according to the threshold d_c and the local density formula.

Preferably, the Euclidean distance formula is

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where d_ij is the distance between client i and client j, x_i1 to x_im are the numerical values of the m information fields of client i, and x_j1 to x_jm are the numerical values of the m information fields of client j.

Preferably, the threshold d_c satisfies the following condition: among the calculated values of the distance d_ij between every two clients, the value of d_c is greater than or equal to 80% of all d_ij values.

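Preferably, the local density formula is

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$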

Preferably, the step of dividing all clients into different categories according to the calculation result specifically includes:

sorting the calculated local densities in descending order;

dividing all clients into K categories using the K clients with the largest local density as reference points;

judging the optimum value of the category number K; and

completing the category division of all clients according to the judged optimum category number.

Preferably, the step of dividing all clients into K categories using the K clients with the largest local density as reference points specifically includes:

selecting, according to the sorting, the K clients with the largest local density as reference points;

for each of the K reference points, grouping the reference point and the similar clients whose distance to it is less than the threshold into one category; and

for the clients remaining after this grouping, calculating the distance between each remaining client and each of the K reference points, and assigning the remaining client to the category of the nearest reference point.

Preferably, the step of judging the optimum value of the category number K specifically includes:

regarding all clients as a domain in which each client is a sample;

for the K categories, calculating the first distance sum, that is, the sum of the distances from the center of each category to the center of the entire domain;

for each category, calculating the second distance sum, that is, the sum of the distances from each sample in the category to the center of that category;

calculating the total of the second distance sums of all K categories, denoted the third distance sum;

calculating the ratio of the first distance sum to the third distance sum; and

taking the category number K at which the ratio is largest as the optimum value.

The client segmentation method proposed by the present invention can divide all clients into different categories accurately and comprehensively according to client attributes, and it optimizes the number of categories so that the classification is more reasonable. It can give sales staff an effective reference for product promotion and supports precision marketing.

To achieve the above purpose, the present invention also proposes a client segmentation system comprising:

an acquisition module for obtaining the information of all clients;

a screening module for screening preset information fields from the information of each client;

a computing module for establishing a density-based clustering algorithm model and calculating the local density of each client from the screened information fields; and

a sort module for dividing all clients into different categories according to the calculated local densities.

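Preferably, the process by which the computing module calculates the local density of each client specifically includes:

evaluating the distance between two clients according to the Euclidean distance formula;

setting a threshold d_c for distinguishing client similarity; and

calculating the local density of each client according to the threshold d_c and the local density formula.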

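Preferably, the process by which the sort module divides all clients into different categories according to the calculation result specifically includes:

sorting the calculated local densities in descending order;

dividing all clients into K categories using the K clients with the largest local density as reference points;

judging the optimum value of the category number K; and

completing the category division of all clients according to the judged optimum category number.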

The client segmentation system proposed by the present invention can divide all clients into different categories accurately and comprehensively according to client attributes, and it optimizes the number of categories so that the classification is more reasonable. It can give sales staff an effective reference for product promotion and supports precision marketing.

Description of the drawings

Fig. 1 is a flowchart of a client segmentation method proposed by the first embodiment of the present invention;

Fig. 2 is a detailed flowchart of step S104 in Fig. 1;

Fig. 3 is a detailed flowchart of step S106 in Fig. 1;

Fig. 4 is a detailed flowchart of step S302 in Fig. 3;

Fig. 5 is a module diagram of a client segmentation system proposed by the second embodiment of the present invention.

The realization of the purpose, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.

Specific embodiment

To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.

First embodiment

As shown in Fig. 1, the first embodiment of the present invention proposes a client segmentation method that includes the following steps:

S100, obtain the information of all clients.

Specifically, the relevant information of all clients that need to be classified and counted is obtained, where the number of clients is n (n is a positive integer).

S102, screen preset information fields from the information of each client.

Specifically, m information fields (m is a positive integer) that have reference value can be preset as the basis for classifying clients. That is, each client has m valid information fields, for example the client's region, the nature of the client's employer, the insurance type and liability of products the client previously purchased, the coverage amount, the premium, and claims information.

In this embodiment, the content of each of the m information fields can be converted into a corresponding numerical value so that the distance between clients can subsequently be calculated and the similarity between clients judged. For example, if the client's region is Beijing, the corresponding information field is recorded as the value 1; if the client's region is Shanghai, it is recorded as the value 2; and so on. The value for each region can be set according to conditions such as the geographical distance of the region or the size of the city. As another example, if the client's coverage amount is less than 100,000, the corresponding information field is recorded as the value 1; if it is between 100,000 and 500,000, as the value 2; if it is between 500,000 and 1,000,000, as the value 3; and so on.
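The conversion of field content into numerical values can be sketched as follows. This is a minimal illustration, not the patented implementation: the table names, function names, and record keys are hypothetical, and the handling of values that fall exactly on a band boundary is an assumption, since the text does not specify it.

```python
# Hypothetical lookup tables. Only the Beijing/Shanghai codes and the three
# coverage bands come from the example in the text; everything else is assumed.
REGION_CODES = {"Beijing": 1, "Shanghai": 2}

# Each entry is (exclusive upper bound of the band, numerical value).
COVERAGE_BANDS = [(100_000, 1), (500_000, 2), (1_000_000, 3)]


def encode_region(region):
    """Return the preset numerical value for a client's region."""
    return REGION_CODES[region]


def encode_coverage(amount):
    """Return the numerical value of the band containing the coverage amount.

    Assumption: a band includes its lower bound and excludes its upper bound.
    """
    for upper_bound, value in COVERAGE_BANDS:
        if amount < upper_bound:
            return value
    raise ValueError("no band defined for coverage amount %r" % amount)


def encode_client(record):
    """Turn one client record (a dict) into a list of numerical field values."""
    return [encode_region(record["region"]), encode_coverage(record["coverage"])]


client = {"region": "Shanghai", "coverage": 300_000}
print(encode_client(client))  # prints [2, 2]
```

Each client then becomes a list of m numbers (here m = 2), which is the form the distance calculation in step S200 expects.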

S104, establish a density-based clustering algorithm model and calculate the local density of each client from the screened information fields.

Specifically, Fig. 2 shows the detailed flowchart of step S104. The flow includes the following steps:

S200, evaluate the distance between two clients according to the Euclidean distance formula.

In this embodiment, the Euclidean distance formula is

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where d_ij is the distance between client i (i = 1, 2, ..., n) and client j (j = 1, 2, ..., n), x_i1 to x_im are the numerical values of the m information fields of client i, and x_j1 to x_jm are the numerical values of the m information fields of client j. The distance reflects the similarity between two clients: the smaller the calculated value of d_ij, the more similar client i and client j are.

In this embodiment, the distance d_ij must be calculated for every pair of the n clients so that the similarity between every two clients can be judged.
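The pairwise distance calculation of step S200 can be sketched as follows, assuming each client has already been converted into a list of m numbers. The function names are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two clients given as equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(clients):
    """Return an n x n symmetric matrix whose entry [i][j] is d_ij."""
    n = len(clients)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(clients[i], clients[j])
            dist[i][j] = d
            dist[j][i] = d  # d_ij equals d_ji, so each pair is computed once
    return dist


clients = [[1, 1], [4, 5], [1, 2]]
dist = pairwise_distances(clients)
print(dist[0][1])  # prints 5.0
```

For n clients this computes n(n-1)/2 distinct distances, which is the set of d_ij values that the threshold in step S202 is chosen from.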

S202, set the threshold for distinguishing client similarity.

In this embodiment, the threshold is denoted d_c and is used to decide whether two clients are relatively similar or relatively dissimilar. It must satisfy the following condition: among the calculated values of the distance d_ij between every two clients, the value of d_c is greater than or equal to 80% of all d_ij values. For example, if 100 d_ij values are calculated for all clients, the threshold d_c must be greater than or equal to 80 of those d_ij values. When the distance d_ij between two clients is less than the threshold d_c, the two clients are considered relatively similar; when d_ij is greater than or equal to d_c, the two clients are considered relatively dissimilar.
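One way to pick a threshold that meets this condition is sketched below. The text only requires d_c to be greater than or equal to 80% of the d_ij values; choosing the smallest value that does so is an assumption made here for illustration, and the function name and `percent` parameter are hypothetical.

```python
import math


def similarity_threshold(distances, percent=80):
    """Return the smallest listed distance that is >= `percent`% of all values.

    `distances` holds one d_ij per distinct pair of clients.
    Assumption: the smallest qualifying value is used as d_c.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    ordered = sorted(distances)
    # Number of values that d_c must be greater than or equal to.
    count = math.ceil(percent * len(ordered) / 100)
    return ordered[count - 1]


# With 100 distances 1..100, d_c must be >= 80 of them.
d_c = similarity_threshold(list(range(1, 101)))
print(d_c)  # prints 80
```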

S204, calculate the local density of each client according to the threshold and the local density formula.

In this embodiment, the local density formula is

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

The local density reflects the number of other clients that are relatively similar to a given client: the larger the calculated local density, the more other clients are relatively similar to that client.
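Following the description above, the local density of a client can be computed by counting the other clients whose distance to it is less than d_c. The sketch below assumes a precomputed n x n distance matrix; the function name is illustrative.

```python
def local_densities(dist, d_c):
    """Return, for each client i, the number of other clients j with d_ij < d_c."""
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]


# Three clients: 0 and 1 are close, 1 and 2 are close, 0 and 2 are far apart.
dist = [
    [0, 1, 5],
    [1, 0, 2],
    [5, 2, 0],
]
print(local_densities(dist, d_c=3))  # prints [1, 2, 1]
```

Client 1 has the largest local density because both other clients lie within the threshold of it.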

Returning to Fig. 1: S106, divide all clients into different categories according to the calculation result.

Specifically, Fig. 3 shows the detailed flowchart of step S106. The flow includes the following steps:

S300, sort the calculated local densities in descending order.

Specifically, one local density is calculated for each client, so n clients correspond to n local densities, and these n local densities are then sorted in descending order.

S302, divide all clients into K categories (0<K<n) using the K clients with the largest local density as reference points. A reference point is a client used as the standard for defining a category; that is, other clients that are relatively similar to the reference client can be grouped into one category with it.

Specifically, Fig. 4 shows the detailed flowchart of step S302. The flow includes the following steps:

S400, select, according to the sorting, the K clients with the largest local density as reference points.

For example, the 3 clients A, B, and C with the largest local density are selected as reference points.

S402, for each of the K reference points, group the reference point and the similar clients whose distance to it is less than the threshold into one category.

For example, for client A, all similar clients whose distance to client A is less than the threshold d_c are found (that is, all clients relatively similar to client A), and client A and the clients found are grouped into the first category. For client B, all similar clients whose distance to client B is less than the threshold d_c are found, and client B and the clients found are grouped into the second category. For client C, all similar clients whose distance to client C is less than the threshold d_c are found, and client C and the clients found are grouped into the third category.

S404, for the clients remaining after this grouping, calculate the distance between each remaining client and each of the K reference points, and assign the client to the category of the nearest reference point.

For example, suppose client A and clients A₁, A₂, A₃ are grouped into the first category, client B and client B₁ into the second category, and client C and clients C₁, C₂ into the third category, while clients D and E remain unassigned. The distances between client D and the reference clients A, B, C and between client E and the reference clients A, B, C are then calculated. If client D is nearest to client B and client E is nearest to client A, client D is assigned to the second category and client E to the first category.
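Steps S400 to S404 can be sketched as follows, assuming a precomputed distance matrix `dist` and local density list `rho`. The text does not say what happens when a client lies within d_c of more than one reference point; this sketch assumes the reference point with the higher local density takes it. Function and variable names are illustrative.

```python
def assign_categories(dist, rho, d_c, k):
    """Return (reference point indices, category label of every client)."""
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("K must satisfy 0 < K < n")

    # S400: the K clients with the largest local density become reference points.
    refs = sorted(range(n), key=lambda i: rho[i], reverse=True)[:k]
    labels = [None] * n
    for category, ref in enumerate(refs):
        labels[ref] = category

    # S402: clients closer than d_c to a reference point join its category.
    # Assumption: reference points are processed in descending density order
    # and the first match wins.
    for category, ref in enumerate(refs):
        for i in range(n):
            if labels[i] is None and dist[ref][i] < d_c:
                labels[i] = category

    # S404: every remaining client joins the category of its nearest reference point.
    for i in range(n):
        if labels[i] is None:
            labels[i] = min(range(k), key=lambda c: dist[i][refs[c]])
    return refs, labels


# One-dimensional toy data: two tight groups and one straggler at position 7.
points = [0, 1, 2, 10, 11, 12, 7]
dist = [[abs(a - b) for b in points] for a in points]
rho = [1, 2, 1, 1, 2, 1, 0]  # neighbours closer than d_c = 1.5
refs, labels = assign_categories(dist, rho, d_c=1.5, k=2)
print(refs)    # prints [1, 4]
print(labels)  # prints [0, 0, 0, 1, 1, 1, 1]
```

The straggler at position 7 is not within d_c of either reference point, so S404 assigns it to the nearer one (position 11), which corresponds to clients D and E in the example above.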

Returning to Fig. 3: S304, judge the optimum value of the category number K.

Specifically, when a different number K of clients is chosen as reference points, a correspondingly different number of client categories is obtained. For example, when the 3 clients with the largest local density are selected as reference points, all clients are divided into 3 categories; when the 4 clients with the largest local density are selected as reference points, all clients are divided into 4 categories; and so on. The optimum value of the category number K therefore needs to be judged according to a predetermined algorithm so that the resulting classification is the most reasonable.

In this embodiment, all clients can be regarded as a domain U in which each client is a sample (n samples in total), each sample has m attributes (the information fields), and all samples in domain U are divided into K categories. First, for the K client categories, the sum D₁ of the first distances from the center of each client category to the center of the entire domain is calculated. Then, for each client category, the sum D₂ of the second distances from each sample (client) in that category to the category center is calculated, and the D₂ values of all K client categories are added together to give the third distance sum D₃. Finally, the ratio D₁/D₃ of the first distance sum to the third distance sum is calculated, and the category number K at which D₁/D₃ is largest is taken as the optimum value. Here, a center is obtained by averaging each attribute over the corresponding samples: the center of a client category is the per-attribute average of all samples in that category, and the center of the entire domain is the per-attribute average of all samples in the domain.

For example, suppose that when the category number is K₁ the calculated ratio is D₁/D₃ = R₁, when the category number is K₂ it is D₁/D₃ = R₂, and when the category number is K₃ it is D₁/D₃ = R₃, with R₂ > R₃ > R₁. The category number K₂ corresponding to R₂ is then taken as the optimum value. That is, in this case dividing all clients into K₂ categories is the most reasonable.
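The ratio D₁/D₃ and the choice of K can be sketched as follows, assuming that a category label per sample has already been produced for each candidate K. The function names are illustrative, and returning infinity when D₃ is zero is an assumption, since the text does not cover that case.

```python
import math


def mean_point(points):
    """Per-attribute average of a list of samples (the 'center')."""
    m = len(points[0])
    return [sum(p[a] for p in points) / len(points) for a in range(m)]


def separation_ratio(samples, labels):
    """Return D1 / D3 for one division of the samples into categories."""
    domain_center = mean_point(samples)
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)

    d1 = 0.0  # sum of distances from category centers to the domain center
    d3 = 0.0  # total of the within-category distance sums (each one is a D2)
    for members in groups.values():
        center = mean_point(members)
        d1 += math.dist(center, domain_center)
        d3 += sum(math.dist(s, center) for s in members)
    # Assumption: a division with zero within-category spread ranks highest.
    return d1 / d3 if d3 > 0 else math.inf


def best_category_count(samples, labelings):
    """Pick the K whose labeling gives the largest D1 / D3.

    `labelings` maps each candidate K to the list of labels obtained for it.
    """
    return max(labelings, key=lambda k: separation_ratio(samples, labelings[k]))


samples = [[0.0], [2.0], [10.0], [12.0]]
labelings = {1: [0, 0, 0, 0], 2: [0, 0, 1, 1]}
print(best_category_count(samples, labelings))  # prints 2
```

In this toy data the two-category division gives D₁ = 10 and D₃ = 4, whereas the single-category division gives D₁ = 0, so K = 2 is selected.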

S306, complete the category division of all clients according to the judged optimum category number.

For example, if the optimum value of the category number K is judged to be 4, all clients are divided into 4 categories in the manner described above, using the 4 clients with the largest local density as reference points, which completes the category division of all clients.

The client segmentation method of this embodiment can divide all clients into different categories accurately and comprehensively according to client attributes, and it optimizes the number of categories so that the classification is more reasonable. It can give sales staff an effective reference for product promotion and supports precision marketing.

Second embodiment

As shown in Fig. 5, the second embodiment of the present invention proposes a client segmentation system 50. In this embodiment, the client segmentation system 50 includes an acquisition module 502, a screening module 504, a computing module 506, and a sort module 508.

The acquisition module 502 is used to obtain the information of all clients.

Specifically, the acquisition module 502 obtains the relevant information of all clients that need to be classified and counted, where the number of clients is n (n is a positive integer).

The screening module 504 is used to screen preset information fields from the information of each client.

The computing module 506 is used to establish a density-based clustering algorithm model and calculate the local density of each client from the screened information fields.

Specifically, the computing module 506 first evaluates the distance between two clients according to the Euclidean distance formula. In this embodiment, the Euclidean distance formula is

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where d_ij is the distance between client i and client j, and x_ik and x_jk are the numerical values of the k-th information field of client i and client j.

The computing module 506 then sets the threshold for distinguishing client similarity. In this embodiment, the threshold is denoted d_c and is used to decide whether two clients are relatively similar or relatively dissimilar. It must satisfy the following condition: among the calculated values of the distance d_ij between every two clients, the value of d_c is greater than or equal to 80% of all d_ij values. For example, if 100 d_ij values are calculated for all clients, the threshold d_c must be greater than or equal to 80 of those d_ij values. When the distance d_ij between two clients is less than the threshold d_c, the two clients are considered relatively similar; when d_ij is greater than or equal to d_c, the two clients are considered relatively dissimilar.

The computing module 506 then calculates the local density of each client according to the threshold and the local density formula. In this embodiment, the local density formula is

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

The sort module 508 is used to divide all clients into different categories according to the calculation result.

Specifically, the sort module 508 first sorts the calculated local densities in descending order. One local density is calculated for each client, so n clients correspond to n local densities, and these n local densities are then sorted in descending order.

Then, the sort module 508 divides all clients into K categories (0<K<n) using the K clients with the largest local density as reference points. This specifically includes:

(1) Selecting, according to the sorting, the K clients with the largest local density as reference points. For example, the 3 clients A, B, and C with the largest local density are selected as reference points. A reference point is a client used as the standard for defining a category; that is, other clients that are relatively similar to the reference client can be grouped into one category with it.

(2) For each of the K reference points, grouping the reference point and the similar clients whose distance to it is less than the threshold into one category. For example, for client A, all similar clients whose distance to client A is less than the threshold d_c are found (that is, all clients relatively similar to client A), and client A and the clients found are grouped into the first category. For client B, all similar clients whose distance to client B is less than the threshold d_c are found, and client B and the clients found are grouped into the second category. For client C, all similar clients whose distance to client C is less than the threshold d_c are found, and client C and the clients found are grouped into the third category.

(3) For the clients remaining after this grouping, calculating the distance between each remaining client and each of the K reference points, and assigning the client to the category of the nearest reference point. For example, suppose client A and clients A₁, A₂, A₃ are grouped into the first category, client B and client B₁ into the second category, and client C and clients C₁, C₂ into the third category, while clients D and E remain unassigned. The distances between client D and the reference clients A, B, C and between client E and the reference clients A, B, C are then calculated. If client D is nearest to client B and client E is nearest to client A, client D is assigned to the second category and client E to the first category.

Then, the sort module 508 judges the optimum value of the category number K. Specifically, when a different number K of clients is chosen as reference points, a correspondingly different number of client categories is obtained. For example, when the 3 clients with the largest local density are selected as reference points, all clients are divided into 3 categories; when the 4 clients with the largest local density are selected as reference points, all clients are divided into 4 categories; and so on. The optimum value of the category number K therefore needs to be judged according to a predetermined algorithm so that the resulting classification is the most reasonable.

Finally, the sort module 508 completes the category division of all clients according to the judged optimum category number. For example, if the optimum value of the category number K is judged to be 4, all clients are divided into 4 categories in the manner described above, using the 4 clients with the largest local density as reference points, which completes the category division of all clients.

The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.

It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.

The preferred embodiments of the present invention have been described above with reference to the drawings, and this does not limit the scope of rights of the present invention. The serial numbers of the above embodiments are for description only and do not represent the relative merits of the embodiments. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

Those skilled in the art can implement the present invention in many variant ways without departing from its scope and essence; for example, a feature of one embodiment can be used in another embodiment to obtain a further embodiment. Any modification, equivalent replacement, or improvement made within the technical concept of the present invention shall fall within the scope of rights of the present invention.

Claims

1. A client segmentation method, characterized in that the method comprises the steps of:

obtaining the information of all clients;

screening out preset information fields from the information of each client, including the information field of the client's region and the information field of the product purchased by the client;

numericizing the content of the screened information fields according to the attribute characteristics of each information field, including: setting the numerical value corresponding to the information field of the client's region according to the geographical distance or city size of the client's region, and setting the numerical value corresponding to the information field of the product purchased by the client according to the amount range involved in the purchased product, thereby obtaining the numerical value corresponding to each screened information field;

establishing a density-based clustering algorithm model and calculating the local density of each client from the numerical values corresponding to the screened information fields; and

dividing all clients into different categories according to the calculated local densities.

2. The client segmentation method according to claim 1, characterized in that the preset information fields further include the nature of the client's employer, and the insurance type and liability, coverage amount, premium, and claims information of the product purchased by the client.

3. The client segmentation method according to claim 1, characterized in that the step of establishing a density-based clustering algorithm model and calculating the local density of each client from the numerical values corresponding to the screened information fields specifically includes:

evaluating the distance between two clients according to the Euclidean distance formula;

setting a threshold d_c for distinguishing client similarity; and

calculating the local density of each client according to the threshold d_c and the local density formula.

4. The client segmentation method according to claim 3, characterized in that the Euclidean distance formula is

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where d_ij is the distance between client i and client j, x_i1 to x_im are the numerical values corresponding to the m information fields of client i, and x_j1 to x_jm are the numerical values corresponding to the m information fields of client j.

5. The client segmentation method according to claim 4, characterized in that the threshold d_c satisfies the following condition: among the calculated values of the distance d_ij between every two clients, the value of d_c is greater than or equal to 80% of all d_ij values.

6. The client segmentation method according to claim 4, characterized in that the local density formula is

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$$

where

$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

7. The client segmentation method according to claim 1, characterized in that the step of dividing all clients into different categories according to the calculated local densities specifically includes:

sorting the calculated local densities in descending order;

dividing all clients into K categories using the K clients with the largest local density as reference points;

judging the optimum value of the category number K; and

completing the category division of all clients according to the judged optimum category number.

8. The client segmentation method according to claim 7, characterized in that the step of dividing all clients into K categories using the K clients with the largest local density as reference points specifically includes:

selecting, according to the sorting, the K clients with the largest local density as reference points;

for each of the K reference points, grouping the reference point and the similar clients whose distance to it is less than the threshold into one category; and

for the clients remaining after this grouping, calculating the distance between each remaining client and each of the K reference points, and assigning the remaining client to the category of the nearest reference point.

9. The client segmentation method according to claim 7, characterized in that the step of judging the optimum value of the category number K specifically includes:

regarding all clients as a domain in which each client is a sample;

for the K categories, calculating the first distance sum, that is, the sum of the distances from the center of each category to the center of the entire domain;

for each category, calculating the second distance sum, that is, the sum of the distances from each sample in the category to the center of that category;

calculating the total of the second distance sums of all K categories, denoted the third distance sum;

calculating the ratio of the first distance sum to the third distance sum; and

taking the category number K at which the ratio is largest as the optimum value.

10. A client segmentation system, characterized in that the system comprises:

an acquisition module for obtaining the information of all clients;

a screening module for screening out preset information fields from the information of each client, including the information field of the client's region and the information field of the product purchased by the client;

a computing module for numericizing the content of the screened information fields according to the attribute characteristics of each information field, including: setting the numerical value corresponding to the information field of the client's region according to the geographical distance or city size of the client's region, and setting the numerical value corresponding to the information field of the product purchased by the client according to the amount range involved in the purchased product, thereby obtaining the numerical value corresponding to each screened information field; and for establishing a density-based clustering algorithm model and calculating the local density of each client from the numerical values corresponding to the screened information fields; and

a sort module for dividing all clients into different categories according to the calculated local densities.

11. The client segmentation system according to claim 10, characterized in that the process by which the computing module calculates the local density of each client specifically includes:

evaluating the distance between two clients according to the Euclidean distance formula;

setting a threshold d_c for distinguishing client similarity; and

calculating the local density of each client according to the threshold d_c and the local density formula.

12. The client segmentation system according to claim 10, characterized in that the process by which the sort module divides all clients into different categories according to the calculation result specifically includes:

sorting the calculated local densities in descending order;

dividing all clients into K categories using the K clients with the largest local density as reference points;

judging the optimum value of the category number K; and

completing the category division of all clients according to the judged optimum category number.