CN104850868A

CN104850868A - Customer segmentation method based on k-means and neural network clustering

Info

Publication number: CN104850868A
Application number: CN201510323644.9A
Authority: CN
Inventors: 刘念
Original assignee: Sichuan You Lian Information Technology Co Ltd
Current assignee: Sichuan You Lian Information Technology Co Ltd
Priority date: 2015-06-12
Filing date: 2015-06-12
Publication date: 2015-08-19

Abstract

The invention discloses a customer segmentation method based on k-means and neural network clustering. The method comprises the following steps: (1) randomly sampling the overall data and using the selected records as sample data; (2) performing k-means clustering on the sample data selected in step (1) and determining the class of each sample; (3) using the clustering results of step (2) as training samples, computing the weights of each layer for each attribute with a neural network, and thereby obtaining a trained neural network; (4) inputting the overall data into the trained neural network and computing the class of each record. Because only a small number of samples are drawn, the probability that an isolated point (outlier) is drawn is very low. Moreover, the weight of each attribute is computed by a BP neural network, so the attributes are no longer assumed to influence the result equally. The method thus overcomes these defects of the conventional K-means clustering algorithm and makes the clustering better match the practical requirements of customer segmentation.

Description

A customer segmentation method based on k-means and neural network clustering

Technical field

The present invention relates to the field of data mining, and in particular to a customer segmentation method based on k-means and neural network clustering.

Background art

Since China's accession to the WTO, the entry of foreign banks and the deepening of financial reform have made competition in the financial sector fiercer, and high-value customers have gradually become the focus of competition among banks. Different types of customers bring markedly different value to a bank. By identifying and distinguishing these differences, a bank can allocate its marketing, service and management resources more rationally and obtain greater returns with less input; solving this problem requires customer segmentation. Bank customer segmentation means that a bank, within a clearly defined strategy, business model and target market, classifies its customers according to factors such as their attributes, behavior, needs, preferences and value, and uses the classification to guide its products, services and marketing model.

Traditionally, bank customers have been segmented either by empirical classification or by statistics-based analysis. Empirical segmentation is the most primitive approach: decision makers assign customers to categories based on their own experience. It is highly subjective, so the results are not objective and lack persuasiveness. Statistics-based segmentation is a quantitative approach that classifies customers according to statistics of their attributes. Its results depend strongly on the classification criteria: if the criteria are unreasonable, so is the classification. As the informatization of Chinese banks deepens, banks have accumulated large volumes of historical transaction data and customer data, and with the growth of online services ever more customer data will accumulate. Faced with such massive data, traditional segmentation methods increasingly fall short. In recent years, data mining technology has developed rapidly. It combines techniques from databases, artificial intelligence, statistics and other fields, and is the process of extracting useful, credible and novel information and knowledge from large, incomplete, noisy and fuzzy raw data. K-means clustering is one of the most important data mining methods and is widely used in bank customer segmentation.

The K-means algorithm is a classical partition-based clustering algorithm in data mining. It is widely used because its theory is sound, it is simple, and it converges quickly. K-means updates its clusters iteratively: it first randomly selects K objects as the initial representatives, or centers, of the clusters; it then assigns each remaining object to the nearest cluster according to its distance from each cluster center; finally it recomputes the center of each cluster to serve as the cluster center for the next iteration. This process repeats until no cluster center changes. The iterations move the chosen centers ever closer to the true cluster centers, so the clustering improves, and in the end all objects are divided into K clusters.

The steps of the traditional K-means algorithm are as follows:

Input: the number of clusters K and a data set X = {x1, x2, x3, x4, ..., xn} containing n objects.

Output: K clusters {s1, s2, s3, ..., sk} that minimize the objective function (for K-means, the sum of squared Euclidean distances from each object to the center of its cluster).

Steps:

(1) Randomly select K objects from the data set X as the initial cluster centers c1, c2, c3, ..., ck;

(2) Assign each object xi (i = 1, 2, 3, ..., n) to the nearest cluster center cj (1 ≤ j ≤ K) according to Euclidean distance;

(3) Recompute the center cj of each cluster as the mean of the objects assigned to it;

(4) Repeat steps (2) and (3) until the K cluster centers no longer change, i.e. the criterion function has converged.
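The four steps above can be sketched in Python as follows. This is a minimal illustration written for this description, not code from the patent; the function names `kmeans`, `assign` and `squared_distance`, the `max_iter` cap and the fixed `seed` are assumptions added for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, centers):
    """Step (2): index of the nearest center for every object."""
    return [min(range(len(centers)), key=lambda j: squared_distance(x, centers[j]))
            for x in data]


def kmeans(data, k, max_iter=100, seed=0):
    """Traditional K-means; returns (centers, labels, sse)."""
    rng = random.Random(seed)
    # Step (1): K objects chosen at random (without replacement) are the initial centers.
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):
        labels = assign(data, centers)
        # Step (3): each new center is the mean of the objects assigned to it.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        # Step (4): stop as soon as no center changes.
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(data, centers)
    # Objective function: sum of squared distances to the assigned centers.
    sse = sum(squared_distance(x, centers[lab]) for x, lab in zip(data, labels))
    return centers, labels, sse
```

Because the initial centers are random, different seeds can converge to different local minima of the objective; the fixed seed only makes the sketch reproducible.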

Fig. 1 is the basic flowchart of the traditional K-means algorithm.

K-means is the classic algorithm for clustering problems and is simple and fast. However, the traditional K-means algorithm has a serious defect: it is sensitive to isolated points (outliers). If the data set contains isolated points, the clustering produced by K-means is far from ideal. Moreover, the algorithm treats every attribute equally during clustering, so it cannot distinguish how much each attribute influences the clustering result.
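A one-dimensional illustration with made-up numbers shows why isolated points matter: a K-means center is the mean of its members, so a single distant point can pull it far away from all the other members. The helper name `cluster_center` is hypothetical.

```python
from statistics import mean


def cluster_center(values):
    """A K-means center in one dimension is the arithmetic mean of its members."""
    return mean(values)


members = [1.0, 2.0, 3.0]
print(cluster_center(members))           # 2.0
print(cluster_center(members + [94.0]))  # 25.0: one isolated point drags the center
```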

Summary of the invention

The object of the invention is to overcome the above deficiencies of the prior art by providing a customer segmentation method based on k-means and neural network clustering. Only a small sample is drawn in the first step, so the probability that an isolated point is drawn into the sample is very low and can be neglected; and a BP neural network is used to compute the weight of each attribute, which avoids treating all attributes as having equal influence on the result.

To achieve the above object, the invention provides the following technical solution:

A customer segmentation method based on k-means and neural network clustering, comprising the following steps:

(1) Randomly sample the overall data and take the selected portion as sample data;

(2) Perform k-means clustering on the sample data drawn in step (1) and determine the class of each sample;

(3) Using the clustering result of step (2) as training samples, compute with a neural network the weights of each layer for each attribute, obtaining a trained neural network;

(4) Input the overall data into the trained neural network and compute the class of each record.
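The four steps can be sketched end to end in plain Python. This is only a sketch under stated assumptions, not the patent's implementation: it assumes a three-layer BP network (sigmoid hidden layer, softmax output, cross-entropy error, per-sample gradient descent) and already preprocessed numeric rows, and the names `segment`, `kmeans_labels` and `BPNetwork`, the hidden-layer size, learning rate and epoch count are all hypothetical choices.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centers):
    return min(range(len(centers)), key=lambda j: dist2(x, centers[j]))


def kmeans_labels(data, k, rng, max_iter=100):
    """Step (2): plain K-means on the sample; returns one cluster index per row."""
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):
        labels = [nearest(x, centers) for x in data]
        new = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
        if new == centers:
            break
        centers = new
    return [nearest(x, centers) for x in data]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Three-layer feed-forward network: input -> sigmoid hidden -> softmax output,
    trained by back-propagation of the cross-entropy error."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        top = max(z)
        e = [math.exp(v - top) for v in z]  # numerically stable softmax
        total = sum(e)
        return h, [v / total for v in e]

    def train(self, xs, ys, epochs=200, lr=0.1):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, p = self.forward(x)
                # Output-layer error for softmax + cross-entropy: p - one_hot(y).
                d2 = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
                # Hidden-layer error, propagated back through w2 and the sigmoid.
                d1 = [hj * (1.0 - hj) * sum(d2[k] * self.w2[k][j] for k in range(len(d2)))
                      for j, hj in enumerate(h)]
                for k, dk in enumerate(d2):
                    self.w2[k] = [w - lr * dk * hj for w, hj in zip(self.w2[k], h)]
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d1):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj

    def predict(self, x):
        p = self.forward(x)[1]
        return max(range(len(p)), key=p.__getitem__)


def segment(data, k=5, sample_fraction=0.3, n_hidden=8, seed=0):
    """Steps (1)-(4) on preprocessed numeric rows; returns a class index per row."""
    rng = random.Random(seed)
    # Step (1): random sample (without replacement) of a fraction of the overall data.
    sample = rng.sample(data, max(k, int(len(data) * sample_fraction)))
    # Step (2): K-means on the sample only; its labels become training targets.
    targets = kmeans_labels(sample, k, rng)
    # Step (3): train the BP network on (sample row, cluster label) pairs.
    net = BPNetwork(len(data[0]), n_hidden, k, rng)
    net.train(sample, targets)
    # Step (4): classify every record of the overall data with the trained network.
    return [net.predict(x) for x in data]
```

Cluster indices returned by this sketch are arbitrary; mapping them to named customer segments is not covered here.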

Preferably, the neural network is a BP (back-propagation) neural network.

Preferably, the BP neural network is a feed-forward BP network with three or more layers.

Preferably, the sampled portion is no more than 30% of the overall data.

Preferably, the sampled portion is no more than 15% of the overall data.

Preferably, the sampled portion is no more than 5% of the overall data.

Preferably, the number of clusters in the k-means clustering is 5.

Preferably, the overall data is preprocessed before the random sampling in step (1).

Preferably, the data preprocessing uses centering and standardization transformations.

Compared with the prior art, the invention has the following beneficial effects:

1. In the first step, the method randomly draws only a small sample from the overall data, so the probability that an isolated point is drawn into the sample is very low and can be neglected, which improves clustering accuracy;

2. The method uses a BP neural network to compute the weight of each attribute, avoiding treating all attributes as having equal influence on the result, so the clustering better matches the practical requirements of customer segmentation.
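One way to quantify the first beneficial effect is to treat step (1) as sampling without replacement, so that the number of isolated points in the sample follows a hypergeometric distribution. The sketch below is an illustration under that assumption; the function name `outlier_stats` and the example counts are hypothetical.

```python
from math import comb


def outlier_stats(total, outliers, sample_size):
    """Sampling without replacement (hypergeometric model).

    Returns (expected number of isolated points in the sample,
             probability that the sample contains none of them).
    """
    expected = sample_size * outliers / total
    p_none = comb(total - outliers, sample_size) / comb(total, sample_size)
    return expected, p_none


print(outlier_stats(1000, 1, 300))  # (0.3, 0.7)
print(outlier_stats(1000, 1, 50))   # (0.05, 0.95)
```

With one isolated point among 1000 records, a 30% sample excludes it with probability 0.7 and a 5% sample with probability 0.95. Under this model the expected fraction of isolated points in the sample equals their fraction in the overall data; what a smaller sample reduces is the absolute number of isolated points drawn.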

Brief description of the drawings

Fig. 1 is the basic flowchart of the traditional K-means algorithm;

Fig. 2 is the detailed flowchart of the customer segmentation method of the invention.

Detailed description of the embodiments

The invention is described in further detail below with reference to test examples and embodiments. This should not, however, be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques implemented on the basis of the content of the invention fall within its scope.

The customer segmentation method based on k-means and neural network clustering of the invention is implemented in the following steps:

(1) Randomly sample a small portion of the overall data;

(2) Perform k-means clustering on the sample drawn in the first step and determine the class of each sample;

(3) Using the clustering result of the second step as training samples, compute with a BP neural network the weights of each layer for each attribute, obtaining a trained BP neural network;

(4) Input the overall data into the BP neural network trained in the third step and compute the class of each record.

The overall data in this specific embodiment is customer segmentation data from the personal customer system of a bank in a domestic city. The input consists of 2000 bank customer samples, and each record contains eight fields: customer number, age, years of service, monthly salary, deposit balance at this bank, number of times the bank is used, loan status and housing status. The output divides bank customers into five major classes: premium customers, large customers, ordinary customers, small customers and potential customers.
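A possible in-memory representation of one record, with a numeric feature vector for clustering, might look as follows. The field names, the yes/no encoding of loan and housing status, and the exclusion of the customer number from the features are assumptions for illustration; the patent does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    """The five output classes named in the embodiment."""
    PREMIUM = "premium customer"
    LARGE = "large customer"
    ORDINARY = "ordinary customer"
    SMALL = "small customer"
    POTENTIAL = "potential customer"


@dataclass
class CustomerRecord:
    """One bank customer record with the eight fields of the embodiment."""
    customer_id: str         # identifier; excluded from the features in this sketch
    age: int
    years_of_service: float
    monthly_salary: float
    deposit_balance: float   # deposits held at this bank
    bank_usage_count: int
    has_loan: bool           # hypothetical yes/no encoding of the loan status
    owns_house: bool         # hypothetical yes/no encoding of the housing status

    def to_features(self):
        """Seven numeric attributes, ready for preprocessing and clustering."""
        return [float(self.age), self.years_of_service, self.monthly_salary,
                self.deposit_balance, float(self.bank_usage_count),
                float(self.has_loan), float(self.owns_house)]
```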

The detailed flowchart of the method for customer segmentation is shown in Fig. 2. First, the raw data is preprocessed. Because of human error during collection, the source database may contain incomplete or noisy data; moreover, each field of a record describes a different characteristic, often uses a different unit of measure, and spans a very different range of values. The raw data must therefore be preprocessed to improve its quality, so that data mining is more effective and classification more accurate. The method uses centering and standardization transformations for preprocessing. Centering gives every field the same base point and is performed according to the following formula:

x'_{ij} = x_{ij} - \frac{1}{n}\sum_{k=1}^{n} x_{kj}

where x_{ij} is the value of the j-th field of the i-th record, n is the total number of records, the sum is taken over the j-th attribute field of all n records, and x'_{ij} is the value of the j-th field of the i-th record after centering.

Standardization then transforms the centered values so that every field has a uniform range of variation. Zero-mean (z-score) standardization is used, which scales each field by its mean and standard deviation according to the following formula:

(x'_{ij})' = \frac{x'_{ij}}{\sqrt{\frac{1}{n-1}\sum_{k=1}^{n} (x_{kj} - \bar{x}_j)^{2}}}

where \bar{x}_j is the mean of the j-th attribute field over all records, and (x'_{ij})' is the value of the j-th field of the i-th record after zero-mean standardization.

After preprocessing, every field has the same base point and the same range of variation: each field has mean 0 and standard deviation 1.
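The two transformations can be sketched as a single pass over the records. This assumes each record is a list of numeric fields and uses the sample standard deviation (divisor n - 1) from the standardization formula; the function name `center_and_standardize` and the mapping of constant fields to 0 are hypothetical choices.

```python
from math import sqrt


def center_and_standardize(rows):
    """Centering followed by zero-mean (z-score) standardization, per field."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Sample standard deviation of each field (divisor n - 1).
    stds = [sqrt(sum((v - m) ** 2 for v in col) / (n - 1)) for col, m in zip(columns, means)]
    # (v - m) is the centered value x'_ij; dividing by s gives (x'_ij)'.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


print(center_and_standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
# [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Both columns end up identical even though the second is ten times larger, which is the point of the transformation: fields measured in different units become directly comparable in the Euclidean distance used by K-means.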

After preprocessing, customers are classified with the k-means and neural network clustering algorithm of the invention, as follows:

(1) Take 1000 user records from the 2000 customer records and preprocess them, then randomly sample 300 customers as the sample for the next step;

(2) Perform k-means clustering on the customer sample drawn in the first step, partition it into k clusters, and label each sample with its class;

(3) Using the clustering result of the second step as training samples for an artificial neural network, compute with a BP neural network the weights of each layer for each attribute, obtaining a trained BP neural network;

(4) Input the data of all customers into the neural network trained in the third step and compute the class label of each customer.

The customer segmentation method based on k-means and neural network clustering of the invention was compared with the traditional k-means algorithm using K = 5 clusters. The comparison results are shown in the following table:

Table 1. Customer segmentation results of the traditional K-means algorithm and the method of the invention

As the classification results in the table above show, the customer segmentation method based on k-means and neural network clustering of the invention overcomes two shortcomings of the traditional K-means clustering algorithm: its sensitivity to isolated points and its equal treatment of all attributes during clustering. Because the first step randomly draws only a small amount of data, the probability of drawing an isolated point is very low. The results in the table also show that the improved algorithm produces clustering that better matches the practical requirements of bank customer segmentation, offering data mining a new approach to using existing massive data for customer segmentation in banking systems.

Claims

1. A customer segmentation method based on k-means and neural network clustering, characterized in that it comprises the following steps:

2. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the neural network is a BP neural network.

3. The customer segmentation method based on k-means and neural network clustering according to claim 2, characterized in that the BP neural network is a feed-forward BP network with three or more layers.

4. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the sampled portion is no more than 30% of the overall data.

5. The customer segmentation method based on k-means and neural network clustering according to claim 4, characterized in that the sampled portion is no more than 15% of the overall data.

6. The customer segmentation method based on k-means and neural network clustering according to claim 5, characterized in that the sampled portion is no more than 5% of the overall data.

7. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the number of clusters in the k-means clustering is 5.

8. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the overall data is preprocessed before the random sampling in step (1).

9. The customer segmentation method based on k-means and neural network clustering according to claim 8, characterized in that the data preprocessing uses centering and standardization transformations.