CN106204267A

CN106204267A - Customer segmentation system based on improved k-means and neural network clustering

Info

Publication number: CN106204267A
Application number: CN201610544043.5A
Authority: CN
Inventors: Inventor not announced
Original assignee: Individual
Current assignee: Individual
Priority date: 2016-07-06
Filing date: 2016-07-06
Publication date: 2016-12-07

Abstract

The invention discloses a customer segmentation system based on improved k-means and neural network clustering, comprising: a bank customer data acquisition module, which collects bank customer data and stores it in a bank network database; a sample data extraction module, which randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; a clustering processing module, which clusters the samples of the sample data using an improved k-means clustering method and outputs a clustering result; a neural network training module, which takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network; and a customer classification and segmentation module, which inputs the bank customer data into the trained neural network to segment the bank customers. The invention reduces the probability of drawing isolated points into the sample, improves clustering accuracy, and achieves high customer segmentation precision.

Description

A customer segmentation system based on improved k-means and neural network clustering

Technical field

The present invention relates to the field of data mining, and in particular to a customer segmentation system based on improved k-means and neural network clustering.

Background art

At present, different types of customers differ markedly in the value they bring to a bank. By identifying and distinguishing these differences, a bank can allocate its marketing, sales, service, and management resources more rationally and obtain greater returns from a smaller investment; doing so requires customer segmentation. Bank customer segmentation is the process by which a bank, within a defined strategy, business model, and specific market, classifies customers according to factors such as their attributes, behavior, needs, preferences, and value, and provides targeted products, services, and marketing models.

In the related art, bank customer segmentation uses either experience-based classification or statistical analysis. Experience-based segmentation is the most primitive method: decision makers divide customers into categories according to their own experience, which is highly subjective, so the result is not objective and lacks persuasiveness. Segmentation based on statistical methods is a form of quantitative research that divides customers into categories according to statistics on customer attribute features; the result is strongly tied to the classification criteria, so unreasonable criteria yield unreasonable classifications. As the informatization of China's banks deepens, banks have accumulated large amounts of historical transaction data and customer data, and as networks develop they will accumulate more and more customer data. Faced with such massive customer data, the segmentation methods of the related art are increasingly inadequate. In recent years, data mining technology has developed rapidly. It combines several fields, including databases, artificial intelligence, and statistics, and is the process of extracting useful, credible, and novel information and knowledge from large amounts of incomplete, noisy, and fuzzy raw data. K-means clustering is one of the most important data mining methods and is widely used in bank customer segmentation. However, existing k-means clustering methods cannot effectively avoid the randomness introduced by relying on a single random sampling method, their clustering stability is low, and they have the critical defect of being sensitive to isolated points.

Summary of the invention

In view of the above problems, the present invention provides a customer segmentation system based on improved k-means and neural network clustering.

The object of the present invention is achieved by the following technical solution:

A customer segmentation system based on improved k-means and neural network clustering comprises a bank customer data acquisition module, a sample data extraction module, a clustering processing module, a neural network training module, and a customer classification and segmentation module. The bank customer data acquisition module collects bank customer data and stores it in a bank network database. The sample data extraction module randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data. The clustering processing module clusters the samples of the sample data using an improved k-means clustering method and outputs a clustering result. The neural network training module takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network. The customer classification and segmentation module inputs the bank customer data into the trained neural network to segment the bank customers.
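As an illustration of the sample data extraction module, the following minimal Python sketch draws one third of the records at random without replacement. The function name extract_sample, the seed parameter, and the toy customer records are assumptions made for this example and do not appear in the source.

```python
import random


def extract_sample(records, fraction=1 / 3, seed=None):
    """Randomly draw `fraction` of the records without replacement.

    `seed` is a hypothetical convenience so that a run can be repeated.
    """
    rng = random.Random(seed)
    size = round(len(records) * fraction)
    return rng.sample(records, size)


# Toy stand-in for the bank customer data held in the database.
customers = [{"id": i, "balance": 1000 * i} for i in range(9)]
sample = extract_sample(customers, seed=0)
```

Sampling without replacement keeps every extracted record distinct, so the clustering step never sees the same customer twice.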

Preferably, the customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.

Preferably, the neural network is a feed-forward BP (back-propagation) network with more than 3 layers.

The clustering processing module clusters the samples of the sample data using the improved k-means clustering method, specifically as follows:

1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity of all samples using the cosine of the included angle, obtaining the similarity matrix XS;

2) Sum each row of the similarity matrix XS to compute the similarity between each sample and the whole valid data set. Let XS = [sim(a_i, a_j)]_{n×n}, i, j = 1, ..., n, where sim(a_i, a_j) denotes the similarity between samples a_i and a_j. The summation formula is:

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

3) Arrange XS_p, p = 1, ..., n, in descending order, and let the 4 largest values be XS_max, XS_max-1, XS_max-2, XS_max-3. If [condition formula missing from the source text], select the sample corresponding to the maximum value XS_max as the first initial cluster center; otherwise, take the mean of the four samples corresponding to XS_max, XS_max-1, XS_max-2, XS_max-3 as the first initial cluster center. T is a preset ratio value;

4) Sort the elements of the row vector of the matrix corresponding to the maximum value XS_max in ascending order. Let the k-1 smallest elements be XS_pq, q = 1, ..., k-1, and select the samples corresponding to these k-1 smallest elements XS_pq as the remaining k-1 initial cluster centers. The value of k is set as follows: set an interval of possible values for k, cluster with each value in the interval, and compare covariances to determine the significance of the differences between clusters, thereby exploring the type information of the clusters and finally determining a suitable value of k;

5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the k updated clusters;

6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the cluster center from before the update;

7) If the cluster centers before the update are identical to the cluster centers after the update, or the objective function has reached its minimum, stop updating. The objective function is:

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

where C_l denotes the l-th of the k clusters, a_x is a sample in the l-th cluster, and \overline{a_{xl}} is the center of the l-th cluster.

The preset ratio value T takes values in the range [1.4, 1.6].
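The steps above can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the step-3 test against T is represented by a caller-supplied flag average_top4 because its formula is not given in this text, k is passed in directly rather than selected by the covariance comparison of step 4, and all samples (not only the remaining ones) are assigned in step 5.

```python
import numpy as np


def unit_rows(M):
    """Scale each row of M to unit length (zero rows stay zero)."""
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.clip(norms, 1e-12, None)


def initial_centers(A, k, average_top4=False):
    """Steps 1-4: choose k initial centers from the cosine similarity matrix.

    average_top4 is an assumed stand-in for the step-3 test against T.
    """
    A = np.asarray(A, dtype=float)
    U = unit_rows(A)
    XS = U @ U.T                       # step 1: pairwise cosine similarity
    XS_p = XS.sum(axis=1)              # step 2: row sums
    order = np.argsort(-XS_p)          # step 3: descending order of XS_p
    p = order[0]                       # sample with the largest row sum
    first = A[order[:4]].mean(axis=0) if average_top4 else A[p]
    far = np.argsort(XS[p])[: k - 1]   # step 4: k-1 samples least similar to p
    return np.vstack([first, A[far]])


def assign(A, centers):
    """Step 5: index of the most similar center (cosine) for each sample."""
    return np.argmax(unit_rows(A) @ unit_rows(centers).T, axis=1)


def objective(A, labels, centers):
    """Objective J: sum of squared distances to the assigned centers."""
    return float(((A - centers[labels]) ** 2).sum())


def improved_kmeans(A, k, average_top4=False, max_iter=100):
    A = np.asarray(A, dtype=float)
    centers = initial_centers(A, k, average_top4)
    for _ in range(max_iter):
        labels = assign(A, centers)
        # Step 6: new center = mean of members; keep old center if empty.
        new_centers = np.array([
            A[labels == l].mean(axis=0) if np.any(labels == l) else centers[l]
            for l in range(k)
        ])
        if np.allclose(new_centers, centers):   # step 7: centers unchanged
            break
        centers = new_centers
    labels = assign(A, centers)
    return labels, centers, objective(A, labels, centers)


# Toy data: two groups of vectors pointing in clearly different directions.
A = [[10, 1], [9, 1], [11, 2], [1, 10], [2, 9], [1, 11]]
labels, centers, J = improved_kmeans(A, k=2)
```

Choosing the remaining centers as the samples least similar to the first one spreads the initial centers apart, which is the stated motivation for replacing purely random initialization.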

The beneficial effects of the invention are:

1. A sample data extraction module is provided that randomly draws a small portion of the bank customer data as samples, so the probability of drawing an isolated point into the sample is very low and can be neglected, which improves clustering accuracy;

2. A neural network training module is provided in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering effect better fits the actual needs of customer segmentation;

3. The clustering processing module clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the randomness introduced by relying on a single random sampling method, resolves the problems of the original algorithm in choosing the value of k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation.

Brief description of the drawings

The invention is further described with reference to the accompanying drawings, but the embodiments in the drawings do not limit the invention in any way. A person of ordinary skill in the art can obtain other drawings from the following drawings without creative effort.

Fig. 1 is a schematic diagram of the connections between the modules of the present invention;

Fig. 2 is a schematic diagram of the principle of the present invention.

Reference numerals:

Bank customer data acquisition module 1, sample data extraction module 2, clustering processing module 3, neural network training module 4, customer classification and segmentation module 5.

Detailed description of the invention

The invention is further described in connection with the following embodiments.

Embodiment 1

Referring to Fig. 1 and Fig. 2, the customer segmentation system based on improved k-means and neural network clustering of this embodiment comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4, and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in a bank network database. The sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method and outputs a clustering result. The neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network. The customer classification and segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.

The customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.

The neural network is a feed-forward BP (back-propagation) network with more than 3 layers.
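A feed-forward network trained by back-propagation on cluster labels might look like the following sketch. The source specifies only that the network has more than 3 layers; the two hidden layers of 8 tanh units, the softmax output, the cross-entropy loss, the learning rate, the toy data, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def train_bp(X, y, hidden=(8, 8), n_classes=5, lr=0.1, epochs=1000, seed=0):
    """Full-batch gradient descent on a tanh/softmax feed-forward network."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, n_classes]   # input, hidden..., output
    W = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    Y = np.eye(n_classes)[y]                   # one-hot cluster labels
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh on hidden layers, softmax on the output layer.
        acts = [X]
        for i in range(len(W)):
            z = acts[-1] @ W[i] + b[i]
            if i < len(W) - 1:
                acts.append(np.tanh(z))
            else:
                z -= z.max(axis=1, keepdims=True)
                e = np.exp(z)
                acts.append(e / e.sum(axis=1, keepdims=True))
        probs = acts[-1][np.arange(len(y)), y]
        losses.append(float(-np.mean(np.log(probs + 1e-12))))
        # Backward pass: propagate the error from the output layer back.
        delta = (acts[-1] - Y) / len(y)
        for i in reversed(range(len(W))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W[i].T) * (1 - acts[i] ** 2)
            W[i] -= lr * grad_W
            b[i] -= lr * grad_b
    return W, b, losses


def predict(X, W, b):
    """Class index with the largest output score for each row of X."""
    a = np.asarray(X, dtype=float)
    for i in range(len(W) - 1):
        a = np.tanh(a @ W[i] + b[i])
    return np.argmax(a @ W[-1] + b[-1], axis=1)


# Toy stand-in for clustered samples: features X, cluster labels y.
X = np.array([[1.0, 0.1], [0.9, 0.1], [1.1, 0.2],
              [0.1, 1.0], [0.2, 0.9], [0.1, 1.1]])
y = np.array([0, 0, 0, 1, 1, 1])
W, b, losses = train_bp(X, y, n_classes=2)
pred = predict(X, W, b)
```

Once trained on the clustered third of the data, such a network can label the full customer set without rerunning the clustering.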

The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, specifically as follows:

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

The preset ratio value T takes values in the range [1.45, 1.55].

The present invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data as samples, so the probability of drawing an isolated point into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering effect better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the randomness introduced by relying on a single random sampling method, resolves the problems of the original algorithm in choosing the value of k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment, the ratio value T = 1.4 and the stated customer segmentation precision figure is 4.5%.

Embodiment 2

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

3) Arrange XS_p, p = 1, ..., n, in descending order, and let the 4 largest values be XS_max, XS_max-1, XS_max-2, XS_max-3. If [condition formula missing from the source text], select the sample corresponding to the maximum value XS_max as the first initial cluster center; otherwise, take the mean of the four samples corresponding to XS_max, XS_max-1, XS_max-2, XS_max-3 as the first initial cluster center. T is a preset ratio value;

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

The preset ratio value T takes values in the range [1.45, 1.55].

The present invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data as samples, so the probability of drawing an isolated point into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering effect better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the randomness introduced by relying on a single random sampling method, resolves the problems of the original algorithm in choosing the value of k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment, the ratio value T = 1.45 and the stated customer segmentation precision figure is 4.2%.

Embodiment 3

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

where C_l denotes the l-th of the k clusters, a_x is a sample in the l-th cluster, and \overline{a_{xl}} is the center of the l-th cluster.

The preset ratio value T takes values in the range [1.45, 1.55].

The present invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data as samples, so the probability of drawing an isolated point into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering effect better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the randomness introduced by relying on a single random sampling method, resolves the problems of the original algorithm in choosing the value of k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment, the ratio value T = 1.5 and the stated customer segmentation precision figure is 5%.

Embodiment 4

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

The preset ratio value T takes values in the range [1.45, 1.55].

The present invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data as samples, so the probability of drawing an isolated point into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering effect better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the randomness introduced by relying on a single random sampling method, resolves the problems of the original algorithm in choosing the value of k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment, the ratio value T = 1.55 and the stated customer segmentation precision figure is 4.7%.

Embodiment 5

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

The preset ratio value T takes values in the range [1.45, 1.55].

The present invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data as samples, so the probability of drawing an isolated point into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering effect better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the randomness introduced by relying on a single random sampling method, resolves the problems of the original algorithm in choosing the value of k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment, the ratio value T = 1.6 and the stated customer segmentation precision figure is 3.5%.

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit its scope of protection. Although the invention has been explained in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the technical solution of the invention may be modified or replaced with equivalents without departing from the essence and scope of the technical solution of the invention.

Claims

1. A customer segmentation system based on improved k-means and neural network clustering, characterized in that it comprises a bank customer data acquisition module, a sample data extraction module, a clustering processing module, a neural network training module, and a customer classification and segmentation module; the bank customer data acquisition module collects bank customer data and stores it in a bank network database; the sample data extraction module randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; the clustering processing module clusters the samples of the sample data using an improved k-means clustering method and outputs a clustering result; the neural network training module takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network; the customer classification and segmentation module inputs the bank customer data into the trained neural network to segment the bank customers.

2. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.

3. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the neural network is a feed-forward BP (back-propagation) network with more than 3 layers.

4. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the clustering processing module clusters the samples of the sample data using the improved k-means clustering method, specifically as follows:

1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity of all samples using the cosine of the included angle, obtaining the similarity matrix XS;

2) Sum each row of the similarity matrix XS to compute the similarity between each sample and the whole valid data set. Let XS = [sim(a_i, a_j)]_{n×n}, i, j = 1, ..., n, where sim(a_i, a_j) denotes the similarity between samples a_i and a_j. The summation formula is:

XS_{p} = \sum_{j=1}^{n} sim(a_{p}, a_{j}), \quad p = 1, ..., n

3) Arrange XS_p, p = 1, ..., n, in descending order, and let the 4 largest values be XS_max, XS_max-1, XS_max-2, XS_max-3. If [condition formula missing from the source text], select the sample corresponding to the maximum value XS_max as the first initial cluster center; otherwise, take the mean of the four samples corresponding to XS_max, XS_max-1, XS_max-2, XS_max-3 as the first initial cluster center. T is a preset ratio value;

4) Sort the elements of the row vector of the matrix corresponding to the maximum value XS_max in ascending order. Let the k-1 smallest elements be XS_pq, q = 1, ..., k-1, and select the samples corresponding to these k-1 smallest elements XS_pq as the remaining k-1 initial cluster centers. The value of k is set as follows: set an interval of possible values for k, cluster with each value in the interval, and compare covariances to determine the significance of the differences between clusters, thereby exploring the type information of the clusters and finally determining a suitable value of k;

5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the k updated clusters;

6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the cluster center from before the update;

7) If the cluster centers before the update are identical to the cluster centers after the update, or the objective function has reached its minimum, stop updating. The objective function is:

J = \sum_{l=1}^{k} \sum_{a_{x} \in C_{l}} \| a_{x} - \overline{a_{xl}} \|^{2}

where C_l denotes the l-th of the k clusters, a_x is a sample in the l-th cluster, and \overline{a_{xl}} is the center of the l-th cluster.

5. The customer segmentation system based on improved k-means and neural network clustering according to claim 4, characterized in that the preset ratio value T takes values in the range [1.45, 1.55].