CN106204267A - Customer segmentation system based on improved k-means and neural network clustering - Google Patents

Customer segmentation system based on improved k-means and neural network clustering

Info

Publication number
CN106204267A
Authority
CN
China
Prior art keywords
sample
data
max
bank
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201610544043.5A
Other languages
Chinese (zh)
Inventor
Not disclosed (inventor requested non-publication)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610544043.5A
Publication of CN106204267A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a customer segmentation system based on improved k-means and neural network clustering, comprising: a bank customer data acquisition module, which collects bank customer data and stores it in a bank network database; a sample data extraction module, which randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; a clustering processing module, which clusters the samples of the sample data with an improved k-means clustering method and outputs the clustering result; a neural network training module, which takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network; and a customer classification and segmentation module, which feeds the bank customer data into the trained neural network to segment the bank customers. The invention reduces the probability of drawing isolated points (outliers) into the sample, improves clustering accuracy, and achieves high customer segmentation precision.

Description

Customer segmentation system based on improved k-means and neural network clustering
Technical field
The present invention relates to the field of data mining, and in particular to a customer segmentation system based on improved k-means and neural network clustering.
Background
At present, different types of customers differ markedly in the value they bring to a bank. By identifying and distinguishing these differences, a bank can allocate its marketing, service, and management resources more rationally and obtain greater returns from a smaller investment. Solving this problem requires customer segmentation. Bank customer segmentation is the process by which a bank, within a defined strategy, business model, and specific market, classifies customers according to factors such as their attributes, behavior, needs, preferences, and value, and provides targeted products, services, and marketing models.
In the related art, bank customer segmentation uses either experience-based classification or statistical analysis. Experience-based segmentation is the most primitive method: decision makers classify customers according to their own experience, so the result is highly subjective, not objective enough, and unconvincing. Statistics-based segmentation is a quantitative approach that classifies customers according to statistics of customer attribute features. Its results are strongly tied to the classification criteria, so unreasonable criteria yield unreasonable classifications. As the informatization of banks in China deepens, banks have accumulated large volumes of historical transaction data and customer data, and with the development of networks they will accumulate ever more. Faced with such massive customer data, the segmentation methods of the related art increasingly fall short. In recent years, data mining technology has developed rapidly. It merges several disciplines, including databases, artificial intelligence, and statistics, and is the process of extracting useful, credible, and novel information and knowledge from large amounts of incomplete, noisy, and fuzzy raw data. K-means clustering is one of the most important data mining methods and is widely used in bank customer segmentation. However, existing K-means clustering methods cannot effectively avoid the chance effects of relying solely on random sampling, have low clustering stability, and have the critical defect of being sensitive to isolated points (outliers).
Summary of the invention
In view of the above problems, the present invention provides a customer segmentation system based on improved k-means and neural network clustering.
The object of the invention is achieved by the following technical solution:
A customer segmentation system based on improved k-means and neural network clustering comprises a bank customer data acquisition module, a sample data extraction module, a clustering processing module, a neural network training module, and a customer classification and segmentation module. The bank customer data acquisition module collects bank customer data and stores it in a bank network database. The sample data extraction module randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data. The clustering processing module clusters the samples of the sample data with an improved k-means clustering method and outputs the clustering result. The neural network training module takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network. The customer classification and segmentation module feeds the bank customer data into the trained neural network to segment the bank customers.
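The sample data extraction step can be illustrated with a minimal Python sketch. The function name `draw_sample`, the record layout, and the `seed` parameter are assumptions made for illustration; the source only states that one third of the customer data is drawn at random.

```python
import random

def draw_sample(records, seed=None):
    """Randomly draw one third of the records, without replacement.

    `records` stands in for the customer rows read from the bank network
    database; the real storage format is not described in the source.
    """
    rng = random.Random(seed)       # seeded only to make the sketch repeatable
    size = len(records) // 3        # one third, rounded down (assumption)
    return rng.sample(records, size)

# Hypothetical usage with nine placeholder customer records.
customers = [{"id": i} for i in range(9)]
sample = draw_sample(customers, seed=0)
```

Rounding the sample size down is a choice made here, since the source does not say how a customer count that is not divisible by three is handled.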
Preferably, the customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.
Preferably, the neural network is a feed-forward BP (back-propagation) network with more than 3 layers.
The clustering processing module clusters the samples of the sample data with the improved k-means clustering method as follows:
1) Let the sample data contain $n$ samples. Vectorize the $n$ samples and compute the pairwise similarity of all samples with the cosine function to obtain the similarity matrix $XS$;
2) Sum each row of the similarity matrix $XS$ to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \dots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$. The summation formula is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \dots, n$$
3) Sort $XS_p$, $p = 1, \dots, n$, in descending order, and let the four largest values be $XS_{max}$, $XS_{max-1}$, $XS_{max-2}$, $XS_{max-3}$. If the threshold condition holds (the inequality, which involves the ratio value $T$, is missing from this text), select the sample corresponding to the maximum $XS_{max}$ as the first initial cluster center; otherwise, select the mean of the four samples corresponding to $XS_{max}$, $XS_{max-1}$, $XS_{max-2}$, $XS_{max-3}$ as the first initial cluster center. $T$ is a preset ratio value;
4) Sort the elements of the row of the matrix corresponding to the maximum $XS_{max}$ in ascending order. Let the $k-1$ smallest elements be $XS_{pq}$, $q = 1, \dots, k-1$, and select the samples corresponding to these $k-1$ smallest elements as the remaining $k-1$ initial cluster centers. The value of $k$ is set as follows: specify an interval of possible values of $k$, cluster with each value in the interval, and compare covariances to determine whether the differences between clusters are significant, thereby exploring the type information of the clusters and finally determining a suitable $k$;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the $k$ updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center from before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$J = \sum_{l=1}^{k} \sum_{a_x \in C_l} \left\| a_x - \overline{a_{xl}} \right\|^2$$
where $C_l$ denotes the $l$-th of the $k$ clusters, $a_x$ is a sample in the $l$-th cluster, and $\overline{a_{xl}}$ is the center of the $l$-th cluster.
The ratio value $T$ is set in the range [1.4, 1.6].
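The initialization in steps 1) to 4) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the samples are assumed to be non-zero numeric row vectors, and only the first branch of step 3) is implemented, because the inequality involving T is not available in this text.

```python
import numpy as np

def cosine_similarity_matrix(samples):
    """Step 1: pairwise cosine similarity of the row vectors in `samples`."""
    norms = np.linalg.norm(samples, axis=1, keepdims=True)
    unit = samples / norms            # assumes no all-zero rows
    return unit @ unit.T

def initial_centers(samples, k):
    """Steps 2 to 4: pick k initial cluster centers from the samples."""
    xs = cosine_similarity_matrix(samples)
    row_sums = xs.sum(axis=1)         # step 2: XS_p for every sample p
    p_max = int(np.argmax(row_sums))  # step 3, first branch only (assumption)
    # Step 4: sort the row of the most central sample in ascending order and
    # take the k-1 samples that are least similar to it.
    order = np.argsort(xs[p_max])
    others = [int(i) for i in order if i != p_max][: k - 1]
    indices = [p_max] + others
    return samples[indices], indices

# Hypothetical toy data: three samples near one direction, two near another.
samples = np.array([[1.0, 0.1], [1.0, 0.2], [0.9, 0.15], [0.1, 1.0], [0.2, 1.0]])
centers, indices = initial_centers(samples, k=2)
```

Choosing the least similar samples as the remaining centers spreads the initial centers apart, which is the stated reason the method is less dependent on a random starting point.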
The beneficial effects of the invention are:
1. A sample data extraction module is provided that randomly draws a small portion of the bank customer data, so the probability of drawing isolated points into the sample is very low and can be neglected, which improves clustering accuracy;
2. A neural network training module is provided in which a feed-forward BP network with more than 3 layers computes the weight of each attribute. This avoids treating all attributes as having the same influence on the result, so the clustering better suits the actual needs of customer segmentation;
3. The clustering processing module clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of relying solely on random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation.
Brief description of the drawings
The invention is further described with reference to the drawings. The embodiments in the drawings do not limit the invention in any way; a person of ordinary skill in the art can obtain other drawings from the following drawings without creative effort.
Fig. 1 is a connection diagram of the modules of the invention;
Fig. 2 is a schematic diagram of the principle of the invention.
Reference numerals:
Bank customer data acquisition module 1, sample data extraction module 2, clustering processing module 3, neural network training module 4, customer classification and segmentation module 5.
Detailed description
The invention is further described with reference to the following embodiments.
Embodiment 1
Referring to Fig. 1 and Fig. 2, the customer segmentation system of this embodiment, based on improved k-means and neural network clustering, comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4, and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in a bank network database. The sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data. The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result. The neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weights of each attribute at each layer, and obtains a trained neural network. The customer classification and segmentation module 5 feeds the bank customer data into the trained neural network to segment the bank customers.
The customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.
The neural network is a feed-forward BP (back-propagation) network with more than 3 layers.
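How cluster labels can serve as training targets for a feed-forward BP network may be sketched as follows. Everything specific here is an assumption made for illustration: the two hidden layers of 8 tanh units (four layers in total), the softmax output, the learning rate, the epoch count, and the function names. The source only requires a feed-forward BP network with more than 3 layers trained on the clustering result.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_bp_network(x, labels, n_classes, hidden=(8, 8), lr=0.2, epochs=2000, seed=0):
    """Train a small feed-forward network by full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    sizes = [x.shape[1], *hidden, n_classes]
    weights = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    targets = np.eye(n_classes)[labels]     # one-hot cluster labels
    for _ in range(epochs):
        # Forward pass: keep the input of every layer for the backward pass.
        acts = [x]
        for w, b in zip(weights[:-1], biases[:-1]):
            acts.append(np.tanh(acts[-1] @ w + b))
        probs = softmax(acts[-1] @ weights[-1] + biases[-1])
        # Backward pass: gradient of the mean cross-entropy loss.
        delta = (probs - targets) / len(x)
        for i in reversed(range(len(weights))):
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:                        # propagate before updating layer i
                delta = (delta @ weights[i].T) * (1.0 - acts[i] ** 2)
            weights[i] -= lr * grad_w
            biases[i] -= lr * grad_b
    return weights, biases

def predict(x, weights, biases):
    """Assign each row of x to the class with the largest output."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ w + b)
    return np.argmax(a @ weights[-1] + biases[-1], axis=1)
```

In the described system the trained network would then be applied to the full customer data set rather than only to the one-third sample, which is what `predict` stands in for here.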
The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method as follows:
1) Let the sample data contain $n$ samples. Vectorize the $n$ samples and compute the pairwise similarity of all samples with the cosine function to obtain the similarity matrix $XS$;
2) Sum each row of the similarity matrix $XS$ to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \dots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$. The summation formula is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \dots, n$$
3) Sort $XS_p$, $p = 1, \dots, n$, in descending order, and let the four largest values be $XS_{max}$, $XS_{max-1}$, $XS_{max-2}$, $XS_{max-3}$. If the threshold condition holds (the inequality, which involves the ratio value $T$, is missing from this text), select the sample corresponding to the maximum $XS_{max}$ as the first initial cluster center; otherwise, select the mean of the four samples corresponding to $XS_{max}$, $XS_{max-1}$, $XS_{max-2}$, $XS_{max-3}$ as the first initial cluster center. $T$ is a preset ratio value;
4) Sort the elements of the row of the matrix corresponding to the maximum $XS_{max}$ in ascending order. Let the $k-1$ smallest elements be $XS_{pq}$, $q = 1, \dots, k-1$, and select the samples corresponding to these $k-1$ smallest elements as the remaining $k-1$ initial cluster centers. The value of $k$ is set as follows: specify an interval of possible values of $k$, cluster with each value in the interval, and compare covariances to determine whether the differences between clusters are significant, thereby exploring the type information of the clusters and finally determining a suitable $k$;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the $k$ updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center from before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$J = \sum_{l=1}^{k} \sum_{a_x \in C_l} \left\| a_x - \overline{a_{xl}} \right\|^2$$
where $C_l$ denotes the $l$-th of the $k$ clusters, $a_x$ is a sample in the $l$-th cluster, and $\overline{a_{xl}}$ is the center of the l-th cluster.
The ratio value $T$ is set in the range [1.45, 1.55].
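The update loop of steps 5) to 7) can be sketched in Python with NumPy, given initial centers from steps 1) to 4). The function names, the `max_iter` cap, and the handling of an empty cluster are assumptions for illustration. The sketch also assigns every sample, including those chosen as initial centers, to its most similar center, which is a simplification of "remaining samples" in step 5).

```python
import numpy as np

def assign(samples, centers):
    """Step 5: index of the most cosine-similar center for every sample."""
    s = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return np.argmax(s @ c.T, axis=1)

def objective(samples, labels, centers):
    """J: sum of squared distances from each sample to its cluster center."""
    return float(((samples - centers[labels]) ** 2).sum())

def run_updates(samples, centers, max_iter=100):
    """Steps 5 to 7: reassign and recompute centers until they stop changing."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        labels = assign(samples, centers)
        updated = centers.copy()
        for l in range(len(centers)):
            members = samples[labels == l]
            if len(members) > 0:             # keep the old center if a cluster is empty (assumption)
                updated[l] = members.mean(axis=0)   # step 6
        if np.allclose(updated, centers):    # step 7: centers unchanged, stop
            break
        centers = updated
    labels = assign(samples, centers)
    return labels, centers, objective(samples, labels, centers)

# Hypothetical toy data with two initial centers taken from the samples.
samples = np.array([[1.0, 0.1], [1.0, 0.2], [0.9, 0.15], [0.1, 1.0], [0.2, 1.0]])
labels, centers, j = run_updates(samples, samples[[1, 3]])
```

The stopping test here uses only the "centers unchanged" criterion of step 7); the value of J is returned so a caller could also monitor it across runs.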
The invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data, so the probability of drawing isolated points into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute; this avoids treating all attributes as having the same influence on the result, so the clustering better suits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of relying solely on random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment the ratio value is T = 1.4, and the figure reported for customer segmentation precision is 4.5%.
Embodiment 2
Referring to Fig. 1 and Fig. 2, the customer segmentation system of this embodiment has the same modules 1 to 5, the same five customer classes, the same feed-forward BP network with more than 3 layers, and the same improved k-means clustering steps 1) to 7) as Embodiment 1. The ratio value T is set in the range [1.45, 1.55].
The invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data, so the probability of drawing isolated points into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute; this avoids treating all attributes as having the same influence on the result, so the clustering better suits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of relying solely on random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment the ratio value is T = 1.45, and the figure reported for customer segmentation precision is 4.2%.
Embodiment 3
Referring to Fig. 1 and Fig. 2, the customer segmentation system of this embodiment has the same modules 1 to 5, the same five customer classes, the same feed-forward BP network with more than 3 layers, and the same improved k-means clustering steps 1) to 7) as Embodiment 1. The ratio value T is set in the range [1.45, 1.55].
The invention provides the sample data extraction module 2, which randomly draws a small portion of the bank customer data, so the probability of drawing isolated points into the sample is very low and can be neglected, which improves clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute; this avoids treating all attributes as having the same influence on the result, so the clustering better suits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of relying solely on random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment the ratio value is T = 1.5, and the figure reported for customer segmentation precision is 5%.
Embodiment 4
See Fig. 1, Fig. 2, the present embodiment based on improving k-means and the customer segmentation system of neural network clustering, bag Include bank client data acquisition module 1, sample data abstraction module 2, clustering processing module 3, neural metwork training module 4, visitor Family classification segmentation module 5, described bank client data acquisition module 1 is used for gathering bank client data, and by bank client number According to carrying out storage to bank network data base;Described sample data abstraction module 2 is for the bank from bank network data base Stochastic sampling in customer data, the data of extraction 1/3rd are as sample data;Described clustering processing module 3 changes for employing Enter k-means clustering method each sample of sample data is clustered, export cluster result;Described neural metwork training mould Block 4 is for going out the weights of each layer of each attribute using described cluster result as training sample, employing neural computing, and obtain To a neutral net trained;Bank client data are input to the nerve trained by described client's classification segmentation module 5 In network, bank client is finely divided.
Wherein, described customer segmentation system is subdivided into five classes, i.e. premium customers, big customer, typically visitor to bank client Family, little client, potential customers.
Wherein, described neutral net is the feed-forward type BP network more than 3 layers.
Wherein, described clustering processing module 3 uses improvement k-means clustering method to carry out each sample of sample data Cluster, particularly as follows:
1) set described sample data and there is n sample, n sample is carried out vectorization, calculated by included angle cosine function All samples similarity between any two, obtains similarity matrix XS;
2) each row of similarity matrix XS is sued for peace, calculate the phase of each sample and whole valid data collection Like degree, if XS=is [sim (ai,aj)]n×n, i, j=1 ..., n, wherein sim (ai,aj) represent sample ai,ajBetween similarity, ask With formula it is:
XS p = Σ j = 1 n s i m ( a i , a j ) , p = 1 , ... , n
3) XS is arranged in descending orderp, p=1 ..., n, if XSpIt is XS by front 4 values arranged from big to smallmax,XSmax-1, XSmax-2,XSmax-3If,Select and maximum XSmaxCorresponding sample is made It is first initial center that clusters, otherwise selects and XSmax,XSmax-1,XSmax-2,XSmax-3The average of four corresponding samples As first initial bunch center, T is the ratio value set;
4) Sort the elements of the row vector of the matrix corresponding to the maximum XS_max in ascending order. Let the k-1 smallest elements be XS_pq, q = 1, ..., k-1, and select the samples corresponding to these k-1 smallest elements XS_pq as the remaining k-1 initial cluster centers. The value of k is set as follows: set an interval of possible values of k, test the different values of k by clustering with each value in the interval, compare the covariances to determine the significance of the differences between clusters, thereby probing the type information of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the cluster center before the update;
7) If the cluster centers before the update are identical to the cluster centers after the update, or the objective function has reached its minimum, stop updating. The objective function is:
$$J = \sum_{l=1}^{k} \sum_{a_x \in C_l} \left\| a_x - \overline{a_{xl}} \right\|^2$$
Wherein, C_l denotes the l-th of the k clusters, a_x is a sample in the l-th cluster, and \overline{a_{xl}} is the center of the l-th cluster.
Wherein, the set ratio value T takes values in the range [1.45, 1.55].
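The initialization in steps 1), 2) and 4) can be sketched in Python as follows. This is a hedged illustration only: the condition on T in step 3) is not recoverable from this text, so the sketch simply takes the sample with the largest row sum as the first center (the first branch of step 3), which is an assumption. The function names `cosine` and `initial_centers` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def initial_centers(samples, k):
    n = len(samples)
    # Step 1: pairwise similarity matrix XS.
    xs = [[cosine(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    # Step 2: row sums XS_p, the similarity of each sample to the whole set.
    row_sums = [sum(row) for row in xs]
    # Step 3 (simplified, assumption): use the sample with the largest XS_p.
    # The T-based test and the four-sample mean branch are not modelled here.
    p = max(range(n), key=row_sums.__getitem__)
    # Step 4: the k-1 samples least similar to sample p become the other centers.
    others = sorted((j for j in range(n) if j != p), key=lambda j: xs[p][j])
    return [samples[p]] + [samples[j] for j in others[:k - 1]]


if __name__ == "__main__":
    data = [(1.0, 0.0), (3.0, 1.0), (2.0, 1.0), (0.0, 1.0)]
    print(initial_centers(data, 3))
```

Choosing the remaining centers as the samples least similar to the most central sample spreads the initial centers apart, which is the stated aim of replacing random initialization.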
The present invention provides the sample data extraction module 2, which randomly extracts a small sample of the bank customer data, so that the probability of drawing isolated points into the sample is very low and can be ignored, improving clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, avoiding treating every attribute as having the same influence on the result, so that the clustering better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the chance effects of purely random selection, addresses the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment the ratio value T = 1.55, and the stated precision figure for customer segmentation is 4.7%.
Embodiment 5
Referring to Fig. 1 and Fig. 2, the customer segmentation system of this embodiment, based on improved k-means and neural network clustering, comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4, and a customer category segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in the bank network database. The sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method and outputs a clustering result. The neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weight of each attribute at each layer, and obtains a trained neural network. The customer category segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.
Wherein, the customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.
Wherein, the neural network is a feed-forward BP (back-propagation) network with more than 3 layers.
Wherein, the clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity of all samples using the cosine similarity function, obtaining the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let XS = [sim(a_i, a_j)]_{n×n}, i, j = 1, ..., n, where sim(a_i, a_j) denotes the similarity between samples a_i and a_j. The summation formula is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n$$
3) Sort XS_p, p = 1, ..., n, in descending order, and let the four largest values be XS_max, XS_max-1, XS_max-2, XS_max-3. If the condition on the ratio value T is satisfied [condition formula missing in this text], select the sample corresponding to the maximum XS_max as the first initial cluster center; otherwise, take the mean of the four samples corresponding to XS_max, XS_max-1, XS_max-2, XS_max-3 as the first initial cluster center. T is a set ratio value;
4) Sort the elements of the row vector of the matrix corresponding to the maximum XS_max in ascending order. Let the k-1 smallest elements be XS_pq, q = 1, ..., k-1, and select the samples corresponding to these k-1 smallest elements XS_pq as the remaining k-1 initial cluster centers. The value of k is set as follows: set an interval of possible values of k, test the different values of k by clustering with each value in the interval, compare the covariances to determine the significance of the differences between clusters, thereby probing the type information of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the cluster center before the update;
7) If the cluster centers before the update are identical to the cluster centers after the update, or the objective function has reached its minimum, stop updating. The objective function is:
$$J = \sum_{l=1}^{k} \sum_{a_x \in C_l} \left\| a_x - \overline{a_{xl}} \right\|^2$$
Wherein, C_l denotes the l-th of the k clusters, a_x is a sample in the l-th cluster, and \overline{a_{xl}} is the center of the l-th cluster.
Wherein, the set ratio value T takes values in the range [1.45, 1.55].
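The iteration in steps 5) to 7) can be illustrated with the short Python sketch below. It is written under stated assumptions: every sample (not only the samples left over after initialization) is reassigned on each pass, an empty cluster keeps its previous center, only the "centers unchanged" stopping test is implemented, and `max_iter` is a safety limit added for the example. The names `cluster`, `objective`, and `mean` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def objective(clusters, centers):
    """J: sum of squared Euclidean distances of samples to their cluster center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(a, center))
        for members, center in zip(clusters, centers)
        for a in members
    )


def cluster(samples, centers, max_iter=100):
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Step 5: assign each sample to the center it is most similar to.
        clusters = [[] for _ in centers]
        for a in samples:
            best = max(range(len(centers)), key=lambda l: cosine(a, centers[l]))
            clusters[best].append(a)
        # Step 6: each new center is the mean of its cluster's members.
        new_centers = [mean(m) if m else c for m, c in zip(clusters, centers)]
        # Step 7: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers


if __name__ == "__main__":
    data = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
    groups, final_centers = cluster(data, [(1.0, 0.0), (0.0, 1.0)])
    print(final_centers, objective(groups, final_centers))
```

The `objective` helper evaluates J for a given partition, so it can also be used to compare runs with different k or different initial centers.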
The present invention provides the sample data extraction module 2, which randomly extracts a small sample of the bank customer data, so that the probability of drawing isolated points into the sample is very low and can be ignored, improving clustering accuracy. It provides the neural network training module 4, in which a feed-forward BP network with more than 3 layers computes the weight of each attribute, avoiding treating every attribute as having the same influence on the result, so that the clustering better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data using the improved k-means clustering method, which effectively avoids the chance effects of purely random selection, addresses the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability, and further improves the precision of customer segmentation. In this embodiment the ratio value T = 1.6, and the stated precision figure for customer segmentation is 3.5%.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit its scope of protection. Although the present invention has been explained in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the essence and scope of the technical solution of the present invention.

Claims (5)

1. A customer segmentation system based on improved k-means and neural network clustering, characterized in that it comprises a bank customer data acquisition module, a sample data extraction module, a clustering processing module, a neural network training module, and a customer category segmentation module; the bank customer data acquisition module is configured to collect bank customer data and store the bank customer data in a bank network database; the sample data extraction module is configured to randomly sample the bank customer data in the bank network database and extract one third of the data as sample data; the clustering processing module is configured to cluster the samples of the sample data using an improved k-means clustering method and output a clustering result; the neural network training module is configured to take the clustering result as training samples, use a neural network to compute the weight of each attribute at each layer, and obtain a trained neural network; the customer category segmentation module is configured to input the bank customer data into the trained neural network to segment the bank customers.
2. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the customer segmentation system divides bank customers into five classes: premium customers, large customers, ordinary customers, small customers, and potential customers.
3. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the neural network is a feed-forward BP network with more than 3 layers.
4. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the clustering processing module clusters the samples of the sample data using the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity of all samples using the cosine similarity function, obtaining the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let XS = [sim(a_i, a_j)]_{n×n}, i, j = 1, ..., n, where sim(a_i, a_j) denotes the similarity between samples a_i and a_j. The summation formula is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n$$
3) Sort XS_p, p = 1, ..., n, in descending order, and let the four largest values be XS_max, XS_max-1, XS_max-2, XS_max-3. If the condition on the ratio value T is satisfied [condition formula missing in this text], select the sample corresponding to the maximum XS_max as the first initial cluster center; otherwise, take the mean of the four samples corresponding to XS_max, XS_max-1, XS_max-2, XS_max-3 as the first initial cluster center. T is a set ratio value;
4) Sort the elements of the row vector of the matrix corresponding to the maximum XS_max in ascending order. Let the k-1 smallest elements be XS_pq, q = 1, ..., k-1, and select the samples corresponding to these k-1 smallest elements XS_pq as the remaining k-1 initial cluster centers. The value of k is set as follows: set an interval of possible values of k, test the different values of k by clustering with each value in the interval, compare the covariances to determine the significance of the differences between clusters, thereby probing the type information of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming the k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the cluster center before the update;
7) If the cluster centers before the update are identical to the cluster centers after the update, or the objective function has reached its minimum, stop updating. The objective function is:
$$J = \sum_{l=1}^{k} \sum_{a_x \in C_l} \left\| a_x - \overline{a_{xl}} \right\|^2$$
Wherein, C_l denotes the l-th of the k clusters, a_x is a sample in the l-th cluster, and \overline{a_{xl}} is the center of the l-th cluster.
5. The customer segmentation system based on improved k-means and neural network clustering according to claim 4, characterized in that the set ratio value T takes values in the range [1.45, 1.55].
CN201610544043.5A 2016-07-06 2016-07-06 A kind of based on improving k means and the customer segmentation system of neural network clustering Withdrawn CN106204267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610544043.5A CN106204267A (en) 2016-07-06 2016-07-06 A kind of based on improving k means and the customer segmentation system of neural network clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610544043.5A CN106204267A (en) 2016-07-06 2016-07-06 A kind of based on improving k means and the customer segmentation system of neural network clustering

Publications (1)

Publication Number Publication Date
CN106204267A true CN106204267A (en) 2016-12-07

Family

ID=57476857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610544043.5A Withdrawn CN106204267A (en) 2016-07-06 2016-07-06 A kind of based on improving k means and the customer segmentation system of neural network clustering

Country Status (1)

Country Link
CN (1) CN106204267A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944553A (en) * 2017-11-22 2018-04-20 浙江大华技术股份有限公司 A kind of method for trimming and device of CNN models
CN112767124A (en) * 2021-01-15 2021-05-07 上海琢学科技有限公司 Method and device for improving retention rate of personal delivery service
CN113159881A (en) * 2021-03-15 2021-07-23 杭州云搜网络技术有限公司 Data clustering and B2B platform customer preference obtaining method and system
US11651229B2 (en) 2017-11-22 2023-05-16 Zhejiang Dahua Technology Co., Ltd. Methods and systems for face recognition
US11900230B2 (en) 2019-07-17 2024-02-13 Visa International Service Association Method, system, and computer program product for identifying subpopulations

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944553A (en) * 2017-11-22 2018-04-20 浙江大华技术股份有限公司 A kind of method for trimming and device of CNN models
US11651229B2 (en) 2017-11-22 2023-05-16 Zhejiang Dahua Technology Co., Ltd. Methods and systems for face recognition
US11900230B2 (en) 2019-07-17 2024-02-13 Visa International Service Association Method, system, and computer program product for identifying subpopulations
CN112767124A (en) * 2021-01-15 2021-05-07 上海琢学科技有限公司 Method and device for improving retention rate of personal delivery service
CN113159881A (en) * 2021-03-15 2021-07-23 杭州云搜网络技术有限公司 Data clustering and B2B platform customer preference obtaining method and system
CN113159881B (en) * 2021-03-15 2022-08-12 杭州云搜网络技术有限公司 Data clustering and B2B platform customer preference obtaining method and system

Similar Documents

Publication Publication Date Title
Xie et al. Customer churn prediction using improved balanced random forests
CN106204267A (en) A kind of based on improving k means and the customer segmentation system of neural network clustering
CN108960833B (en) Abnormal transaction identification method, equipment and storage medium based on heterogeneous financial characteristics
Adams et al. Data mining for fun and profit
Peacock Data mining in marketing: Part 1
Zhang et al. The Bayesian additive classification tree applied to credit risk modelling
Rahman et al. Link prediction in dynamic networks using graphlet
EP3226175A1 (en) Image pattern recognition device and program
Li et al. Empirical research of hybridizing principal component analysis with multivariate discriminant analysis and logistic regression for business failure prediction
Ali et al. K-means clustering algorithm applications in data mining and pattern recognition
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
Rustogi et al. Swift imbalance data classification using SMOTE and extreme learning machine
CN109739844A (en) Data classification method based on decaying weight
CN111325248A (en) Method and system for reducing pre-loan business risk
Cai et al. Large-scale global and simultaneous inference: Estimation and testing in very high dimensions
Radhakrishnan et al. Application of data mining in marketing
CN107729377A (en) Customer classification method and system based on data mining
CN111861756A (en) Group partner detection method based on financial transaction network and implementation device thereof
CN108664653A (en) A kind of Medical Consumption client's automatic classification method based on K-means
CN108197795A (en) The account recognition methods of malice group, device, terminal and storage medium
CN107274066A (en) A kind of shared traffic Customer Value Analysis method based on LRFMD models
CN102722578A (en) Unsupervised cluster characteristic selection method based on Laplace regularization
CN109934286A (en) Bug based on Text character extraction and uneven processing strategie reports severity recognition methods
Gončarovs Using data analytics for customers segmentation: Experimental study at a financial institution
CN111581298A (en) Heterogeneous data integration system and method for large data warehouse

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C04 Withdrawal of patent application after publication (patent law 2001)
WW01 Invention patent application withdrawn after publication

Application publication date: 20161207