CN106204267A - Customer segmentation system based on improved k-means and neural network clustering - Google Patents
Customer segmentation system based on improved k-means and neural network clustering
- Publication number
- CN106204267A (application number CN201610544043.5A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- max
- bank
- cluster
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Strategic Management (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Evolutionary Computation (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a customer segmentation system based on improved k-means and neural network clustering, comprising: a bank customer data acquisition module, which collects bank customer data and stores it in a bank network database; a sample data extraction module, which randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; a clustering processing module, which clusters the samples of the sample data with an improved k-means clustering method and outputs the clustering result; a neural network training module, which takes the clustering result as training samples, uses a neural network to compute the weights of each layer for each attribute, and obtains a trained neural network; and a customer classification and segmentation module, which inputs the bank customer data into the trained neural network to segment the bank customers. The invention reduces the probability of drawing isolated points (outliers) into the sample, improves clustering accuracy, and achieves high customer segmentation precision.
Description
Technical field
The present invention relates to the field of data mining, and in particular to a customer segmentation system based on improved k-means and neural network clustering.
Background art
At present, different types of customers bring markedly different value to a bank. By identifying and distinguishing these differences, a bank can allocate its marketing, service and management resources more rationally and obtain greater returns with less input; solving this problem requires customer segmentation. Bank customer segmentation is the process by which a bank, within a clear strategy, business model and specific market, classifies customers according to factors such as their attributes, behavior, needs, preferences and value, and provides them with targeted products, services and marketing models.
In the related art, bank customer segmentation uses either empirical classification or statistical analysis. Empirical segmentation is the most primitive approach: decision makers classify customers according to their own experience, which is highly subjective, so the results are not objective and lack persuasiveness. Segmentation based on statistical methods is a quantitative approach that classifies customers according to statistics of their attribute characteristics; its results are strongly tied to the classification criteria, so if the criteria are unreasonable, the classification is unreasonable as well. As the informatization of Chinese banks deepens, banks have accumulated large amounts of historical transaction data and customer data, and with the development of networks they will accumulate ever more. Faced with such massive customer data, the segmentation methods of the related art become increasingly inadequate. In recent years, data mining technology has developed rapidly. It combines databases, artificial intelligence, statistics and other fields, and is a process of extracting useful, credible and novel information and knowledge from large, incomplete, noisy and fuzzy raw data. K-means clustering is one of the most important data mining methods and is widely used in bank customer segmentation. However, existing K-means clustering cannot effectively avoid the chance effects of a single random sampling, its clustering stability is low, and it has the critical defect of being sensitive to isolated points (outliers).
Summary of the invention
To address the above problems, the present invention provides a customer segmentation system based on improved k-means and neural network clustering.
The object of the present invention is achieved by the following technical solution:
A customer segmentation system based on improved k-means and neural network clustering comprises a bank customer data acquisition module, a sample data extraction module, a clustering processing module, a neural network training module and a customer classification and segmentation module. The bank customer data acquisition module collects bank customer data and stores it in a bank network database; the sample data extraction module randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; the clustering processing module clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result; the neural network training module takes the clustering result as training samples, uses a neural network to compute the weights of each layer for each attribute, and obtains a trained neural network; the customer classification and segmentation module inputs the bank customer data into the trained neural network to segment the bank customers.
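The sample data extraction module is described only as random sampling of one third of the stored records. A minimal sketch, assuming simple random sampling without replacement over an in-memory list of customer records; the function name `extract_sample` and its `fraction` and `seed` parameters are hypothetical, not taken from the patent.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Record = TypeVar("Record")


def extract_sample(records: Sequence[Record], fraction: float = 1 / 3,
                   seed: Optional[int] = None) -> List[Record]:
    """Return a simple random sample (without replacement) of about `fraction` of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)                      # seeded generator for reproducible draws
    size = max(1, round(len(records) * fraction))  # one third by default, at least one record
    return rng.sample(list(records), size)
```

For nine records the default fraction yields three distinct records; fixing `seed` makes the draw reproducible, which helps when comparing clustering runs.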
Preferably, the customer segmentation system divides bank customers into five classes: premium customers, major customers, ordinary customers, small customers and potential customers.
Preferably, the neural network is a feed-forward BP (back-propagation) network with three or more layers.
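The patent does not give the network's layer sizes, activation functions or training settings. The sketch below assumes the smallest case covered by "three or more layers": an input layer, one sigmoid hidden layer and a softmax output layer, trained by back-propagation with full-batch gradient descent on the cluster labels produced by the clustering module. The class name `BPNetwork`, the learning rate and the epoch count are illustrative assumptions, and the attributes are assumed to be numeric and already standardized.

```python
import numpy as np


class BPNetwork:
    """Minimal 3-layer feed-forward network (input, sigmoid hidden, softmax output)
    trained by back-propagation with full-batch gradient descent."""

    def __init__(self, n_inputs, n_hidden, n_classes, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_inputs, n_hidden))  # input -> hidden weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))  # hidden -> output weights
        self.b2 = np.zeros(n_classes)
        self.lr = lr

    def _forward(self, X):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # sigmoid hidden layer
        Z = H @ self.W2 + self.b2
        Z -= Z.max(axis=1, keepdims=True)                    # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)                    # softmax class probabilities
        return H, P

    def fit(self, X, y, epochs=2000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        Y = np.eye(self.W2.shape[1])[y]                      # one-hot targets from cluster labels
        n = len(X)
        for _ in range(epochs):
            H, P = self._forward(X)
            dZ = (P - Y) / n                                 # gradient of mean cross-entropy w.r.t. logits
            dH = (dZ @ self.W2.T) * H * (1.0 - H)            # back-propagate through the sigmoid
            self.W2 -= self.lr * H.T @ dZ
            self.b2 -= self.lr * dZ.sum(axis=0)
            self.W1 -= self.lr * X.T @ dH
            self.b1 -= self.lr * dH.sum(axis=0)
        return self

    def predict(self, X):
        # Class index with the highest output probability for each record.
        return self._forward(np.asarray(X, dtype=float))[1].argmax(axis=1)
```

Once trained on the labelled sample, `predict` plays the role of the customer classification and segmentation module by assigning every customer record one of the learned classes. How cluster indices map to the five named customer classes is not specified in the text.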
The clustering processing module clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the cosine (included-angle) function to obtain the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \ldots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$; the row sum is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n;$$
3) Sort the row sums $XS_p$, $p = 1, \ldots, n$, in descending order and denote the four largest values by $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$. If [condition on the preset ratio T; formula not reproduced in the source], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, take the mean of the four samples corresponding to $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio;
4) Sort in ascending order the elements of the row of XS that corresponds to the maximum $XS_{\max}$, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \ldots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: specify an interval of possible values of k, cluster with each value in the interval, compare the covariances to judge whether the clusters differ significantly, use this to explore the structure of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$E = \sum_{l=1}^{k} \sum_{a_x \in C_l} \lVert a_x - \bar{a}_l \rVert^2$$
where $C_l$ denotes the l-th of the k clusters, $a_x$ is a sample in cluster l, and $\bar{a}_l$ is the center of cluster l.
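Steps 1 to 4 can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the patent's implementation: the inequality that compares the four largest row sums against T is missing from the source text, so `hypothetical_ratio_test` (largest row sum divided by the mean of the next three, compared with T) is a placeholder to be replaced by the real condition, and all function names are hypothetical. Samples are assumed to be non-zero numeric vectors (so the cosine is defined) and n is assumed to be at least 4; T defaults to 1.5.

```python
import numpy as np


def cosine_similarity_matrix(A):
    """Step 1: XS[i, j] = cos(a_i, a_j) for the vectorized samples (one per row of A)."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    return U @ U.T


def hypothetical_ratio_test(top4, T):
    """PLACEHOLDER for the missing inequality: XS_max / mean(XS_max-1..XS_max-3) >= T."""
    return top4[0] / top4[1:].mean() >= T


def initial_centres(A, k, T=1.5, ratio_test=hypothetical_ratio_test):
    """Steps 1-4: return a (k, d) array of initial cluster centers."""
    A = np.asarray(A, dtype=float)
    XS = cosine_similarity_matrix(A)
    row_sums = XS.sum(axis=1)             # step 2: XS_p, similarity of sample p to the whole set
    order = np.argsort(row_sums)[::-1]    # step 3: descending order, order[0] gives XS_max
    p = order[0]
    if ratio_test(row_sums[order[:4]], T):
        first = A[p]                      # the sample with the largest row sum
    else:
        first = A[order[:4]].mean(axis=0) # mean of the four top-ranked samples
    # Step 4: in row p of XS take the k-1 smallest entries (excluding a_p itself);
    # the corresponding samples become the remaining k-1 centers.
    ascending = np.argsort(XS[p])
    others = ascending[ascending != p][: k - 1]
    return np.vstack([first, A[others]])
```

For example, with the six samples [1, 0], [0.9, 0.1], [0.8, 0.2], [0, 1], [0.1, 0.9] and [0.7, 0.3] and k = 3, the last sample has the largest row sum, the placeholder test fails, and the returned centers are [0.85, 0.15] (the mean of the four top-ranked samples), [0, 1] and [0.1, 0.9] (the two samples least similar to [0.7, 0.3]).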
The ratio T takes values in the range [1.4, 1.6].
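Steps 5 to 7 form the usual assign-and-update loop. The sketch below is a minimal version under stated assumptions: assignment uses cosine similarity (the measure of step 1), each new center is the plain mean of its cluster, an empty cluster keeps its previous center (the text does not cover this case), and the loop stops when the centers no longer change. The objective is assumed to be the standard k-means squared-error sum and is returned for monitoring rather than used as a second stopping test. `refine` is a hypothetical name.

```python
import numpy as np


def _unit_rows(M):
    # Scale each row to unit length so that dot products are cosine similarities.
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def refine(A, centres, max_iter=100):
    """Steps 5-7: return (labels, centers, E)."""
    A = np.asarray(A, dtype=float)
    centres = np.asarray(centres, dtype=float).copy()
    U = _unit_rows(A)

    def assign(C):
        # Step 5: each sample joins the center with the highest cosine similarity.
        return (U @ _unit_rows(C).T).argmax(axis=1)

    for _ in range(max_iter):
        labels = assign(centres)
        # Step 6: new center = mean of the cluster's samples (empty cluster keeps its old center).
        new = np.array([A[labels == l].mean(axis=0) if np.any(labels == l) else centres[l]
                        for l in range(len(centres))])
        # Step 7: stop once the centers are unchanged between updates.
        if np.allclose(new, centres):
            break
        centres = new
    labels = assign(centres)
    # Assumed objective: sum of squared distances from each sample to its cluster center.
    E = float(sum(((A[labels == l] - centres[l]) ** 2).sum() for l in range(len(centres))))
    return labels, centres, E
```

Starting from the centers [0.85, 0.15], [0, 1] and [0.1, 0.9] on the six samples [1, 0], [0.9, 0.1], [0.8, 0.2], [0, 1], [0.1, 0.9], [0.7, 0.3], the loop stops after one pass with labels [0, 0, 0, 1, 2, 0] and E = 0.1. Because assignment uses cosine similarity while E uses Euclidean distance, E is not guaranteed to decrease at every pass.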
The invention have the benefit that
1, sample data abstraction module is set, randomly draws a few sample of bank client data, take out the most in the sample
The probability getting isolated point is the lowest, is negligible, and improves the accuracy of cluster;
2, arranging neural metwork training module, the feed-forward type BP network calculations more than 3 layers goes out the weights of each attribute, it is to avoid
As result is affected by each attribute, Clustering Effect more suits the actual demand of customer segmentation;
3, the clustering processing module arranged uses improvement k-means clustering method to gather each sample of sample data
Class, is prevented effectively from the single occasionality taking arbitrary sampling method to be brought, and solves original algorithm and is choosing k value and initialization
Problem existing during cluster centre, improves cluster stability, further increases the precision of customer segmentation.
Brief description of the drawings
The invention is further described with reference to the accompanying drawings, but the embodiments in the drawings do not limit the invention in any way. A person of ordinary skill in the art can derive other drawings from the following drawings without creative effort.
Fig. 1 is a schematic diagram of the connections between the modules of the invention;
Fig. 2 is the principle schematic of the present invention.
Reference numerals:
Bank customer data acquisition module 1, sample data extraction module 2, clustering processing module 3, neural network training module 4, customer classification and segmentation module 5.
Detailed description of the invention
The invention is further described with reference to the following embodiments.
Embodiment 1
Referring to Fig. 1 and Fig. 2, the customer segmentation system based on improved k-means and neural network clustering of this embodiment comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4 and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in the bank network database; the sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result; the neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weights of each layer for each attribute, and obtains a trained neural network; the customer classification and segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.
The customer segmentation system divides bank customers into five classes: premium customers, major customers, ordinary customers, small customers and potential customers.
The neural network is a feed-forward BP network with three or more layers.
The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the cosine (included-angle) function to obtain the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \ldots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$; the row sum is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n;$$
3) Sort the row sums $XS_p$, $p = 1, \ldots, n$, in descending order and denote the four largest values by $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$. If [condition on the preset ratio T; formula not reproduced in the source], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, take the mean of the four samples corresponding to $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio;
4) Sort in ascending order the elements of the row of XS that corresponds to the maximum $XS_{\max}$, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \ldots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: specify an interval of possible values of k, cluster with each value in the interval, compare the covariances to judge whether the clusters differ significantly, use this to explore the structure of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$E = \sum_{l=1}^{k} \sum_{a_x \in C_l} \lVert a_x - \bar{a}_l \rVert^2$$
where $C_l$ denotes the l-th of the k clusters, $a_x$ is a sample in cluster l, and $\bar{a}_l$ is the center of cluster l.
The ratio T takes values in the range [1.45, 1.55].
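Step 4 chooses k by clustering over an interval of candidate values and comparing covariances, but gives no concrete statistic. The sketch below substitutes a named, standard criterion, the Calinski-Harabasz index (between-cluster scatter over within-cluster scatter, each divided by its degrees of freedom), and for brevity uses plain Euclidean Lloyd iterations with a deterministic farthest-first start instead of the improved initialization. Both substitutions, and all function names, are assumptions for illustration only.

```python
import numpy as np


def farthest_first(X, k):
    """Deterministic start: begin at X[0], then repeatedly add the point farthest from those chosen."""
    idx = [0]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()


def lloyd(X, k, max_iter=100):
    """Plain Euclidean k-means (Lloyd iterations); returns labels and centers."""
    centres = farthest_first(X, k)
    for _ in range(max_iter):
        labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == l].mean(axis=0) if np.any(labels == l) else centres[l]
                        for l in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centres


def calinski_harabasz(X, labels, centres):
    """(between-cluster scatter / (k - 1)) / (within-cluster scatter / (n - k)); higher is better."""
    n, k = len(X), len(centres)
    overall = X.mean(axis=0)
    within = sum(((X[labels == l] - centres[l]) ** 2).sum() for l in range(k))
    between = sum((labels == l).sum() * ((centres[l] - overall) ** 2).sum() for l in range(k))
    return (between / (k - 1)) / (within / (n - k))


def choose_k(X, k_values):
    """Score every candidate k (each k >= 2) and return the best one with all scores."""
    X = np.asarray(X, dtype=float)
    scores = {k: calinski_harabasz(X, *lloyd(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

On data made of three well-separated groups of points, `choose_k(X, range(2, 7))` is expected to return 3. The returned score dictionary lets the choice be inspected rather than taken on trust.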
In the present invention, the sample data extraction module 2 randomly draws a portion of the bank customer data as samples, so the probability of drawing isolated points into the sample is very low and negligible, which improves clustering accuracy; the neural network training module 4 uses a feed-forward BP network with three or more layers to compute the weight of each attribute, which avoids treating all attributes as equally influential on the result, so the clustering better meets the practical needs of customer segmentation; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of a single random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability and further increases the precision of customer segmentation. With the ratio T = 1.4, the precision of customer segmentation improved by 4.5%.
Embodiment 2
Referring to Fig. 1 and Fig. 2, the customer segmentation system based on improved k-means and neural network clustering of this embodiment comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4 and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in the bank network database; the sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result; the neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weights of each layer for each attribute, and obtains a trained neural network; the customer classification and segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.
The customer segmentation system divides bank customers into five classes: premium customers, major customers, ordinary customers, small customers and potential customers.
The neural network is a feed-forward BP network with three or more layers.
The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the cosine (included-angle) function to obtain the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \ldots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$; the row sum is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n;$$
3) Sort the row sums $XS_p$, $p = 1, \ldots, n$, in descending order and denote the four largest values by $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$. If [condition on the preset ratio T; formula not reproduced in the source], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, take the mean of the four samples corresponding to $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio;
4) Sort in ascending order the elements of the row of XS that corresponds to the maximum $XS_{\max}$, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \ldots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: specify an interval of possible values of k, cluster with each value in the interval, compare the covariances to judge whether the clusters differ significantly, use this to explore the structure of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$E = \sum_{l=1}^{k} \sum_{a_x \in C_l} \lVert a_x - \bar{a}_l \rVert^2$$
where $C_l$ denotes the l-th of the k clusters, $a_x$ is a sample in cluster l, and $\bar{a}_l$ is the center of cluster l.
The ratio T takes values in the range [1.45, 1.55].
In the present invention, the sample data extraction module 2 randomly draws a portion of the bank customer data as samples, so the probability of drawing isolated points into the sample is very low and negligible, which improves clustering accuracy; the neural network training module 4 uses a feed-forward BP network with three or more layers to compute the weight of each attribute, which avoids treating all attributes as equally influential on the result, so the clustering better meets the practical needs of customer segmentation; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of a single random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability and further increases the precision of customer segmentation. With the ratio T = 1.45, the precision of customer segmentation improved by 4.2%.
Embodiment 3
Referring to Fig. 1 and Fig. 2, the customer segmentation system based on improved k-means and neural network clustering of this embodiment comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4 and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in the bank network database; the sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result; the neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weights of each layer for each attribute, and obtains a trained neural network; the customer classification and segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.
The customer segmentation system divides bank customers into five classes: premium customers, major customers, ordinary customers, small customers and potential customers.
The neural network is a feed-forward BP network with three or more layers.
The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the cosine (included-angle) function to obtain the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \ldots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$; the row sum is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n;$$
3) Sort the row sums $XS_p$, $p = 1, \ldots, n$, in descending order and denote the four largest values by $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$. If [condition on the preset ratio T; formula not reproduced in the source], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, take the mean of the four samples corresponding to $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio;
4) Sort in ascending order the elements of the row of XS that corresponds to the maximum $XS_{\max}$, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \ldots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: specify an interval of possible values of k, cluster with each value in the interval, compare the covariances to judge whether the clusters differ significantly, use this to explore the structure of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$E = \sum_{l=1}^{k} \sum_{a_x \in C_l} \lVert a_x - \bar{a}_l \rVert^2$$
where $C_l$ denotes the l-th of the k clusters, $a_x$ is a sample in cluster l, and $\bar{a}_l$ is the center of cluster l.
The ratio T takes values in the range [1.45, 1.55].
In the present invention, the sample data extraction module 2 randomly draws a portion of the bank customer data as samples, so the probability of drawing isolated points into the sample is very low and negligible, which improves clustering accuracy; the neural network training module 4 uses a feed-forward BP network with three or more layers to compute the weight of each attribute, which avoids treating all attributes as equally influential on the result, so the clustering better meets the practical needs of customer segmentation; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the chance effects of a single random sampling, solves the problems of the original algorithm in choosing k and initializing the cluster centers, improves clustering stability and further increases the precision of customer segmentation. With the ratio T = 1.5, the precision of customer segmentation improved by 5%.
Embodiment 4
Referring to Fig. 1 and Fig. 2, the customer segmentation system based on improved k-means and neural network clustering of this embodiment comprises a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4 and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in the bank network database; the sample data extraction module 2 randomly samples the bank customer data in the bank network database and extracts one third of the data as sample data; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result; the neural network training module 4 takes the clustering result as training samples, uses a neural network to compute the weights of each layer for each attribute, and obtains a trained neural network; the customer classification and segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.
The customer segmentation system divides bank customers into five classes: premium customers, major customers, ordinary customers, small customers and potential customers.
The neural network is a feed-forward BP network with three or more layers.
The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the cosine (included-angle) function to obtain the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \ldots, n$, where $\mathrm{sim}(a_i, a_j)$ is the similarity between samples $a_i$ and $a_j$; the row sum is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \ldots, n;$$
3) Sort the row sums $XS_p$, $p = 1, \ldots, n$, in descending order and denote the four largest values by $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$. If [condition on the preset ratio T; formula not reproduced in the source], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, take the mean of the four samples corresponding to $XS_{\max}$, $XS_{\max-1}$, $XS_{\max-2}$ and $XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio;
4) Sort in ascending order the elements of the row of XS that corresponds to the maximum $XS_{\max}$, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \ldots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: specify an interval of possible values of k, cluster with each value in the interval, compare the covariances to judge whether the clusters differ significantly, use this to explore the structure of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
$$E = \sum_{l=1}^{k} \sum_{a_x \in C_l} \lVert a_x - \bar{a}_l \rVert^2$$
where $C_l$ denotes the l-th of the k clusters, $a_x$ is a sample in cluster l, and $\bar{a}_l$ is the center of cluster l.
The ratio T takes values in the range [1.45, 1.55].
In the present invention, the sample data extraction module 2 randomly draws a small sample of the bank customer data, so the probability that isolated points (outliers) are drawn into the sample is very low and can be neglected, which improves clustering accuracy. The neural network training module 4 uses a feed-forward BP network with three or more layers to compute the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the one-off chance effects introduced by random sampling, solves the problems the original algorithm has in choosing k and initializing the cluster centers, improves clustering stability, and further increases the precision of customer segmentation. With the ratio value T = 1.55, the precision of customer segmentation is improved by 4.7%.
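Step 4 leaves open how the covariance comparison between candidate values of k is scored. One common covariance-based criterion is the Calinski-Harabasz index (between-cluster scatter over within-cluster scatter, each divided by its degrees of freedom); the sketch below swaps it in as an assumption, not as the patent's criterion. It scores one partition per candidate k, produced by any clustering routine, and picks the highest-scoring k. The names `calinski_harabasz` and `choose_k` are hypothetical.

```python
import math


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def _sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def calinski_harabasz(points, labels):
    """(trace of between-cluster scatter / (k-1)) / (trace of within-cluster scatter / (n-k))."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(points), len(groups)
    if not 2 <= k < n:
        raise ValueError("the index needs 2 <= k < n clusters")
    overall = _mean(points)
    between = sum(len(g) * _sq_dist(_mean(g), overall) for g in groups.values())
    within = sum(_sq_dist(p, _mean(g)) for g in groups.values() for p in g)
    if within == 0:
        return math.inf  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, labels_by_k):
    """labels_by_k maps each candidate k in the interval to a clustering of `points`."""
    return max(labels_by_k, key=lambda k: calinski_harabasz(points, labels_by_k[k]))


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
candidates = {2: [0, 0, 1, 1], 3: [0, 1, 2, 2]}
print(choose_k(points, candidates))  # → 2 (index 200.0 for k=2 versus 100.5 for k=3)
```

A larger index means tighter, better-separated clusters; any other covariance-based test could be substituted in `calinski_harabasz` without changing `choose_k`.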
Embodiment 5
Referring to Fig. 1 and Fig. 2, the customer segmentation system based on improved k-means and neural network clustering of this embodiment includes a bank customer data acquisition module 1, a sample data extraction module 2, a clustering processing module 3, a neural network training module 4, and a customer classification and segmentation module 5. The bank customer data acquisition module 1 collects bank customer data and stores it in the bank network database; the sample data extraction module 2 randomly samples the bank customer data in the bank network database, extracting one third of the data as sample data; the clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method and outputs the clustering result; the neural network training module 4 takes the clustering result as training samples, computes the weights of each attribute at each layer with a neural network, and obtains a trained neural network; the customer classification and segmentation module 5 inputs the bank customer data into the trained neural network to segment the bank customers.
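The data flow between modules 2 to 5 can be sketched as below, assuming the records are held in memory as a Python list. Only the one-third simple random sample follows the text directly; `cluster_fn` and `train_fn` are hypothetical placeholders for the clustering processing module 3 and the neural network training module 4, and all function names are illustrative.

```python
import random


def extract_sample(records, fraction=1 / 3, seed=None):
    """Sample data extraction module 2: simple random sample, without
    replacement, of about one third of the stored customer records."""
    rng = random.Random(seed)
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)


def segment_customers(records, cluster_fn, train_fn, seed=None):
    """Hypothetical wiring of modules 2-5.

    cluster_fn(sample) -> one cluster label per sampled record (module 3)
    train_fn(sample, labels) -> predict(record) callable (module 4)
    """
    sample = extract_sample(records, seed=seed)
    labels = cluster_fn(sample)
    predict = train_fn(sample, labels)
    # Module 5: every customer record, not only the sample, is classified
    return [predict(r) for r in records]


# Toy usage with trivial placeholder modules (illustration only)
customers = list(range(30))
segments = segment_customers(
    customers,
    cluster_fn=lambda s: [c % 2 for c in s],
    train_fn=lambda s, labels: (lambda r: r % 2),
    seed=0,
)
```

Clustering only the sample and then classifying the full data set with the trained network keeps the expensive pairwise-similarity step limited to one third of the records.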
The customer segmentation system divides bank customers into five classes: premium customers, big customers, ordinary customers, small customers, and potential customers.
The neural network is a feed-forward BP network with three or more layers.
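A minimal feed-forward BP network with three layers (input, one hidden, output) can be sketched in plain Python as below. The sigmoid hidden units, softmax output, cross-entropy loss, per-sample gradient descent and the class name `BPNetwork` are all assumptions for illustration. In the system described, the inputs would be customer attribute vectors, the targets the cluster labels from the clustering processing module 3, and the input-to-hidden weights `w1` play the role of per-attribute weights.

```python
import math
import random


class BPNetwork:
    """Three-layer feed-forward network trained by error back-propagation
    with per-sample gradient descent (sigmoid hidden units, softmax outputs
    and cross-entropy loss are assumptions)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Small random weights: w1 is hidden x input, w2 is output x hidden
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and class probabilities for one input vector."""
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        top = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(zk - top) for zk in z]
        total = sum(e)
        return h, [ek / total for ek in e]

    def train(self, inputs, labels, epochs=200, lr=0.5):
        """Back-propagation training; returns the mean cross-entropy of each epoch."""
        history = []
        for _ in range(epochs):
            loss = 0.0
            for x, y in zip(inputs, labels):
                h, p = self.forward(x)
                loss -= math.log(p[y] + 1e-12)
                # Output-layer error for softmax + cross-entropy: p - one_hot(y)
                d_out = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
                # Hidden-layer error, propagated back through w2 and the sigmoid derivative
                d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                         for j, hj in enumerate(h)]
                for k, dk in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.w2[k][j] -= lr * dk * hj
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj
            history.append(loss / len(inputs))
        return history

    def predict(self, x):
        """Index of the most probable class (customer segment)."""
        _, p = self.forward(x)
        return max(range(len(p)), key=lambda k: p[k])


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]  # toy cluster labels (illustration only)
net = BPNetwork(n_in=2, n_hidden=4, n_out=2, seed=1)
losses = net.train(X, y, epochs=500, lr=0.5)
```

Deeper variants add further hidden layers with the same error-propagation rule; one hidden layer is the smallest network that meets the three-layer requirement.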
The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the included-angle cosine function to obtain the similarity matrix XS;
2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \dots, n$, where $\mathrm{sim}(a_i, a_j)$ denotes the similarity between samples $a_i$ and $a_j$. The summation formula is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \dots, n$$
3) Sort $XS_p$, $p = 1, \dots, n$, in descending order, and let the four largest values be $XS_{\max}, XS_{\max-1}, XS_{\max-2}, XS_{\max-3}$. If [formula missing], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, select the mean of the four samples corresponding to $XS_{\max}, XS_{\max-1}, XS_{\max-2}, XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio value;
4) Sort the elements of the row vector of XS corresponding to the maximum $XS_{\max}$ in ascending order, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \dots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: define an interval of possible values of k, test different values of k by clustering with each value in the interval, compare covariances to determine the significant differences between clusters and thereby explore the type information of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
[formula missing]
where $C_l$ denotes the $l$-th of the $k$ clusters, $a_x$ is a sample in the $l$-th cluster, and [symbol missing] is the center of the $l$-th cluster.
The preset ratio value T lies in the range [1.45, 1.55].
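Steps 5 to 7 can be sketched as the loop below, which continues from a list of initial centers. Because the objective-function formula is not reproduced in the text, the sketch assumes the usual k-means criterion (the sum of squared Euclidean distances between each sample $a_x$ and the center of its cluster $C_l$) and reads "has reached its minimum" as "no longer decreases". Since assignment uses cosine similarity while this assumed objective is Euclidean, the objective is not guaranteed to fall monotonically, so that test acts only as a stopping heuristic. Empty clusters keep their previous center, a case the text does not address. The names `objective` and `refine` are hypothetical.

```python
import math


def cosine(u, v):
    """Included-angle cosine of two vectors; 0.0 if either is all zeros (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def objective(samples, clusters, centers):
    """ASSUMED objective: sum over clusters C_l of squared distances from each a_x to the center of C_l."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(samples[i], centers[l]))
        for l, members in enumerate(clusters)
        for i in members
    )


def refine(samples, centers, max_iter=100, tol=1e-9):
    """Steps 5-7: assign, recompute means, stop on unchanged centers or a non-decreasing objective."""
    prev_obj = math.inf
    clusters = []
    for _ in range(max_iter):
        # Step 5: each sample joins the cluster whose center is most similar (cosine)
        clusters = [[] for _ in centers]
        for i, a in enumerate(samples):
            best = max(range(len(centers)), key=lambda l: cosine(a, centers[l]))
            clusters[best].append(i)
        # Step 6: the mean of each cluster replaces its old center (empty cluster: keep old center)
        new_centers = [
            [sum(samples[i][d] for i in members) / len(members) for d in range(len(c))]
            if members else list(c)
            for members, c in zip(clusters, centers)
        ]
        # Step 7: stop if the centers did not move or the objective stopped decreasing
        obj = objective(samples, clusters, new_centers)
        moved = any(abs(a - b) > tol for old, new in zip(centers, new_centers) for a, b in zip(old, new))
        centers = new_centers
        if not moved or obj >= prev_obj - tol:
            break
        prev_obj = obj
    return clusters, centers


samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters, centers = refine(samples, [[1.0, 0.0], [0.0, 1.0]])
print(clusters)  # → [[0, 1], [2, 3]]
```

On this toy data the second pass reproduces the same centers, so the loop stops through the "centers unchanged" branch of step 7.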
In the present invention, the sample data extraction module 2 randomly draws a small sample of the bank customer data, so the probability that isolated points (outliers) are drawn into the sample is very low and can be neglected, which improves clustering accuracy. The neural network training module 4 uses a feed-forward BP network with three or more layers to compute the weight of each attribute, which avoids treating every attribute as having the same influence on the result, so the clustering better fits the actual needs of customer segmentation. The clustering processing module 3 clusters the samples of the sample data with the improved k-means clustering method, which effectively avoids the one-off chance effects introduced by random sampling, solves the problems the original algorithm has in choosing k and initializing the cluster centers, improves clustering stability, and further increases the precision of customer segmentation. With the ratio value T = 1.6, the precision of customer segmentation is improved by 3.5%.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit its scope of protection. Although the present invention has been explained in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the essence and scope of the technical solution of the present invention.
Claims (5)
1. A customer segmentation system based on improved k-means and neural network clustering, characterized in that it comprises a bank customer data acquisition module, a sample data extraction module, a clustering processing module, a neural network training module, and a customer classification and segmentation module; the bank customer data acquisition module is used to collect bank customer data and store the bank customer data in a bank network database; the sample data extraction module is used to randomly sample the bank customer data in the bank network database, extracting one third of the data as sample data; the clustering processing module is used to cluster the samples of the sample data with the improved k-means clustering method and output a clustering result; the neural network training module is used to take the clustering result as training samples, compute the weights of each attribute at each layer with a neural network, and obtain a trained neural network; the customer classification and segmentation module is used to input bank customer data into the trained neural network to segment the bank customers.
2. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the customer segmentation system divides bank customers into five classes: premium customers, big customers, ordinary customers, small customers, and potential customers.
3. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the neural network is a feed-forward BP network with three or more layers.
4. The customer segmentation system based on improved k-means and neural network clustering according to claim 1, characterized in that the clustering processing module clusters the samples of the sample data with the improved k-means clustering method, specifically as follows:
1) Let the sample data contain n samples. Vectorize the n samples and compute the pairwise similarity between all samples with the included-angle cosine function to obtain the similarity matrix XS;
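2) Sum each row of the similarity matrix XS to obtain the similarity between each sample and the whole valid data set. Let $XS = [\mathrm{sim}(a_i, a_j)]_{n \times n}$, $i, j = 1, \dots, n$, where $\mathrm{sim}(a_i, a_j)$ denotes the similarity between samples $a_i$ and $a_j$. The summation formula is:
$$XS_p = \sum_{j=1}^{n} \mathrm{sim}(a_p, a_j), \quad p = 1, \dots, n$$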
3) Sort $XS_p$, $p = 1, \dots, n$, in descending order, and let the four largest values be $XS_{\max}, XS_{\max-1}, XS_{\max-2}, XS_{\max-3}$. If [formula missing], select the sample corresponding to the maximum $XS_{\max}$ as the first initial cluster center; otherwise, select the mean of the four samples corresponding to $XS_{\max}, XS_{\max-1}, XS_{\max-2}, XS_{\max-3}$ as the first initial cluster center, where T is a preset ratio value;
4) Sort the elements of the row vector of XS corresponding to the maximum $XS_{\max}$ in ascending order, and let the k-1 smallest elements be $XS_{pq}$, $q = 1, \dots, k-1$; select the samples corresponding to these k-1 smallest elements as the remaining k-1 initial cluster centers. The value of k is set as follows: define an interval of possible values of k, test different values of k by clustering with each value in the interval, compare covariances to determine the significant differences between clusters and thereby explore the type information of the clusters, and finally determine a suitable k;
5) Compute the similarity between each remaining sample and each initial cluster center, and assign each remaining sample to the cluster with the highest similarity, forming k updated clusters;
6) Compute the mean of the samples in each updated cluster and use it as the updated cluster center, replacing the center before the update;
7) If the cluster centers before and after the update are identical, or the objective function has reached its minimum, stop updating. The objective function is:
[formula missing]
where $C_l$ denotes the $l$-th of the $k$ clusters, $a_x$ is a sample in the $l$-th cluster, and [symbol missing] is the center of the $l$-th cluster.
5. The customer segmentation system based on improved k-means and neural network clustering according to claim 4, characterized in that the preset ratio value T lies in the range [1.45, 1.55].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610544043.5A CN106204267A (en) | 2016-07-06 | 2016-07-06 | A kind of based on improving k means and the customer segmentation system of neural network clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106204267A true CN106204267A (en) | 2016-12-07 |
Family
ID=57476857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610544043.5A Withdrawn CN106204267A (en) | 2016-07-06 | 2016-07-06 | A kind of based on improving k means and the customer segmentation system of neural network clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106204267A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107944553A (en) * | 2017-11-22 | 2018-04-20 | 浙江大华技术股份有限公司 | A kind of method for trimming and device of CNN models |
US11651229B2 (en) | 2017-11-22 | 2023-05-16 | Zhejiang Dahua Technology Co., Ltd. | Methods and systems for face recognition |
US11900230B2 (en) | 2019-07-17 | 2024-02-13 | Visa International Service Association | Method, system, and computer program product for identifying subpopulations |
CN112767124A (en) * | 2021-01-15 | 2021-05-07 | 上海琢学科技有限公司 | Method and device for improving retention rate of personal delivery service |
CN113159881A (en) * | 2021-03-15 | 2021-07-23 | 杭州云搜网络技术有限公司 | Data clustering and B2B platform customer preference obtaining method and system |
CN113159881B (en) * | 2021-03-15 | 2022-08-12 | 杭州云搜网络技术有限公司 | Data clustering and B2B platform customer preference obtaining method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xie et al. | Customer churn prediction using improved balanced random forests | |
CN106204267A (en) | A kind of based on improving k means and the customer segmentation system of neural network clustering | |
CN108960833B (en) | Abnormal transaction identification method, equipment and storage medium based on heterogeneous financial characteristics | |
Adams et al. | Data mining for fun and profit | |
Peacock | Data mining in marketing: Part 1 | |
Zhang et al. | The Bayesian additive classification tree applied to credit risk modelling | |
Rahman et al. | Link prediction in dynamic networks using graphlet | |
EP3226175A1 (en) | Image pattern recognition device and program | |
Li et al. | Empirical research of hybridizing principal component analysis with multivariate discriminant analysis and logistic regression for business failure prediction | |
Ali et al. | K-means clustering algorithm applications in data mining and pattern recognition | |
CN104850868A (en) | Customer segmentation method based on k-means and neural network cluster | |
Rustogi et al. | Swift imbalance data classification using SMOTE and extreme learning machine | |
CN109739844A (en) | Data classification method based on decaying weight | |
CN111325248A (en) | Method and system for reducing pre-loan business risk | |
Cai et al. | Large-scale global and simultaneous inference: Estimation and testing in very high dimensions | |
Radhakrishnan et al. | Application of data mining in marketing | |
CN107729377A (en) | Customer classification method and system based on data mining | |
CN111861756A (en) | Group partner detection method based on financial transaction network and implementation device thereof | |
CN108664653A (en) | A kind of Medical Consumption client's automatic classification method based on K-means | |
CN108197795A (en) | The account recognition methods of malice group, device, terminal and storage medium | |
CN107274066A (en) | A kind of shared traffic Customer Value Analysis method based on LRFMD models | |
CN102722578A (en) | Unsupervised cluster characteristic selection method based on Laplace regularization | |
CN109934286A (en) | Bug based on Text character extraction and uneven processing strategie reports severity recognition methods | |
Gončarovs | Using data analytics for customers segmentation: Experimental study at a financial institution | |
CN111581298A (en) | Heterogeneous data integration system and method for large data warehouse |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C04 | Withdrawal of patent application after publication (patent law 2001) | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20161207 |