CN104850868A - Customer segmentation method based on k-means and neural network cluster - Google Patents
- Publication number
- CN104850868A (application CN201510323644.9A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- data
- customer segmentation
- segmentation method
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a customer segmentation method based on k-means and neural network clustering. The method comprises the following steps: (1) randomly sampling the overall data and using the selected data as sample data; (2) performing k-means clustering on the sample data selected in step (1) and determining the class of each sample; (3) using the clustering results of step (2) as training samples, computing the weights of each attribute at every layer with a neural network, and thereby obtaining a trained neural network; (4) feeding the overall data into the trained neural network and computing the class of each record. Because only a small number of samples are drawn, the probability of drawing an isolated point (outlier) is very low. In addition, a BP (backpropagation) neural network computes a weight for each attribute, so that the attributes are no longer forced to influence the result equally. This addresses the shortcomings of the conventional K-means clustering algorithm and brings the clustering result in line with the practical needs of customer segmentation.
Description
Technical field
The present invention relates to the field of data mining, and in particular to a customer segmentation method based on k-means and neural network clustering.
Background technology
After China's accession to the WTO, with the entry of foreign banks and the deepening of financial reform, competition in the financial sector has intensified, and high-quality customers have gradually become the focus of competition among banks. Different types of customers bring markedly different value to a bank. By identifying and distinguishing these differences, a bank can allocate its marketing, service, and management resources more rationally and obtain greater returns with less investment. Solving this problem requires customer segmentation. Bank customer segmentation is the process by which a bank, within a defined strategy, business model, and target market, classifies customers according to factors such as their attributes, behavior, needs, preferences, and value, and provides targeted products, services, and marketing models.
At present, bank customer segmentation traditionally relies on experience-based classification and statistics-based analysis. Experience-based segmentation is the most primitive approach: decision makers assign customers to categories according to their own experience, so the result is highly subjective, not objective, and unconvincing. Statistics-based segmentation is a quantitative approach that assigns customers to categories according to statistics of their attributes. Its result depends strongly on the classification criteria: if the criteria are unreasonable, so is the classification. As the informatization of China's banking sector deepens, banks have accumulated large amounts of historical transaction data and customer data, and with the growth of networks still more customer data will accumulate. Faced with such massive customer data, traditional segmentation methods are increasingly inadequate. In recent years, data mining technology has developed rapidly. It combines techniques from several fields, including databases, artificial intelligence, and statistics, and is the process of extracting useful, credible, and novel information and knowledge from large, incomplete, noisy, and fuzzy raw data. K-means clustering is one of the most important data mining methods and is widely used in bank customer segmentation.
The K-means algorithm is a classical partition-based clustering algorithm in data mining. It is widely used because it is theoretically sound, simple, and fast to converge. K-means works by iterative updating. It first randomly selects K objects as the initial cluster representatives, or cluster centers. Each remaining object is then assigned to the nearest cluster according to its distance from each cluster center, after which the center of each cluster is recomputed and used as the cluster center for the next iteration. This process repeats until the cluster centers no longer change. With each iteration the chosen cluster centers move closer to the true cluster centers, so the clustering improves, and in the end all objects are divided into K clusters.
The concrete steps of the traditional K-means algorithm are as follows.
Input: the number of clusters K and a data set X = {x1, x2, x3, x4, ..., xn} containing n objects.
Output: K clusters {s1, s2, s3, ..., sk} that minimize the objective function.
Concrete steps:
(1) Randomly select K objects from the data set X as the initial cluster centers c1, c2, c3, ..., ck;
(2) Assign each object xi (i = 1, 2, 3, ..., n) in turn to the nearest cluster center cj, 1 ≤ j ≤ K, according to Euclidean distance;
(3) Recompute the cluster center cj of each cluster as the mean of the objects currently assigned to it;
(4) Repeat steps (2) and (3) until the K cluster centers no longer change, that is, until the criterion function converges.
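The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the patent's own implementation; the function names `kmeans` and `sq_dist`, the `max_iter` cap, and the fixed `seed` are assumptions introduced here.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance; it selects the same nearest center as the true distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step (1): pick K objects at random as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step (2): assign every object to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Step (3): recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        # Step (4): stop once no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Calling `kmeans(points, 5)` returns the final centers and one cluster index per input point. The indices are arbitrary labels and carry no ordering.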
Fig. 1 is the basic flowchart of the traditional K-means algorithm.
K-means is a classical algorithm for clustering problems and is simple and fast. However, the traditional K-means algorithm has a serious defect: it is sensitive to isolated points (outliers). If the data set contains isolated points, the clustering result of K-means is unsatisfactory. Moreover, all attributes are treated equally during clustering, so the differing influence of individual attributes on the clustering result cannot be distinguished.
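The sensitivity to isolated points comes from the center update, which is a plain mean. A small sketch with made-up numbers (the points and the helper name `centroid` are assumptions for illustration) shows how a single far-away point drags a cluster center:

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
with_outlier = cluster + [(50.0, 50.0)]  # one isolated point, invented for this example

print(centroid(cluster))       # (1.5, 1.5)
print(centroid(with_outlier))  # (11.2, 11.2)
```

With the isolated point included, the center no longer lies near any of the four ordinary members.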
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a customer segmentation method based on k-means and neural network clustering. Only a small number of samples are drawn in the first step, so the probability of drawing an isolated point into the sample is very low, to the point of being negligible. A BP neural network is then used to compute the weight of each attribute, so that the attributes do not all influence the result equally.
To achieve the above object, the invention provides the following technical solution:
A customer segmentation method based on k-means and neural network clustering, comprising the following steps:
(1) randomly sampling the overall data and extracting part of the data as sample data;
(2) performing k-means clustering on the sample data extracted in step (1) and determining the class to which each sample belongs;
(3) using the clustering result of step (2) as training samples, computing the weights of each attribute at every layer with a neural network, and obtaining a trained neural network;
(4) feeding the overall data into the trained neural network and computing the class to which each record belongs.
Preferably, the neural network is a BP neural network.
Preferably, the BP neural network is a feed-forward BP network with 3 or more layers.
Preferably, the extracted part of the data is no more than 30% of the overall data.
Preferably, the extracted part of the data is no more than 15% of the overall data.
Preferably, the extracted part of the data is no more than 5% of the overall data.
Preferably, the number of clusters for the k-means clustering is 5.
Preferably, in step (1) the overall data is preprocessed before random sampling.
Preferably, the data preprocessing uses centering and standardization transforms.
Compared with the prior art, the invention has the following beneficial effects:
1. In the first step, the method randomly draws only a small number of samples from the overall data, so the probability of drawing an isolated point into the sample is very low, to the point of being negligible, which improves clustering accuracy;
2. The method uses a BP neural network to compute the weight of each attribute, so that the attributes do not all influence the result equally, and the clustering result better suits the practical needs of customer segmentation.
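How likely a sample is to be free of isolated points depends on the sample size and on how many isolated points the overall data contains. Assuming simple random sampling without replacement, the probability can be computed from binomial coefficients. The helper name `prob_no_isolated_point` and the count of isolated points are hypothetical; only the figures 2000 and 300 come from the embodiment.

```python
from math import comb


def prob_no_isolated_point(total, isolated, sample_size):
    """Probability that a random sample (without replacement) contains no isolated point."""
    return comb(total - isolated, sample_size) / comb(total, sample_size)


# 2000 records and a 300-record sample as in the embodiment;
# the number of isolated points (2) is an assumption for illustration only.
p = prob_no_isolated_point(total=2000, isolated=2, sample_size=300)
print(round(p, 4))
```

Smaller sampling fractions raise this probability, which is consistent with the preference for samples of no more than 30%, 15%, or 5% of the overall data.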
Brief description of the drawings
Fig. 1 is the basic flowchart of the traditional K-means algorithm.
Fig. 2 is the detailed flowchart of the customer segmentation method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to test examples and specific embodiments. This should not be interpreted as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the invention fall within its scope.
The customer segmentation method of the invention based on k-means and neural network clustering is implemented in the following steps:
(1) randomly sample the overall data and extract a small part of the data as the sample;
(2) perform k-means clustering on the sample data extracted in the first step and determine the class to which each sample belongs;
(3) using the clustering result of the second step as training samples, compute the weights of each attribute at every layer with a BP neural network and obtain a trained BP neural network;
(4) feed the overall data into the BP neural network trained in the third step and compute the class to which each record belongs.
The overall data in this embodiment comes from the customer segmentation data in the personal-customer system of a bank in a domestic city. The input consists of 2000 bank customer samples. Each record contains eight attribute fields: customer number, age, length of service, monthly salary, deposit amount at this bank, number of bank visits, loan status, and housing status. The output divides the bank's customers into 5 major classes: premium customers, large customers, ordinary customers, small customers, and potential customers.
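One way to represent such a record in code is sketched below. All field names, types, and encodings are assumptions: the source lists the eight fields but does not say how loan status and housing status are encoded, nor whether the customer number is excluded from clustering. Here it is treated as an identifier rather than a feature.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """Hypothetical record layout for the eight fields listed above."""
    customer_id: str
    age: int
    years_of_service: float
    monthly_salary: float
    deposit_amount: float
    bank_visits: int
    has_loan: bool    # assumed boolean encoding of "loan status"
    owns_home: bool   # assumed boolean encoding of "housing status"

    def features(self):
        """Numeric feature vector; the customer ID is left out (an assumption)."""
        return [
            float(self.age),
            self.years_of_service,
            self.monthly_salary,
            self.deposit_amount,
            float(self.bank_visits),
            float(self.has_loan),
            float(self.owns_home),
        ]
```

The resulting vectors are on very different scales (age versus deposit amount, for example), which is why the preprocessing step matters.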
The detailed flow of the method of the invention for customer segmentation is shown in Fig. 2. The raw data is preprocessed first. Because of human error during collection of the source data, the database may contain incomplete and noisy data. In addition, each field of a record describes a different characteristic, often in different units of measurement, so the values differ greatly in magnitude. The raw data therefore needs preprocessing to improve data quality, which makes the data mining process more effective and the classification more accurate. The data preprocessing of the method uses centering and standardization transforms. The purpose of centering is to give every field the same base point. It is carried out according to the following formula:
$$x'_{ij} = x_{ij} - \frac{1}{n}\sum_{k=1}^{n} x_{kj}$$
where $x_{ij}$ is the value of the j-th field of the i-th record, $n$ is the total number of records, $\sum_{k=1}^{n} x_{kj}$ is the sum of the j-th attribute field over all records, and $x'_{ij}$ is the value of the j-th field of the i-th record after centering.
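The centering step, which subtracts each field's mean from every value in that field, can be sketched as follows. The function name `center_columns` and the sample records are assumptions made for illustration.

```python
def center_columns(records):
    """Subtract each field's mean from every value in that field."""
    n = len(records)
    means = [sum(col) / n for col in zip(*records)]
    return [[x - m for x, m in zip(row, means)] for row in records]


# Hypothetical records with two fields: (age, monthly salary).
records = [[25, 3000.0], [35, 5000.0], [45, 10000.0]]
print(center_columns(records))
# [[-10.0, -3000.0], [0.0, -1000.0], [10.0, 4000.0]]
```

After centering every field has mean 0, but the fields still differ in spread, which the standardization step addresses.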
On top of centering, the data is further transformed by standardization so that all fields have a uniform range of variation. Zero-mean standardization is used, which standardizes each field by its mean and standard deviation. It is carried out according to the following formula:
$$(x'_{ij})' = \frac{x'_{ij}}{s_j} = \frac{x_{ij} - \bar{x}_j}{s_j}$$
where $\bar{x}_j$ is the mean of the values of all records in the j-th attribute field, $s_j$ is the standard deviation of the j-th attribute field, and $(x'_{ij})'$ is the value of the j-th field of the i-th record after zero-mean standardization.
After preprocessing, every field has the same base point and the same range of variation: its mean is 0 and its standard deviation is 1.
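A minimal sketch of the zero-mean standardization follows. It assumes the population standard deviation (dividing by n), since the source does not say whether n or n - 1 is used, and it assumes no field is constant, because a constant field would have a standard deviation of zero. The function name `standardize_columns` is hypothetical.

```python
import math


def standardize_columns(records):
    """Center each field and divide by its standard deviation (z-score)."""
    n = len(records)
    columns = list(zip(*records))
    means = [sum(col) / n for col in columns]
    # Population standard deviation (divide by n): an assumption, see the lead-in.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in records]
```

Each field of the result has mean 0 and standard deviation 1 up to floating-point rounding, so fields such as age and monthly salary contribute on a comparable scale to the Euclidean distances used by k-means.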
After data preprocessing, customers are classified with the clustering algorithm of the invention based on k-means and neural networks. The detailed process is as follows:
(1) after 1000 customer records are taken from the 2000 customer records and preprocessed, 300 customers are randomly sampled as the sample for the next step;
(2) k-means clustering is performed on the customer sample data extracted in the first step, dividing it into k clusters, and each sample is labeled with the class to which it belongs;
(3) the clustering result of the second step is used as the training samples of an artificial neural network; a BP neural network computes the weights of each attribute at every layer, yielding a trained BP neural network;
(4) the data of all customers is fed into the neural network trained in the third step, which computes the class label of each customer.
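The four steps can be put together in a compact sketch using only the Python standard library. It is an illustration under stated assumptions, not the patent's implementation: the network has one hidden layer (3 layers in total), and the hidden size, learning rate, epoch count, softmax output, and all function names are choices made here. The input is assumed to be already preprocessed.

```python
import math
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, rng, max_iter=100):
    """Plain K-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                [sum(c) / len(members) for c in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def forward(x, w1, w2):
    """Forward pass: sigmoid hidden layer, softmax output layer."""
    xb = list(x) + [1.0]  # append the bias input
    # Sigmoid written via tanh, which cannot overflow for large inputs.
    hidden = [0.5 * (1 + math.tanh(0.5 * sum(w * v for w, v in zip(row, xb)))) for row in w1]
    scores = [sum(w * v for w, v in zip(row, hidden + [1.0])) for row in w2]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def train_bp(samples, labels, n_classes, rng, n_hidden=8, epochs=300, lr=0.1):
    """Train the input-hidden-output network by backpropagation (per-sample updates)."""
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            hidden, probs = forward(x, w1, w2)
            # Output error for softmax with cross-entropy loss: p - onehot(y).
            d_out = [p - (1.0 if c == y else 0.0) for c, p in enumerate(probs)]
            # Propagate the error back through the sigmoid hidden layer.
            d_hid = [
                h * (1 - h) * sum(d_out[c] * w2[c][j] for c in range(n_classes))
                for j, h in enumerate(hidden)
            ]
            for c in range(n_classes):
                for j, h in enumerate(hidden + [1.0]):
                    w2[c][j] -= lr * d_out[c] * h
            for j in range(n_hidden):
                for i, xi in enumerate(list(x) + [1.0]):
                    w1[j][i] -= lr * d_hid[j] * xi
    return w1, w2


def segment_customers(data, k, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)      # step (1): random sample
    labels = kmeans_labels(sample, k, rng)      # step (2): cluster the sample
    w1, w2 = train_bp(sample, labels, k, rng)   # step (3): train on cluster labels
    result = []
    for x in data:                              # step (4): classify all records
        _, probs = forward(x, w1, w2)
        result.append(probs.index(max(probs)))
    return result
```

The returned class indices are the arbitrary cluster indices produced by k-means on the sample. Mapping them to names such as "premium customer" would require inspecting the cluster centers, which the source does not describe.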
The customer segmentation method of the invention based on k-means and neural network clustering was compared with the traditional k-means algorithm, with the number of clusters set to K = 5. The comparison results are shown in the following table:
Table 1. Comparison of customer segmentation results between the traditional K-means algorithm and the method of the invention
As the classification results in the table above show, the customer segmentation method of the invention based on k-means and neural network clustering overcomes the shortcomings of the traditional K-means clustering algorithm, namely its sensitivity to isolated points and its equal treatment of all attributes during clustering. Because the first step randomly draws only a small amount of data, the probability of drawing an isolated point is very low. The results in the table also show that the clustering produced by the improved algorithm better suits the practical needs of bank customer segmentation, providing a new way for data mining to address the problem of how banking systems can use their existing mass data for customer segmentation.
Claims (9)
1. A customer segmentation method based on k-means and neural network clustering, characterized in that it comprises the following steps:
(1) randomly sampling the overall data and extracting part of the data as sample data;
(2) performing k-means clustering on the sample data extracted in step (1) and determining the class to which each sample belongs;
(3) using the clustering result of step (2) as training samples, computing the weights of each attribute at every layer with a neural network, and obtaining a trained neural network;
(4) feeding the overall data into the trained neural network and computing the class to which each record belongs.
2. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the neural network is a BP neural network.
3. The customer segmentation method based on k-means and neural network clustering according to claim 2, characterized in that the BP neural network is a feed-forward BP network with 3 or more layers.
4. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the extracted part of the data is no more than 30% of the overall data.
5. The customer segmentation method based on k-means and neural network clustering according to claim 4, characterized in that the extracted part of the data is no more than 15% of the overall data.
6. The customer segmentation method based on k-means and neural network clustering according to claim 5, characterized in that the extracted part of the data is no more than 5% of the overall data.
7. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the number of clusters for the k-means clustering is 5.
8. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that in step (1) the overall data is preprocessed before random sampling.
9. The customer segmentation method based on k-means and neural network clustering according to claim 8, characterized in that the data preprocessing uses centering and standardization transforms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510323644.9A CN104850868A (en) | 2015-06-12 | 2015-06-12 | Customer segmentation method based on k-means and neural network cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510323644.9A CN104850868A (en) | 2015-06-12 | 2015-06-12 | Customer segmentation method based on k-means and neural network cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104850868A true CN104850868A (en) | 2015-08-19 |
Family
ID=53850503
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510323644.9A Pending CN104850868A (en) | 2015-06-12 | 2015-06-12 | Customer segmentation method based on k-means and neural network cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104850868A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354208A (en) * | 2015-09-21 | 2016-02-24 | 江苏讯狐信息科技有限公司 | Big data information mining method |
CN105844334A (en) * | 2016-03-22 | 2016-08-10 | 南京信息工程大学 | Radial basis function neural network-based temperature interpolation algorithm |
CN106651546A (en) * | 2017-01-03 | 2017-05-10 | 重庆邮电大学 | Intelligent community oriented electronic commerce information recommendation method |
WO2017143932A1 (en) * | 2016-02-26 | 2017-08-31 | 中国银联股份有限公司 | Fraudulent transaction detection method based on sample clustering |
CN107274066A (en) * | 2017-05-19 | 2017-10-20 | 浙江大学 | A kind of shared traffic Customer Value Analysis method based on LRFMD models |
CN107633035A (en) * | 2017-09-08 | 2018-01-26 | 浙江大学 | A kind of shared transport services reorder predictor methods based on K Means&LightGBM models |
US11900230B2 (en) | 2019-07-17 | 2024-02-13 | Visa International Service Association | Method, system, and computer program product for identifying subpopulations |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100595780C (en) * | 2007-12-13 | 2010-03-24 | 中国科学院合肥物质科学研究院 | Handwriting digital automatic identification method based on module neural network SN9701 rectangular array |
CN103926526A (en) * | 2014-05-05 | 2014-07-16 | 重庆大学 | Analog circuit fault diagnosis method based on improved RBF neural network |
CN104156403A (en) * | 2014-07-24 | 2014-11-19 | 中国软件与技术服务股份有限公司 | Clustering-based big data normal-mode extracting method and system |
CN106935035A (en) * | 2017-04-07 | 2017-07-07 | 西安电子科技大学 | Parking offense vehicle real-time detection method based on SSD neutral nets |
Non-Patent Citations (1)
Title |
---|
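Zhou Peiyi et al.: "Research on fault diagnosis of wind turbine gearboxes based on a genetic algorithm and BP neural network", North China Electric Power * |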
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354208A (en) * | 2015-09-21 | 2016-02-24 | 江苏讯狐信息科技有限公司 | Big data information mining method |
WO2017143932A1 (en) * | 2016-02-26 | 2017-08-31 | 中国银联股份有限公司 | Fraudulent transaction detection method based on sample clustering |
CN105844334A (en) * | 2016-03-22 | 2016-08-10 | 南京信息工程大学 | Radial basis function neural network-based temperature interpolation algorithm |
CN105844334B (en) * | 2016-03-22 | 2018-03-27 | 南京信息工程大学 | A kind of temperature interpolation method based on radial base neural net |
CN106651546A (en) * | 2017-01-03 | 2017-05-10 | 重庆邮电大学 | Intelligent community oriented electronic commerce information recommendation method |
CN106651546B (en) * | 2017-01-03 | 2021-12-07 | 重庆邮电大学 | Electronic commerce information recommendation method oriented to smart community |
CN107274066A (en) * | 2017-05-19 | 2017-10-20 | 浙江大学 | A kind of shared traffic Customer Value Analysis method based on LRFMD models |
CN107633035A (en) * | 2017-09-08 | 2018-01-26 | 浙江大学 | A kind of shared transport services reorder predictor methods based on K Means&LightGBM models |
CN107633035B (en) * | 2017-09-08 | 2020-04-14 | 浙江大学 | Shared traffic service reorder estimation method based on K-Means and LightGBM model |
US11900230B2 (en) | 2019-07-17 | 2024-02-13 | Visa International Service Association | Method, system, and computer program product for identifying subpopulations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150819 |