CN104850868A - Customer segmentation method based on k-means and neural network cluster - Google Patents

Customer segmentation method based on k-means and neural network cluster

Publication number
CN104850868A
Authority
CN
China
Prior art keywords
neural network
data
customer segmentation
segmentation method
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510323644.9A
Other languages
Chinese (zh)
Inventor
刘念
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan You Lian Information Technology Co Ltd
Original Assignee
Sichuan You Lian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan You Lian Information Technology Co Ltd
Priority to CN201510323644.9A
Publication of CN104850868A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a customer segmentation method based on k-means and neural network clustering. The method comprises the following steps: (1) drawing a random sample from the overall data, the selected data serving as sample data; (2) performing k-means clustering on the sample data selected in step (1) and computing the class of each sample; (3) using the clustering results of step (2) as training samples, computing the weights of each layer for each attribute with a neural network, and thereby obtaining a trained neural network; (4) inputting the overall data into the trained neural network and computing the class of each record. Because only a small number of samples are selected, the probability that an isolated point (outlier) is selected is very low. Moreover, the weight of each attribute is computed by a BP (back-propagation) neural network, so the attributes no longer all influence the result equally. This remedies the defects of the conventional K-means clustering algorithm and makes the clustering result better match the practical needs of customer segmentation.

Description

A customer segmentation method based on k-means and neural network clustering
Technical field
The present invention relates to the field of data mining, and in particular to a customer segmentation method based on k-means and neural network clustering.
Background art
After China's accession to the WTO, with the entry of foreign banks and the deepening of financial reform, competition in the financial sector has become fiercer, and high-quality customers have gradually become the focus of competition among banks. Different types of customers bring markedly different value to a bank. By identifying and distinguishing these differences, a bank can allocate its marketing, service and management resources more rationally and obtain a larger return from a smaller investment; solving this problem requires customer segmentation. Bank customer segmentation means that a bank, within a defined strategy, business model and target market, classifies its customers according to factors such as their attributes, behavior, needs, preferences and value, and provides targeted products, services and marketing models for each class.
Traditionally, bank customer segmentation has relied on experience-based classification and statistics-based analysis. Experience-based segmentation is the most primitive approach: decision makers generally assign customers to categories according to their own experience, so the result is highly subjective, not objective, and unconvincing. Statistics-based segmentation is a quantitative approach that divides customers into categories according to statistics of their attributes; its result depends strongly on the classification criteria, so if the criteria are unreasonable, the classification is unreasonable too. With the continuing development of information systems in Chinese banks, banks have accumulated large amounts of historical transaction data and customer data, and with the growth of networks even more customer data will accumulate; faced with such massive customer data, traditional segmentation methods are increasingly inadequate. In recent years, data mining technology has developed rapidly. It merges several disciplines, including databases, artificial intelligence and statistics, and is the process of extracting useful, credible and novel information and knowledge from large amounts of incomplete, noisy and fuzzy raw data. K-means clustering is one of the most important data mining methods and is widely used in bank customer segmentation.
The K-means algorithm is a classical partition-based clustering algorithm in data mining; it is widely used because it is theoretically sound, simple, and fast to converge. K-means works by iterative updating: it first randomly selects K objects as the initial cluster representatives (cluster centers), then assigns each remaining object to the nearest cluster according to its distance to each cluster center, and then recomputes the center of each cluster as the cluster center for the next iteration. This process is repeated until the cluster centers no longer change. Each iteration moves the chosen centers closer to the true cluster centers, so the clustering improves, and finally all objects are divided into K clusters.
The concrete steps of the traditional K-means algorithm are:
Input: the number of clusters K and a data set X = {x1, x2, x3, x4, ..., xn} containing n objects.
Output: K clusters {s1, s2, s3, ..., sk} that minimize the objective function.
Steps:
(1) randomly select K objects from the data set X as the initial cluster centers c1, c2, c3, ..., ck;
(2) assign each object xi (i = 1, 2, 3, ..., n) in turn to the nearest cluster center cj, 1 ≤ j ≤ K, according to Euclidean distance;
(3) recompute the new cluster center cj of each cluster;
(4) repeat steps (2) and (3) until the K cluster centers no longer change, i.e. the criterion function has converged.
Fig. 1 is the basic flow chart of the traditional K-means algorithm.
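The four steps of the traditional algorithm can be sketched in plain Python as follows. This is a minimal illustration, not code from the patent: the function names, the fixed random seed, the iteration cap and the six sample points are all assumptions made for the example, and the cluster center is taken to be the mean of the cluster's members, as is usual for k-means.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means; returns (centers, labels). Names are illustrative."""
    rng = random.Random(seed)
    # Step (1): pick K distinct objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Step (2): assign every object to its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Step (3): recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # Step (4): stop once no center moves any more.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical two-attribute records, already on comparable scales.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Because step (3) averages all members of a cluster, a single far-away record can pull a center toward itself, which is the sensitivity to isolated points discussed in the text.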
The K-means algorithm is a classic algorithm for clustering problems, and it is simple and fast. However, the traditional K-means algorithm has a serious defect: it is sensitive to isolated points (outliers). If the data set contains isolated points, the clustering produced by k-means is unsatisfactory. In addition, every attribute is treated equally during clustering, so the differing influence of different attributes on the clustering result cannot be distinguished.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a customer segmentation method based on k-means and neural network clustering. Only a small sample is drawn in the first step, so the probability that the sample contains an isolated point is very low and can be neglected; and a BP (back-propagation) neural network is used to compute the weight of each attribute, so that the attributes do not all influence the result equally.
To achieve the above object, the invention provides the following technical solution:
A customer segmentation method based on k-means and neural network clustering, comprising the following steps:
(1) drawing a random sample from the overall data (the full data set), the extracted part of the data serving as sample data;
(2) performing k-means clustering on the sample data extracted in step (1), and computing the class to which each sample belongs;
(3) using the clustering result of step (2) as training samples, computing the weights of each layer for each attribute with a neural network, and obtaining a trained neural network;
(4) inputting the overall data into the trained neural network, and computing the class to which each record belongs.
Preferably, the neural network is a BP neural network.
Preferably, the BP neural network is a feed-forward BP network with 3 or more layers.
Preferably, the partial data (the sample) is no more than 30% of the overall data.
Preferably, the partial data is no more than 15% of the overall data.
Preferably, the partial data is no more than 5% of the overall data.
Preferably, the number of clusters for the k-means clustering is 5.
Preferably, in step (1) the overall data is preprocessed before the random sampling.
Preferably, the data preprocessing uses centering and standardization transforms.
Compared with the prior art, the invention has the following beneficial effects:
1. In its first step the method randomly draws only a small sample from the overall data, so the probability that the sample contains an isolated point is very low and can be neglected, which improves clustering accuracy;
2. The method uses a BP neural network to compute the weight of each attribute, so that the attributes do not all influence the result equally, and the clustering better matches the practical needs of customer segmentation.
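How low the probability of drawing an isolated point is depends on the sampling fraction and on how many isolated points the data contains. The sketch below shows one way to check it under a simple model: simple random sampling without replacement, which gives a hypergeometric calculation. That model, the function name, and the figure of 3 isolated points among 1000 records are assumptions of this sketch and do not come from the patent text.

```python
from math import comb

def prob_sample_contains_outlier(total, sample_size, outliers):
    """Probability that at least one of `outliers` isolated points lands in a
    simple random sample of `sample_size` records drawn without replacement
    from `total` records."""
    # Hypergeometric: P(no outlier drawn) = C(total - outliers, n) / C(total, n).
    p_none = comb(total - outliers, sample_size) / comb(total, sample_size)
    return 1.0 - p_none

# Hypothetical setting: 1000 records, 3 of which are isolated points.
# The fractions are the 30%, 15% and 5% upper bounds named in the text.
for percent in (30, 15, 5):
    n = 1000 * percent // 100
    print(percent, prob_sample_contains_outlier(1000, n, 3))
```

With a single isolated point the probability equals the sampling fraction itself, so a smaller sample lowers the chance of including it.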
Brief description of the drawings
Fig. 1 is the basic flow chart of the traditional K-means algorithm.
Fig. 2 is the detailed flow chart of the customer segmentation method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to test examples and embodiments. This should not, however, be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the invention fall within its scope.
The customer segmentation method of the invention based on k-means and neural network clustering is implemented in the following steps:
(1) draw a random sample from the overall data (the full data set), extracting a small part of the data as the sample;
(2) perform k-means clustering on the sample data extracted in the first step, and compute the class to which each sample belongs;
(3) using the clustering result of the second step as training samples, compute the weights of each layer for each attribute with a BP neural network, and obtain a trained BP neural network;
(4) input the overall data into the BP neural network trained in the third step, and compute the class to which each record belongs.
The overall data in this specific embodiment comes from the customer segmentation data in the personal banking system of a domestic city bank. The input consists of 2000 bank customer samples. Each record contains eight attribute fields: customer number, age, length of service, monthly salary, deposit amount at this bank, number of bank visits, loan status and housing status. The output divides bank customers into 5 major classes: premium customers, large customers, ordinary customers, small customers and potential customers.
The detailed flow of the method of the invention for customer segmentation is shown in Fig. 2. The raw data is first preprocessed. Because of human error during source data collection, the database may contain incomplete and noisy data; moreover, the fields recorded in the database describe different characteristics, often in different units of measurement, and their values differ greatly in magnitude. The raw data therefore needs preprocessing to improve data quality, so that data mining is more effective and classification more accurate. The data preprocessing of the method uses centering and standardization transforms. Centering gives every field the same base point and is carried out according to the following formula:
$$x'_{ij} = x_{ij} - \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$
where $x_{ij}$ is the value of the j-th field of the i-th record, $n$ is the total number of records, $\sum_{i=1}^{n} x_{ij}$ is the sum over all records of the j-th attribute field, and $x'_{ij}$ is the value of the j-th field of the i-th record after centering.
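A short check, using only the definitions above, that centering gives every field the same base point, namely a mean of zero. The summation index inside the mean is written as $k$ here purely to keep it distinct from the record index $i$; that renaming is a notational choice of this note.

```latex
\sum_{i=1}^{n} x'_{ij}
  = \sum_{i=1}^{n}\Bigl(x_{ij} - \frac{1}{n}\sum_{k=1}^{n} x_{kj}\Bigr)
  = \sum_{i=1}^{n} x_{ij} - n\cdot\frac{1}{n}\sum_{k=1}^{n} x_{kj}
  = 0
\quad\Longrightarrow\quad
\frac{1}{n}\sum_{i=1}^{n} x'_{ij} = 0 .
```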
On the basis of centering, the data is then standardized so that all fields share a unified range of variation. Zero-mean standardization is used, which standardizes each field by its mean and standard deviation, i.e. the centered value is divided by the standard deviation of the field, according to the following formula:
$$(x'_{ij})' = \frac{\sqrt{n-1}\; x'_{ij}}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}}$$
where $\bar{x}_j$ is the mean of all records in the j-th attribute field, and $(x'_{ij})'$ is the value of the j-th field of the i-th record after zero-mean, unit-standard-deviation standardization.
After data preprocessing, every field has the same base point and the same range of variation: its mean is 0 and its standard deviation is 1.
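The centering and standardization transforms can be sketched together with the standard library. This is a minimal illustration under stated assumptions: the function name and the four two-field records are hypothetical, and the sample standard deviation (divisor n - 1) is used.

```python
from statistics import mean, stdev

def standardize_columns(records):
    """Center each field, then divide by its sample standard deviation.

    A field whose values are all identical has a standard deviation of zero
    and would need separate handling; this sketch does not cover that case.
    """
    columns = list(zip(*records))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # sample std, divisor n - 1
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in records]

# Hypothetical records with two fields on very different scales:
# (age in years, monthly salary).
records = [[25, 3000.0], [40, 8000.0], [33, 5200.0], [58, 12000.0]]
scaled = standardize_columns(records)
for row in scaled:
    print(row)
```

After the transform both fields are on a comparable scale, so neither dominates the Euclidean distance used by k-means merely because of its unit of measurement.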
After data preprocessing, customers are classified with the clustering algorithm of the invention based on k-means and a neural network. The detailed process is as follows:
(1) after data preprocessing of the 2000 customer records has produced 1000 user records, randomly sample 300 customers as the sample for the next step;
(2) perform k-means clustering on the customer sample data extracted in the first step, divide it into k clusters, and label each sample with the class to which it belongs;
(3) using the clustering result of the second step as the training samples of an artificial neural network, compute the weights of each layer for each attribute with a BP neural network, and obtain a trained BP neural network;
(4) input the data of all customers into the neural network trained in the third step, and compute the class label to which each customer belongs.
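The four steps above can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the patent's implementation: the patent does not specify the hidden-layer size, activation functions, loss, learning rate or number of epochs, so the choices here (6 sigmoid hidden units, softmax output with cross-entropy, learning rate 0.1, 300 epochs) are assumptions, as are all function names and the synthetic two-group data with k = 2.

```python
import math
import random

def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans_labels(points, k, rng, max_iter=100):
    """Step (2): cluster the sample and return one label per sample."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def forward(net, x):
    """Forward pass: sigmoid hidden layer, softmax output layer."""
    w1, b1, w2, b2 = net
    h = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
         for row, b in zip(w1, b1)]
    z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return h, [v / total for v in e]

def train_bp(samples, labels, k, rng, hidden=6, epochs=300, lr=0.1):
    """Step (3): fit a 3-layer feed-forward network by back-propagation."""
    d = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(k)]
    b2 = [0.0] * k
    net = (w1, b1, w2, b2)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h, p = forward(net, x)
            # Output error for softmax with cross-entropy: p - one_hot(y).
            dz = [p[c] - (1.0 if c == y else 0.0) for c in range(k)]
            # Propagate the error to the hidden layer before updating w2.
            dh = [sum(dz[c] * w2[c][m] for c in range(k)) * h[m] * (1.0 - h[m])
                  for m in range(hidden)]
            for c in range(k):
                for m in range(hidden):
                    w2[c][m] -= lr * dz[c] * h[m]
                b2[c] -= lr * dz[c]
            for m in range(hidden):
                for i in range(d):
                    w1[m][i] -= lr * dh[m] * x[i]
                b1[m] -= lr * dh[m]
    return net

def segment(data, k, sample_fraction, seed=0):
    """Steps (1) to (4): sample, cluster, train, classify every record."""
    rng = random.Random(seed)
    size = max(k, int(len(data) * sample_fraction))
    sample = rng.sample(data, size)            # step (1)
    labels = kmeans_labels(sample, k, rng)     # step (2)
    net = train_bp(sample, labels, k, rng)     # step (3)
    classes = []
    for x in data:                             # step (4)
        _, p = forward(net, x)
        classes.append(max(range(k), key=lambda c: p[c]))
    return classes

# Synthetic, already standardized two-attribute data with two groups.
gen = random.Random(1)
data = [[gen.gauss(m, 0.2), gen.gauss(m, 0.2)]
        for m in (-1.0, 1.0) for _ in range(50)]
classes = segment(data, k=2, sample_fraction=0.3)
print(classes)
```

Only the sample is clustered; every other record receives its class from the trained network, so the cost of classifying the full data set is one forward pass per record.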
The customer segmentation method of the invention based on k-means and neural network clustering was compared with the traditional k-means algorithm, with the number of clusters set to K = 5. The comparison is shown in the following table:
Table 1. Customer segmentation results of the traditional K-means algorithm and of the method of the invention
As the classification results in the table above show, the customer segmentation method of the invention based on k-means and neural network clustering overcomes two shortcomings of the traditional K-means clustering algorithm: its sensitivity to isolated points, and its equal treatment of every attribute during clustering. Because the first step randomly draws only a small amount of data, the probability of drawing an isolated point is very low. The results in the table also show that the clustering produced by the improved algorithm better matches the practical needs of bank customer segmentation, which offers a new way for data mining to solve the problem of how banking systems can use their existing mass data for customer segmentation.

Claims (9)

1. A customer segmentation method based on k-means and neural network clustering, characterized in that it comprises the following steps:
(1) drawing a random sample from the overall data (the full data set), the extracted part of the data serving as sample data;
(2) performing k-means clustering on the sample data extracted in step (1), and computing the class to which each sample belongs;
(3) using the clustering result of step (2) as training samples, computing the weights of each layer for each attribute with a neural network, and obtaining a trained neural network;
(4) inputting the overall data into the trained neural network, and computing the class to which each record belongs.
2. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the neural network is a BP neural network.
3. The customer segmentation method based on k-means and neural network clustering according to claim 2, characterized in that the BP neural network is a feed-forward BP network with 3 or more layers.
4. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the partial data is no more than 30% of the overall data.
5. The customer segmentation method based on k-means and neural network clustering according to claim 4, characterized in that the partial data is no more than 15% of the overall data.
6. The customer segmentation method based on k-means and neural network clustering according to claim 5, characterized in that the partial data is no more than 5% of the overall data.
7. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that the number of clusters for the k-means clustering is 5.
8. The customer segmentation method based on k-means and neural network clustering according to claim 1, characterized in that in step (1) the overall data is preprocessed before the random sampling.
9. The customer segmentation method based on k-means and neural network clustering according to claim 8, characterized in that the data preprocessing uses centering and standardization transforms.
CN201510323644.9A 2015-06-12 2015-06-12 Customer segmentation method based on k-means and neural network cluster Pending CN104850868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510323644.9A CN104850868A (en) 2015-06-12 2015-06-12 Customer segmentation method based on k-means and neural network cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510323644.9A CN104850868A (en) 2015-06-12 2015-06-12 Customer segmentation method based on k-means and neural network cluster

Publications (1)

Publication Number Publication Date
CN104850868A true CN104850868A (en) 2015-08-19

Family

ID=53850503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510323644.9A Pending CN104850868A (en) 2015-06-12 2015-06-12 Customer segmentation method based on k-means and neural network cluster

Country Status (1)

Country Link
CN (1) CN104850868A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100595780C (en) * 2007-12-13 2010-03-24 中国科学院合肥物质科学研究院 Handwriting digital automatic identification method based on module neural network SN9701 rectangular array
CN103926526A (en) * 2014-05-05 2014-07-16 重庆大学 Analog circuit fault diagnosis method based on improved RBF neural network
CN104156403A (en) * 2014-07-24 2014-11-19 中国软件与技术服务股份有限公司 Clustering-based big data normal-mode extracting method and system
CN106935035A (en) * 2017-04-07 2017-07-07 西安电子科技大学 Parking offense vehicle real-time detection method based on SSD neutral nets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
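周培毅 (Zhou Peiyi) et al.: "Research on fault diagnosis of wind turbine gearboxes based on a genetic algorithm and BP neural network", 《华北电力技术》 (North China Electric Power) *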

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354208A (en) * 2015-09-21 2016-02-24 江苏讯狐信息科技有限公司 Big data information mining method
WO2017143932A1 (en) * 2016-02-26 2017-08-31 中国银联股份有限公司 Fraudulent transaction detection method based on sample clustering
CN105844334A (en) * 2016-03-22 2016-08-10 南京信息工程大学 Radial basis function neural network-based temperature interpolation algorithm
CN105844334B (en) * 2016-03-22 2018-03-27 南京信息工程大学 A kind of temperature interpolation method based on radial base neural net
CN106651546A (en) * 2017-01-03 2017-05-10 重庆邮电大学 Intelligent community oriented electronic commerce information recommendation method
CN106651546B (en) * 2017-01-03 2021-12-07 重庆邮电大学 Electronic commerce information recommendation method oriented to smart community
CN107274066A (en) * 2017-05-19 2017-10-20 浙江大学 A kind of shared traffic Customer Value Analysis method based on LRFMD models
CN107633035A (en) * 2017-09-08 2018-01-26 浙江大学 A kind of shared transport services reorder predictor methods based on K Means&LightGBM models
CN107633035B (en) * 2017-09-08 2020-04-14 浙江大学 Shared traffic service reorder estimation method based on K-Means and LightGBM model
US11900230B2 (en) 2019-07-17 2024-02-13 Visa International Service Association Method, system, and computer program product for identifying subpopulations

Similar Documents

Publication Publication Date Title
CN109255506B (en) Internet financial user loan overdue prediction method based on big data
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
CN103632168B (en) Classifier integration method for machine learning
CN111311402A (en) XGboost-based internet financial wind control model
WO2017143919A1 (en) Method and apparatus for establishing data identification model
WO2021088499A1 (en) False invoice issuing identification method and system based on dynamic network representation
CN107194803A (en) A kind of P2P nets borrow the device of borrower's assessing credit risks
CN105373606A (en) Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN109034194A (en) Transaction swindling behavior depth detection method based on feature differentiation
CN109492026A (en) A kind of Telecoms Fraud classification and Detection method based on improved active learning techniques
Silva et al. Cross country relations in European tourist arrivals
CN111754345A (en) Bit currency address classification method based on improved random forest
Kirkos et al. Identifying qualified auditors' opinions: a data mining approach
CN105426441B (en) A kind of automatic preprocess method of time series
CN111325248A (en) Method and system for reducing pre-loan business risk
CN111695597A (en) Credit fraud group recognition method and system based on improved isolated forest algorithm
CN112700324A (en) User loan default prediction method based on combination of Catboost and restricted Boltzmann machine
CN110377605A (en) A kind of Sensitive Attributes identification of structural data and classification stage division
CN114048436A (en) Construction method and construction device for forecasting enterprise financial data model
CN110689437A (en) Communication construction project financial risk prediction method based on random forest
CN107729377A (en) Customer classification method and system based on data mining
CN112183652A (en) Edge end bias detection method under federated machine learning environment
CN106204267A (en) A kind of based on improving k means and the customer segmentation system of neural network clustering
Chu et al. Co-training based on semi-supervised ensemble classification approach for multi-label data stream
Glennon et al. Development and validation of credit scoring models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150819
