CN106096052A - A consumer clustering method for WeChat marketing - Google Patents


Info

Publication number
CN106096052A
Authority
CN
China
Prior art keywords
cluster
data
consumer
individual
group
Legal status
Pending
Application number
CN201610497893.4A
Other languages
Chinese (zh)
Inventor
高扬华
陆海良
单宇翔
郁钢
Current Assignee
China Tobacco Zhejiang Industrial Co Ltd
Original Assignee
China Tobacco Zhejiang Industrial Co Ltd
Priority date
2016-06-25
Filing date
2016-06-25
Publication date
2016-11-09
Application filed by China Tobacco Zhejiang Industrial Co Ltd
Priority to CN201610497893.4A
Publication of CN106096052A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The present invention relates to the field of social network data processing, and in particular to a consumer clustering method for WeChat marketing. The method comprises the following steps: 1) scan the full data set $D$ and draw $J$ random samples from it; 2) run the k-means algorithm on each sampled data set to obtain one group of cluster centers, so that the $J$ samplings yield $J$ groups of cluster centers in total; 3) use the sum-of-squared-errors criterion function to find the optimal group of cluster centers, and output it; 4) taking the optimal cluster centers found in step 3) as the initial cluster centers and $K'$ as the input parameter ($K' > K$), run the k-means algorithm on the full data set; 5) among the $K'$ clusters produced, merge the two clusters whose centers are closest and recompute the center of the merged cluster, repeating until the number of clusters has been reduced to $K$, at which point merging stops and the algorithm terminates. The method improves the speed and stability of the consumer data clustering process.

Description

A consumer clustering method for WeChat marketing
Technical field
The present invention relates to the field of social network data processing, and more particularly to a consumer clustering method for WeChat marketing.
Background art
The k-means algorithm is one of the most commonly used data clustering algorithms. The number of classes $K$ is fixed in advance as an input parameter, the data set is divided into $K$ clusters, and each data object is assigned to a cluster according to its Euclidean distance from the cluster centers. Data objects in the same cluster are highly similar to one another, while data objects in different clusters have low similarity. The concrete steps of the k-means algorithm are as follows. First, according to the input parameter $K$, $K$ data objects are selected at random from the data set as the initial cluster centers. The Euclidean distance $d$ from every remaining data object to each cluster center is computed, the distances are compared, and each data object is assigned to the cluster with the minimum $d$. The center of each cluster is then recomputed, the distance from every data object to each new center is computed again, and each object is reassigned to the cluster with the minimum $d$; this process is repeated. When the cluster centers no longer change, or change only slightly, the iteration terminates and the final $K$ clusters are output, completing the clustering of the data set.
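The traditional procedure just described can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Visual C++ implementation; the function names `kmeans` and `euclidean`, the convergence tolerance `tol` and the iteration cap `max_iter` are assumptions introduced here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance d between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, init_centers=None, max_iter=100, tol=1e-9, rng=None):
    """Traditional k-means on a list of tuples; returns (centers, labels).

    If init_centers is None, k data objects are chosen at random as the
    initial cluster centers, as in the traditional algorithm.
    """
    rng = rng or random.Random()
    centers = [tuple(c) for c in (init_centers or rng.sample(points, k))]
    for _ in range(max_iter):
        # Assignment: each object joins the cluster whose center gives the minimum d.
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        # Update: each center becomes the mean of its members (empty clusters keep theirs).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers no longer change, or change only slightly
            break
    # Final assignment so that the labels match the returned centers.
    labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
    centers, labels = kmeans(pts, 2, rng=random.Random(0))
    print(centers, labels)
```

The random choice in `rng.sample(points, k)` is exactly the step the patent identifies as the weak point of the traditional algorithm.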
Fig. 1 shows the flow chart of the traditional k-means clustering algorithm.
Shortcomings of the traditional k-means clustering algorithm:
The algorithm is extremely sensitive to the initial cluster centers. Because k-means chooses its initial cluster centers at random, a poor choice can easily trap the algorithm in a local optimum rather than the global optimum. In particular, when the data set is unevenly distributed, edge points and extreme points may be chosen as initial points, leading to slow iterative convergence, poorly separated clusters and similar problems.
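The sensitivity to the initial centers can be seen on a tiny one-dimensional example. The six points and both sets of starting centers below are hypothetical values chosen for illustration; the same k-means loop converges either to a poor local optimum or to the obvious three-cluster solution, depending only on where it starts.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd-style k-means on 1-D data from fixed starting centers.

    Returns (centers, groups), where groups[c] holds the points assigned to centers[c].
    """
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:  # assignments stable, centers unchanged
            break
        centers = new
    return centers, groups


def sse(centers, groups):
    """Sum of squared errors of a 1-D clustering."""
    return sum((p - c) ** 2 for c, g in zip(centers, groups) for p in g)


points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
bad = kmeans_1d(points, [0.0, 1.0, 10.0])    # two starting centers in the same group
good = kmeans_1d(points, [0.0, 10.0, 20.0])  # one starting center per group
print(bad[0], sse(*bad))    # [0.0, 1.0, 15.5] 101.0
print(good[0], sse(*good))  # [0.5, 10.5, 20.5] 1.5
```

The "bad" run is a fixed point of the algorithm: no reassignment improves it, yet its error is far larger than that of the "good" run.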
Summary of the invention
In the prior art, consumer data are processed slowly, and the clustering process easily falls into local optima, which can cause it to fail. To solve these problems, the present invention proposes a consumer clustering method for WeChat marketing that improves the speed and stability of the consumer data clustering process.
To solve the above technical problems, the present invention adopts the following technical solution:
A consumer clustering method for WeChat marketing, in which the information processed is consumer information collected from WeChat, including: personal information voluntarily submitted by consumers, records of consumers' operations after they follow the WeChat official account, purchase behavior information, and feedback and suggestion information. The method comprises the following steps:
1) Scan the full data set $D$ and draw $J$ random samples from it;
2) Run the k-means algorithm on each sampled data set to obtain one group of cluster centers; the $J$ samplings yield $J$ groups of cluster centers in total;
3) Use the sum-of-squared-errors criterion function to find the optimal group of cluster centers, and output it;
4) Taking the optimal cluster centers found in step 3) as the initial cluster centers and $K'$ as the input parameter ($K' > K$), run the k-means algorithm on the full data set;
5) Among the $K'$ clusters produced, merge the two clusters whose centers are closest and recompute the center of the merged cluster; repeat until the number of clusters has been reduced to $K$, then stop merging. The algorithm terminates.
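Steps 1) to 5) can be strung together as in the following plain-Python sketch. It is an illustration under stated assumptions, not the patent's reference implementation: the names `improved_kmeans`, `k_init` (standing for $K'$), `n_samples` (standing for $J$) and `sample_size` are hypothetical; each sample is drawn independently with `random.Random.sample`; and the merged center in step 5) is assumed to be the mean of all member points, since the text only says it is "recomputed".

```python
import random


def dist2(a, b):
    """Squared Euclidean distance (same ordering as Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Lloyd-style k-means from the given centers; returns (centers, clusters)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        new = [mean(cl) if cl else c for cl, c in zip(clusters, centers)]
        if new == centers:
            break
        centers = new
    return centers, clusters


def sse(centers, clusters):
    """Total sum of squared errors over all clusters (criterion of step 3)."""
    return sum(dist2(p, c) for c, cl in zip(centers, clusters) for p in cl)


def improved_kmeans(data, k, k_init, n_samples, sample_size, rng):
    """Sketch of steps 1)-5); k_init plays the role of K' (> k)."""
    best_score, best_centers = None, None
    for _ in range(n_samples):
        sample = rng.sample(data, sample_size)                           # step 1
        centers, clusters = kmeans(sample, rng.sample(sample, k_init))   # step 2
        score = sse(centers, clusters)
        if best_score is None or score < best_score:                     # step 3
            best_score, best_centers = score, centers
    _, clusters = kmeans(data, best_centers)                             # step 4
    groups = [list(cl) for cl in clusters if cl]                         # drop empty clusters
    while len(groups) > k:                                               # step 5
        cents = [mean(g) for g in groups]
        i, j = min(((a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))),
                   key=lambda ab: dist2(cents[ab[0]], cents[ab[1]]))
        groups[i].extend(groups.pop(j))                                  # j > i keeps i valid
    return [mean(g) for g in groups], groups
```

Squared distances are used for the nearest-center and closest-pair comparisons; they select the same winner as the Euclidean distance while avoiding square roots.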
Because the above technical solution is adopted, the present invention has the following advantages over the prior art:
(1) It improves the speed and stability of the consumer data clustering process;
(2) Because the clustering process works on multiple data subsets (more than 3), it is well suited to the distributed computing frameworks that are currently popular.
Brief description of the drawings
Fig. 1 is a flow chart of the traditional k-means algorithm.
Fig. 2 is a flow chart of the k-means algorithm as improved by the present invention.
Fig. 3 shows the sample simulation data set $D$.
Fig. 4 shows the results of executing the two algorithms.
Embodiment
The present invention is further described below with reference to the accompanying drawings and an embodiment:
The present invention designs an improved k-means algorithm suited to big-data environments; its flow chart is shown in Fig. 2. Customer data for a given period (one month) are clustered using the profit contribution value as the attribute. The customer data set contains $n$ data objects, each with $m$ attribute values, where $x_{il}$ denotes the $l$-th attribute of the $i$-th customer in the customer data set.
The purpose of clustering the customer data set $D$ collected over a given period is to obtain $K$ customer clusters according to customer value contribution.
The basic steps of the algorithm are as follows:
Sample the data set $D$ $J$ times, drawing the same number of customer objects each time, to form the vector of sample sets $\mathbf{S}=\{S_1,S_2,\dots,S_J\}$.
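The sampling step can be sketched as follows. The helper `draw_samples` and its parameters are hypothetical. The text only requires that each sample contain the same number of customer objects; the embodiment's figures (5000 objects, 20 samples of 250 elements) are consistent with cutting a shuffled copy of the data into disjoint equal blocks, so that option is shown alongside independent sampling.

```python
import random


def draw_samples(data, n_samples, sample_size=None, disjoint=True, seed=None):
    """Return n_samples equal-size random subsets of data.

    disjoint=True shuffles once and cuts the data into n_samples equal blocks
    (e.g. 5000 objects / 20 samples -> 250 objects each);
    disjoint=False draws every subset independently with random.sample.
    """
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = len(data) // n_samples
    if disjoint:
        if sample_size * n_samples > len(data):
            raise ValueError("not enough objects for disjoint samples")
        shuffled = list(data)
        rng.shuffle(shuffled)
        return [shuffled[i * sample_size:(i + 1) * sample_size] for i in range(n_samples)]
    return [rng.sample(data, sample_size) for _ in range(n_samples)]


samples = draw_samples(list(range(5000)), 20, seed=0)
print(len(samples), len(samples[0]))  # 20 250
```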
Set the number of clusters $K'$ with $K' > K$, and run the k-means algorithm on each sample set $S_j$ in $\mathbf{S}$, obtaining $J$ groups of $K'$ cluster centers $C_j=\{c_{j1},c_{j2},\dots,c_{jK'}\}$, $j=1,\dots,J$.
Let $n_{jk}$ be the number of customers in the $k$-th cluster of sample set $S_j$ ($k=1,\dots,K'$). The sum of squared errors of each cluster in $S_j$ is computed as follows:
$$E_{jk}=\sum_{i=1}^{n_{jk}}\sum_{l=1}^{m}\left(x_{il}-c_{jkl}\right)^{2} \qquad (1)$$
where $x_{il}$ is the $l$-th attribute value of the $i$-th customer in the cluster and $c_{jkl}$ is the value of attribute $l$ of that cluster's center $c_{jk}$.
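Eq. (1) is a double sum over the members of one cluster and their $m$ attributes. A direct transcription, with the hypothetical name `cluster_sse`, and a call small enough to check by hand:

```python
def cluster_sse(members, center):
    """E_jk from Eq. (1): sum over members i and attributes l of (x_il - c_jkl)^2."""
    return sum((x - c) ** 2 for row in members for x, c in zip(row, center))


# Two 2-D customers around the center (2, 3): 1 + 1 + 1 + 1
print(cluster_sse([(1, 2), (3, 4)], (2, 3)))  # 4
```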
Using the sum-of-squared-errors criterion function, the sum of squared errors of sample set $S_j$ is computed as follows:
$$E_j=\sum_{k=1}^{K'}E_{jk} \qquad (2)$$
The group of cluster centers $C^{*}$ corresponding to the minimum $E_j$ is selected and output as the initial cluster centers.
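Eq. (2) and the selection of $C^{*}$ can be sketched as below. The function names and the two clustering results are hypothetical, chosen so that $E_j$ can be verified by hand; each result is a `(centers, clusters)` pair for one sample set.

```python
def cluster_sse(members, center):
    """E_jk from Eq. (1)."""
    return sum((x - c) ** 2 for row in members for x, c in zip(row, center))


def sample_set_sse(centers, clusters):
    """E_j from Eq. (2): sum of E_jk over the K' clusters of one sample set."""
    return sum(cluster_sse(members, center) for center, members in zip(centers, clusters))


def pick_initial_centers(results):
    """results: one (centers, clusters) pair per sample set S_j.

    Returns the center group C* with the minimum E_j.
    """
    best_centers, _ = min(results, key=lambda r: sample_set_sse(*r))
    return best_centers


# Hypothetical clustering results for two sample sets (K' = 2, m = 2).
r1 = ([(0, 0), (10, 10)], [[(0, 1), (0, -1)], [(10, 10)]])  # E_1 = 1 + 1 + 0 = 2
r2 = ([(0, 1), (9, 9)], [[(0, 3)], [(10, 12)]])             # E_2 = 4 + 10 = 14
print(pick_initial_centers([r1, r2]))  # [(0, 0), (10, 10)]
```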
WithAs initial cluster center,To cluster number, to data acquisition systemUniverse perform k-means calculate Method, is obtainedIndividual cluster
The middle distance for calculating each two cluster respectively(Euclidean distance between cluster centre) , its calculation formula is as follows:
(3)
Wherein,Refer toThe cluster centre attribute of individual clusterValue.ChooseTwo minimum Cluster mergings, lay equal stress on The new cluster centre calculated after merging, until cluster setMiddle clusters number is reduced toWhen, stop closing And, outputIndividual cluster set.Whole modified k-means algorithms terminate, and obtain according to customer value contribution degree index, logarithm According to setCluster.
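The merging stage, using Eq. (3) as the cluster distance, can be sketched as follows. The names `center_distance` and `merge_until` are hypothetical, and the merged center is assumed to be the mean of all member points (the text only says it is recomputed).

```python
import math


def center_distance(a, b):
    """d from Eq. (3): Euclidean distance between two cluster centers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_until(clusters, k):
    """Repeatedly merge the two clusters whose centers are closest until k remain.

    clusters: list of non-empty lists of points (tuples).
    Returns (centers, clusters).
    """
    clusters = [list(c) for c in clusters]

    def center(members):
        return tuple(sum(col) / len(members) for col in zip(*members))

    while len(clusters) > k:
        centers = [center(c) for c in clusters]
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: center_distance(centers[pq[0]], centers[pq[1]]))
        clusters[p].extend(clusters.pop(q))  # q > p, so index p is unaffected
    return [center(c) for c in clusters], clusters


centers, _ = merge_until([[(0, 0)], [(1, 0)], [(10, 0)], [(12, 0)]], 2)
print(centers)  # [(0.5, 0.0), (11.0, 0.0)]
```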
The present invention is evaluated in a computer simulation that compares the clustering results of the traditional k-means algorithm with those of the improved k-means algorithm described here. The simulation was implemented in Visual C++ on a computer with an Intel i5 2.5 GHz CPU and 4 GB of memory. The data sample parameters are set as shown in Table 1:
Table 1
Data sample size | Data attribute dimensions | Initial number of clusters | Final number of clusters | Number of samplings
5000 | 2 | 12 | 4 | 20
The simulation uses the two-dimensional sample simulation data set shown in Fig. 3. The computed mean vectors of the four data subsets in Fig. 4 are (0.6509, 0.9582), (3.4821, 1.1241), (3.9587, 3.0213) and (1.7424, 4.2508). First, the traditional k-means algorithm was run 30 times on the raw data set $D$, with the order of the input data shuffled before each run. Likewise, the improved k-means algorithm was run 30 times on $D$, also with the order in which the data are read shuffled. The purpose of this arrangement is to test the stability of the algorithms.
One representative clustering result was selected from the runs of each algorithm, shown in Fig. 4(a) and (b), where the red dots mark the positions of the cluster centers. With the traditional k-means algorithm, results similar to Fig. 4(a) appeared 23 times out of the 30 runs. Fig. 4(a) shows a typical case of falling into a local minimum. The improved k-means algorithm designed here, by contrast, consistently produces clustering results similar to Fig. 4(b).
Next, the averages of the 30 groups of cluster centers from the clustering results are compared with the computed cluster centers to assess the quality of the two algorithms, as shown in Table 2:
Table 2 Comparison of cluster center values
Cluster no. | Mean cluster center, traditional k-means | Mean cluster center, improved k-means | Computed cluster center
1 | {0.5964, 1.1820} | {0.6256, 0.9021} | {0.6509, 0.9582}
2 | {3.0425, 0.8852} | {3.3584, 1.3024} | {3.4821, 1.1241}
3 | {3.3218, 3.3204} | {4.1021, 3.1541} | {3.9587, 3.0213}
4 | {1.6822, 4.3652} | {1.7455, 4.2057} | {1.7424, 4.2508}
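The claim that the improved algorithm's mean centers lie closer to the computed centers can be checked directly from the values in Table 2. The snippet below only transcribes the table and measures the Euclidean distance of each mean center from the computed center; the name `distances_to_reference` is hypothetical.

```python
import math

# Cluster centers transcribed from Table 2:
# (traditional k-means mean, improved k-means mean, computed center)
TABLE_2 = {
    1: ((0.5964, 1.1820), (0.6256, 0.9021), (0.6509, 0.9582)),
    2: ((3.0425, 0.8852), (3.3584, 1.3024), (3.4821, 1.1241)),
    3: ((3.3218, 3.3204), (4.1021, 3.1541), (3.9587, 3.0213)),
    4: ((1.6822, 4.3652), (1.7455, 4.2057), (1.7424, 4.2508)),
}


def distances_to_reference(table):
    """Euclidean distance of each algorithm's mean center from the computed center."""
    return {
        cluster: (math.dist(trad, ref), math.dist(improved, ref))
        for cluster, (trad, improved, ref) in table.items()
    }


for cluster, (d_trad, d_improved) in distances_to_reference(TABLE_2).items():
    print(f"cluster {cluster}: traditional {d_trad:.4f}, improved {d_improved:.4f}")
```

For all four clusters the improved algorithm's mean center is the closer one; the largest gap is for cluster 3, roughly 0.20 versus 0.70.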
The comparison of mean cluster centers in Table 2 shows clearly that the cluster centers obtained by the improved k-means algorithm are closer to the computed cluster centers. Fig. 4(a) and (b) support a further analysis. Because the raw data are unevenly distributed and the clusters differ in shape and size, the traditional k-means algorithm, which chooses its initial cluster centers at random, can easily pick edge data points as initial centers. When this happens, the cluster centers it finally produces are very likely to be trapped in a locally optimal solution; Fig. 4(a) is a typical example of this phenomenon. The improved k-means algorithm first draws 20 equal-size samples from the raw data (each containing 250 elements) and uses the sum-of-squared-errors criterion function to select, from these 20 samples, the cluster centers that best reflect the shape and density characteristics of the full data set. These are substituted into the k-means algorithm as the initial cluster centers together with the preset initial number of clusters, and the resulting clusters are finally merged. This makes the clustering result independent of the order in which the data are read, and also prevents clusters from being isolated into a local optimum.
The consumer clustering method for WeChat marketing proposed by the present invention is more stable and more accurate, and is particularly suitable for processing big data sources that are unevenly distributed and large in volume.
The foregoing is only a specific embodiment of the present invention, and the technical features of the present invention are not limited thereto. Any change or modification made by a person skilled in the art within the field of the present invention falls within the scope of the claims of the present invention.

Claims (2)

1. A consumer clustering method for WeChat marketing, characterized in that the information processed by the method is consumer information collected from WeChat, including: personal information voluntarily submitted by consumers, records of consumers' operations after they follow the WeChat official account, purchase behavior information, and feedback and suggestion information, and the method comprises the following steps:
1) scanning the full data set $D$ and drawing $J$ random samples from it;
2) running the k-means algorithm on each sampled data set to obtain one group of cluster centers, the $J$ samplings yielding $J$ groups of cluster centers in total;
3) using the sum-of-squared-errors criterion function to find the optimal group of cluster centers, and outputting it;
4) taking the optimal cluster centers found in step 3) as the initial cluster centers and $K'$ as the input parameter ($K' > K$), running the k-means algorithm on the full data set;
5) among the $K'$ clusters produced, merging the two clusters whose centers are closest and recomputing the center of the merged cluster, until the number of clusters has been reduced to $K$, whereupon merging stops and the algorithm terminates.
2. A consumer clustering method for WeChat marketing, characterized in that the information processed by the method is consumer information collected from WeChat, including: personal information voluntarily submitted by consumers, records of consumers' operations after they follow the WeChat official account, purchase behavior information, and feedback and suggestion information; the customer data set contains $n$ data objects, each with $m$ attribute values, and $x_{il}$ denotes the $l$-th attribute of the $i$-th customer in the customer data set; the method comprises the following steps:
sampling the data set $D$ $J$ times, drawing the same number of customer objects each time, to form the vector of sample sets $\mathbf{S}=\{S_1,S_2,\dots,S_J\}$;
setting the number of clusters $K'$ with $K' > K$, and running the k-means algorithm on each sample set $S_j$ in $\mathbf{S}$ to obtain $J$ groups of $K'$ cluster centers $C_j=\{c_{j1},c_{j2},\dots,c_{jK'}\}$;
according to the number of customers $n_{jk}$ in each cluster of sample set $S_j$ ($k=1,\dots,K'$), computing the sum of squared errors of each cluster in $S_j$ as follows:
$$E_{jk}=\sum_{i=1}^{n_{jk}}\sum_{l=1}^{m}\left(x_{il}-c_{jkl}\right)^{2}$$
where $x_{il}$ is the $l$-th attribute value of the $i$-th customer in the cluster and $c_{jkl}$ is the value of attribute $l$ of that cluster's center $c_{jk}$;
using the sum-of-squared-errors criterion function, computing the sum of squared errors of sample set $S_j$ as follows:
$$E_j=\sum_{k=1}^{K'}E_{jk}$$
selecting the group of cluster centers $C^{*}$ corresponding to the minimum $E_j$ and outputting it as the initial cluster centers;
with $C^{*}$ as the initial cluster centers and $K'$ as the number of clusters, running the k-means algorithm on the full data set $D$ to obtain $K'$ clusters $G=\{G_1,G_2,\dots,G_{K'}\}$;
computing the distance between every two clusters in $G$ (the Euclidean distance between their cluster centers) as follows:
$$d(G_p,G_q)=\sqrt{\sum_{l=1}^{m}\left(c_{pl}-c_{ql}\right)^{2}}$$
where $c_{pl}$ is the value of attribute $l$ of the center of the $p$-th cluster; merging the two clusters with the minimum $d$ and recomputing the center of the merged cluster, until the number of clusters in $G$ has been reduced to $K$, whereupon merging stops and the $K$ clusters are output; the improved k-means algorithm then terminates, having clustered the data set $D$ according to the customer value contribution index.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610497893.4A CN106096052A (en) 2016-06-25 2016-06-25 A consumer clustering method for WeChat marketing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610497893.4A CN106096052A (en) 2016-06-25 2016-06-25 A consumer clustering method for WeChat marketing

Publications (1)

Publication Number Publication Date
CN106096052A true CN106096052A (en) 2016-11-09

Family

ID=57215243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610497893.4A Pending CN106096052A (en) 2016-06-25 2016-06-25 A consumer clustering method for WeChat marketing

Country Status (1)

Country Link
CN (1) CN106096052A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038709A (en) * 2017-11-03 2018-05-15 平安科技(深圳)有限公司 Client's sampling pilot marketing method, electronic device and computer-readable recording medium
CN108573266A (en) * 2017-03-10 2018-09-25 中国移动通信集团河北有限公司 The method and apparatus for extracting common trait
WO2020113363A1 (en) * 2018-12-03 2020-06-11 Siemens Mobility GmbH Method and apparatus for classifying data
CN111488941A (en) * 2020-04-15 2020-08-04 烽火通信科技股份有限公司 Video user grouping method and device based on improved Kmeans algorithm
CN111527486A (en) * 2017-12-28 2020-08-11 东京毅力科创株式会社 Data processing device, data processing method, and program
CN113052505A (en) * 2021-04-30 2021-06-29 中国银行股份有限公司 Cross-border travel recommendation method, device and equipment based on artificial intelligence

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573266A (en) * 2017-03-10 2018-09-25 中国移动通信集团河北有限公司 The method and apparatus for extracting common trait
CN108038709A (en) * 2017-11-03 2018-05-15 平安科技(深圳)有限公司 Client's sampling pilot marketing method, electronic device and computer-readable recording medium
CN111527486A (en) * 2017-12-28 2020-08-11 东京毅力科创株式会社 Data processing device, data processing method, and program
WO2020113363A1 (en) * 2018-12-03 2020-06-11 Siemens Mobility GmbH Method and apparatus for classifying data
CN111488941A (en) * 2020-04-15 2020-08-04 烽火通信科技股份有限公司 Video user grouping method and device based on improved Kmeans algorithm
CN111488941B (en) * 2020-04-15 2022-05-13 烽火通信科技股份有限公司 Video user grouping method and device based on improved Kmeans algorithm
CN113052505A (en) * 2021-04-30 2021-06-29 中国银行股份有限公司 Cross-border travel recommendation method, device and equipment based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN106096052A (en) A consumer clustering method for WeChat marketing
Yin et al. Incomplete multi-view clustering via subspace learning
Yu et al. Self-paced learning for k-means clustering algorithm
CN108280491A (en) A kind of k means clustering methods towards difference secret protection
CN109657712A (en) A kind of electric business food and drink data analysing method based on the improved K-Means algorithm of Spark
CN111967520A (en) Improved SMOTE algorithm-based unbalanced data processing method
Dai et al. Multi-granularity relabeled under-sampling algorithm for imbalanced data
CN108549904A (en) Difference secret protection K-means clustering methods based on silhouette coefficient
CN107480685A (en) A kind of distributed power iteration clustering method and device based on GraphX
Thomas et al. Detecting symmetry in scalar fields using augmented extremum graphs
Niu et al. Stochastic rank aggregation
CN106127244A (en) A kind of parallelization K means improved method and system
US20200342204A1 (en) Instantaneous search and comparison method for large-scale distributed palm vein micro-feature data
Saxena et al. Re-GAN: Data-efficient GANs training via architectural reconfiguration
Shao et al. Labeling malicious communication samples based on semi-supervised deep neural network
Zhang et al. Scalegcn: Efficient and effective graph convolution via channel-wise scale transformation
Yu et al. DBWGIE-MR: A density-based clustering algorithm by using the weighted grid and information entropy based on MapReduce
Sun et al. Clustering with feature order preferences
CN109492770A (en) A kind of net with attributes embedding grammar based on the sequence of personalized relationship
Zhang et al. Projective label propagation by label embedding
CN108717551A (en) A kind of fuzzy hierarchy clustering method based on maximum membership degree
Zhang et al. Self-Adaptive-Means Based on a Covering Algorithm
Xia et al. On the substructure countability of graph neural networks
Naitzat et al. M-Boost: Profiling and refining deep neural networks with topological data analysis
CN113360732A (en) Big data multi-view graph clustering method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161109