CN106096052A

CN106096052A - A consumer clustering method for WeChat marketing

Info

Publication number: CN106096052A
Application number: CN201610497893.4A
Authority: CN
Inventors: 高扬华; 陆海良; 单宇翔; 郁钢
Original assignee: China Tobacco Zhejiang Industrial Co Ltd
Current assignee: China Tobacco Zhejiang Industrial Co Ltd
Priority date: 2016-06-25
Filing date: 2016-06-25
Publication date: 2016-11-09

Abstract

The present invention relates to the field of social network data processing, and in particular to a consumer clustering method for WeChat marketing. The method comprises the following steps: 1) scan the full data set $D$ and randomly sample it $J$ times; 2) run the k-means algorithm on the sample data set obtained from each random sampling to obtain one group of cluster centers, so that $J$ samplings yield $J$ groups of cluster centers in total; 3) use the sum-of-squared-errors criterion function to find the best group of cluster centers and output it; 4) with the optimal cluster centers found in step 3) as the initial cluster centers and $K'$ as the input parameter ($K' > K$), run the k-means algorithm on the full data set; 5) among the $K'$ clusters produced, merge the two clusters that are closest to each other and recompute the cluster center of the merged cluster; stop merging once the number of clusters has been reduced to $K$, at which point the whole algorithm terminates. The method improves the speed and stability of consumer data clustering.

Description

A consumer clustering method for WeChat marketing

Technical field

The present invention relates to the field of social network data processing, and in particular to a consumer clustering method for WeChat marketing.

Background art

The k-means algorithm is one of the most commonly used data clustering algorithms. It takes the number of classes $K$, set in advance, as its input parameter and partitions the data set into $K$ clusters, deciding which cluster each data object belongs to from the Euclidean distance between that object and each cluster center. Data objects in the same cluster are highly similar to one another, while data objects in different clusters have low similarity. The specific steps of k-means are as follows. First, according to the input parameter $K$, $K$ data objects are selected at random from the data set to serve as the cluster centers. The Euclidean distance from each remaining data object to every cluster center is computed, the distances are compared, and the object is assigned to the cluster with the smallest distance. Next, the center of each cluster is recomputed, the Euclidean distance from every data object to every cluster center is computed again, and each object is reassigned to the cluster with the smallest distance. This process is repeated until the cluster centers no longer change, or change very little. The iteration then ends and the final $K$ clusters are output, completing the clustering of the data set.
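The steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the patent: the function name `kmeans`, its parameters (`init_centers`, `max_iter`, `tol`, `seed`), and the toy data are assumptions introduced here.

```python
import math
import random


def kmeans(points, k, init_centers=None, max_iter=100, tol=1e-9, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    # Traditional k-means picks the initial centers at random unless given.
    centers = list(init_centers) if init_centers else rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(ms) for col in zip(*ms)) if ms else centers[c]
            for c, ms in enumerate(clusters)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers no longer change, or change very little
            break
    return centers, clusters


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centers, clusters = kmeans(points, k=2)
print(centers)
```

Passing `init_centers` explicitly bypasses the random selection, which is the hook the improved method relies on when it supplies pre-computed starting centers.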

The flow of the traditional k-means clustering algorithm is shown in Fig. 1.

The shortcoming of the traditional k-means clustering algorithm:

It is extremely sensitive to the initial cluster centers. Because k-means selects its initial cluster centers at random, a poor choice easily traps the algorithm in a locally optimal solution rather than the globally optimal one. In particular, when the data set is unevenly distributed, edge points and extreme points may be chosen as initial centers, which leads to slow convergence, poorly separated clusters, and similar problems.
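The sensitivity to initial centers is easy to reproduce on a tiny example. The sketch below is an illustration under assumed data (four corner points of a wide rectangle) and assumed helper names (`lloyd`, `sse`); it runs the same k-means iterations from two different starting positions and compares the resulting sums of squared errors.

```python
import math


def lloyd(points, centers, iters=50):
    """Plain k-means iterations starting from the given initial centers."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        centers = [
            tuple(sum(col) / len(ms) for col in zip(*ms)) if ms else centers[c]
            for c, ms in enumerate(clusters)
        ]
    return centers, clusters


def sse(centers, clusters):
    """Sum of squared distances from every point to its cluster center."""
    return sum(math.dist(p, c) ** 2
               for c, ms in zip(centers, clusters) for p in ms)


# Corners of a wide rectangle: the natural split is left pair / right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])  # one start on each side
bad = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])   # poorly placed start

print("good start SSE:", sse(*good))
print("bad start SSE:", sse(*bad))
```

With the well-placed start the iterations settle on the left/right split with an SSE of 1, while the poorly placed start stays on the top/bottom split with an SSE of 16: a stable but clearly worse local optimum.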

Summary of the invention

In the prior art, consumer data is processed slowly, and the clustering process easily falls into a local optimum and therefore fails. To solve these problems, the present invention proposes a consumer clustering method for WeChat marketing that improves the speed and stability of consumer data clustering.

To solve the above technical problems, the present invention is realized through the following technical solution:

A consumer clustering method for WeChat marketing. The information processed by the method is consumer information collected from WeChat, including: personal information that consumers fill in voluntarily, records of consumers' operations after they follow the WeChat official account, purchasing behavior information, and feedback and suggestions. The method comprises the following steps:

1) Scan the full data set $D$ and randomly sample it $J$ times;

2) Run the k-means algorithm on the sample data set obtained from each random sampling to obtain one group of cluster centers; $J$ samplings yield $J$ groups of cluster centers in total;

3) Use the sum-of-squared-errors criterion function to find the best group of cluster centers and output it;

4) With the optimal cluster centers found in step 3) as the initial cluster centers and $K'$ as the input parameter ($K' > K$), run the k-means algorithm on the full data set;

5) Among the $K'$ clusters produced, merge the two clusters that are closest to each other and recompute the cluster center after merging; stop merging when the number of clusters has been reduced to $K$. The whole algorithm then terminates.

Owing to the above technical solution, the present invention has the following advantages over the prior art:

(1) It improves the speed and stability of consumer data clustering;

(2) Because the clustering process works on multiple data subsets (more than 3), it can make use of the distributed computing that is popular today.

Brief description of the drawings

Fig. 1 is a flow chart of the traditional k-means algorithm.

Fig. 2 is a flow chart of the improved k-means algorithm of the present invention.

Fig. 3 shows the sample simulation data set D.

Fig. 4 shows the results of running the two algorithms.

Embodiment

The present invention is further described below with reference to the drawings and an embodiment:

The present invention designs an improved k-means algorithm suited to big-data environments; its flow chart is shown in Fig. 2. Customer data from a given period (one month) is clustered with profit contribution value as the attribute. Each class of customers contains $n$ data objects in total, each data object has $m$ attribute values, and $x_{ij}$ denotes the $j$-th attribute of the $i$-th customer in the customer data set.

The customer data set $D$ collected over a given period is clustered in order to obtain $K$ customer cluster sets according to customer value contribution.

The basic steps of the algorithm are as follows:

Sample the data set $D$ $J$ times, drawing the same number of customer objects each time, to form the vector of sample sets $S = (S_1, S_2, \dots, S_J)$;

Set the number of clusters to $K'$, with $K' > K$. Run the k-means algorithm on each sample set in $S$ to obtain $J$ groups of $K'$ cluster centers;
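The sampling step can be sketched as follows. The text does not say whether the samples may overlap; since the experiment later draws 20 samples of 250 elements from 5000 records, this sketch assumes the simplest reading, a shuffled data set split into equal-sized disjoint parts. The function name `draw_samples` and the synthetic data are hypothetical.

```python
import random


def draw_samples(data, num_samples, seed=0):
    """Split a shuffled copy of `data` into `num_samples` equal-sized samples."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    size = len(shuffled) // num_samples  # same number of objects per sample
    return [shuffled[t * size:(t + 1) * size] for t in range(num_samples)]


# Hypothetical stand-in for the customer data: 5000 two-attribute records.
gen = random.Random(1)
data = [(gen.random(), gen.random()) for _ in range(5000)]

samples = draw_samples(data, num_samples=20)
print(len(samples), len(samples[0]))  # 20 samples of 250 objects each
```

Because the samples are independent of one another, the per-sample k-means runs in the next step can be executed in parallel, which matches the distributed-computing advantage claimed for the method.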

For each sample set $S_t$ ($t = 1, \dots, J$), let $n_p$ be the number of customers in its $p$-th cluster, $p = 1, \dots, K'$. The sum of squared errors of each cluster in $S_t$ is computed as follows:

$$E_p = \sum_{i=1}^{n_p} \sum_{j=1}^{m} \left(x_{ij} - c_{pj}\right)^2 \tag{1}$$

where $x_{ij}$ is the value of the $j$-th attribute of the $i$-th customer in the cluster, and $c_{pj}$ is the value of attribute $j$ of that cluster's center.

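Using the sum-of-squared-errors criterion function, the sum of squared errors of sample set $S_t$ is computed as follows:

$$E(S_t) = \sum_{p=1}^{K'} E_p \tag{2}$$

where $E_p$ is the sum of squared errors of the $p$-th cluster of $S_t$. The group of cluster centers corresponding to the minimum of $E(S_t)$ over $t = 1, \dots, J$ is selected and output as the initial cluster centers.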
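The selection of the best group of centers can be sketched as below. The helper names (`cluster_sse`, `sample_sse`, `best_centers`) and the two hand-made clustering results are assumptions for illustration; in practice each `(centers, clusters)` pair would come from running k-means on one sample set.

```python
def cluster_sse(members, center):
    """Formula (1): squared deviations of a cluster's members from its center."""
    return sum((x - c) ** 2 for point in members for x, c in zip(point, center))


def sample_sse(clusters, centers):
    """Formula (2): total of formula (1) over all clusters of one sample set."""
    return sum(cluster_sse(ms, c) for ms, c in zip(clusters, centers))


def best_centers(results):
    """Return the centers of the sample whose clustering has the smallest SSE.

    `results` holds one (centers, clusters) pair per sample set.
    """
    return min(results, key=lambda r: sample_sse(r[1], r[0]))[0]


# Two hypothetical per-sample results, each with two clusters.
results = [
    ([(0.0, 0.0), (4.0, 4.0)], [[(0.0, 1.0), (0.0, -1.0)], [(4.0, 4.0)]]),
    ([(0.5, 0.0), (4.0, 4.0)], [[(0.0, 0.0), (1.0, 0.0)], [(4.0, 4.0)]]),
]
print(best_centers(results))
```

Comparing raw SSE values across samples is meaningful here because every sample contains the same number of objects and is clustered into the same number of clusters.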

WithAs initial cluster center,To cluster number, to data acquisition systemUniverse perform k-means calculate Method, is obtainedIndividual cluster。

For the $K'$ clusters obtained on the full data set, compute the distance between every pair of clusters (the Euclidean distance between their cluster centers) as follows:

$$d(p, q) = \sqrt{\sum_{j=1}^{m} \left(c_{pj} - c_{qj}\right)^2} \tag{3}$$

where $c_{pj}$ is the value of attribute $j$ of the center of the $p$-th cluster. The two clusters with the smallest $d(p, q)$ are merged and the cluster center of the merged cluster is recomputed. This is repeated until the number of clusters has been reduced to $K$, at which point merging stops and the $K$ cluster sets are output. The whole improved k-means algorithm then terminates, yielding the clustering of data set $D$ according to the customer value contribution index.
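The merging stage can be sketched as follows. The text only says that the center is recalculated after merging; this sketch assumes the new center is the mean of all points in the merged cluster, and it drops empty clusters before merging. The names `mean` and `merge_to_k` and the toy clusters are hypothetical.

```python
import math
from itertools import combinations


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def merge_to_k(clusters, k):
    """Merge the two clusters with the closest centers until k clusters remain."""
    clusters = [list(ms) for ms in clusters if ms]  # assumption: skip empties
    centers = [mean(ms) for ms in clusters]
    while len(clusters) > k:
        # Formula (3): Euclidean distance between cluster centers.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: math.dist(centers[ab[0]], centers[ab[1]]))
        clusters[a].extend(clusters[b])
        centers[a] = mean(clusters[a])  # recompute the merged cluster's center
        del clusters[b], centers[b]     # a < b, so index a stays valid
    return centers, clusters


# Four hypothetical clusters that should collapse into two groups.
start = [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0)],
         [(5.0, 5.0)], [(5.0, 6.0), (6.0, 5.0)]]
centers, merged = merge_to_k(start, k=2)
print(centers)
```

Starting with more clusters than needed and merging the closest ones lets small fragments of one natural group be recombined, instead of relying on k-means to find exactly $K$ good centers in one pass.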

The present invention was evaluated in a computer simulation that compares the clustering performance of the traditional k-means algorithm with that of the improved k-means algorithm described here. The simulation program was implemented in Visual C++. The computer hardware configuration was: CPU, Intel i5 processor at 2.5 GHz; memory, 4 GB. The data sample parameters are set as shown in Table 1:

Table 1

Data sample size	Number of data attributes (dimensions)	Initial number of clusters	Number of clusters	Number of samplings
5000	2	12	4	20

The simulation uses the two-dimensional sample data set shown in Fig. 3. The mean vectors of the four data subsets in Fig. 4, computed separately, are (0.6509, 0.9582), (3.4821, 1.1241), (3.9587, 3.0213), and (1.7424, 4.2508). First, the traditional k-means algorithm was used to cluster the original data set $D$; it was run 30 times in total, with the order of the input data shuffled before each run. Likewise, the improved k-means algorithm was run 30 times on the original data set $D$, also with the order of the input data shuffled each time. The purpose of this arrangement is to test the stability of the algorithms.

One representative clustering result was selected from the runs of each algorithm, as shown in Fig. 4(a) and Fig. 4(b), where the red dots mark the positions of the cluster centers. With the traditional k-means algorithm, clustering results similar to Fig. 4(a) occurred 23 times in total. Fig. 4(a) shows a fairly typical case of being trapped in a local minimum. In contrast, the improved k-means algorithm designed here stably produced clustering results similar to Fig. 4(b).

Next, the average of the 30 groups of cluster centers produced by each algorithm is compared with the cluster centers obtained by direct calculation, to show how the two algorithms perform, as shown in Table 2:

Table 2: Comparison of cluster center values

Cluster number	Average cluster center from traditional k-means	Average cluster center from improved k-means	Calculated cluster center
1	{0.5964,1.1820}	{0.6256,0.9021}	{0.6509,0.9582}
2	{3.0425,0.8852}	{3.3584,1.3024}	{3.4821,1.1241}
3	{3.3218,3.3204}	{4.1021,3.1541}	{3.9587,3.0213}
4	{1.6822,4.3652}	{1.7455,4.2057}	{1.7424,4.2508}
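The comparison in Table 2 can be made quantitative by measuring how far each average center lies from the calculated center. The short script below uses only the values listed in Table 2; the variable names are chosen here for illustration.

```python
import math

# Values copied from Table 2, in cluster order 1..4.
reference = [(0.6509, 0.9582), (3.4821, 1.1241),
             (3.9587, 3.0213), (1.7424, 4.2508)]
traditional = [(0.5964, 1.1820), (3.0425, 0.8852),
               (3.3218, 3.3204), (1.6822, 4.3652)]
improved = [(0.6256, 0.9021), (3.3584, 1.3024),
            (4.1021, 3.1541), (1.7455, 4.2057)]

rows = zip(reference, traditional, improved)
for number, (ref, trad, imp) in enumerate(rows, start=1):
    # Euclidean distance of each algorithm's average center to the reference.
    d_trad = math.dist(trad, ref)
    d_imp = math.dist(imp, ref)
    print(number, round(d_trad, 4), round(d_imp, 4))
```

For all four clusters the average center produced by the improved algorithm lies closer to the calculated center than the one produced by the traditional algorithm.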

The comparison of the cluster center averages in Table 2 shows clearly that the cluster centers obtained by the improved k-means algorithm are closer to the calculated cluster centers. Together with Fig. 4(a) and Fig. 4(b), a further analysis can be made. Because the original data is unevenly distributed and the clusters differ in shape and size, the traditional k-means practice of choosing initial cluster centers at random easily picks edge data points as initial centers. When this happens, the cluster centers that the traditional k-means algorithm finally produces are very likely to be trapped in a locally optimal solution; Fig. 4(a) is a typical example of this. The improved k-means algorithm first draws 20 equal-sized uniform samples from the original data (each sample contains 250 elements). Using the sum-of-squared-errors criterion function, it selects from these 20 samples the cluster centers that best reflect the shape and density of the full data set, uses them as the initial cluster centers for k-means, sets the initial number of clusters $K'$, and finally merges the clustering result. This makes the clustering result independent of the order in which the data is read in, and also prevents the clustering from being trapped in a locally optimal solution.

The consumer clustering method for WeChat marketing proposed by the present invention is more stable and more accurate, and is especially suited to processing big data sources that are unevenly distributed and large in volume.

The foregoing is only a specific embodiment of the present invention, but the technical features of the invention are not limited to it. Any change or modification made by a person skilled in the art within the field of the invention falls within the scope of the claims of the present invention.

Claims

1. A consumer clustering method for WeChat marketing, characterized in that the information processed by the method is consumer information collected from WeChat, including: personal information that consumers fill in voluntarily, records of consumers' operations after they follow the WeChat official account, purchasing behavior information, and feedback and suggestions; the method comprises the following steps:

2. A consumer clustering method for WeChat marketing, characterized in that the information processed by the method is consumer information collected from WeChat, including: personal information that consumers fill in voluntarily, records of consumers' operations after they follow the WeChat official account, purchasing behavior information, and feedback and suggestions; each class of customers contains $n$ data objects in total, each data object has $m$ attribute values, and $x_{ij}$ denotes the $j$-th attribute of the $i$-th customer in the customer data set; the method comprises the following steps:

where $x_{ij}$ is the value of the $j$-th attribute of the $i$-th customer in the cluster, and $c_{pj}$ is the value of attribute $j$ of that cluster's center;

using the sum-of-squared-errors criterion function, the sum of squared errors of sample set $S_t$ is computed as follows:

$$E(S_t) = \sum_{p=1}^{K'} E_p$$

where $E_p$ is the sum of squared errors of the $p$-th cluster of $S_t$; the group of cluster centers corresponding to the minimum of $E(S_t)$ is selected and output as the initial cluster centers;

with the selected group of centers as the initial cluster centers and $K'$ as the number of clusters, the k-means algorithm is run on the whole of data set $D$ to obtain $K'$ clusters;

for the $K'$ clusters obtained, the distance between every pair of clusters (the Euclidean distance between their cluster centers) is computed as follows:

$$d(p, q) = \sqrt{\sum_{j=1}^{m} \left(c_{pj} - c_{qj}\right)^2}$$

where $c_{pj}$ is the value of attribute $j$ of the center of the $p$-th cluster; the two clusters with the smallest $d(p, q)$ are merged and the cluster center of the merged cluster is recomputed, until the number of clusters has been reduced to $K$, at which point merging stops and the $K$ cluster sets are output; the whole improved k-means algorithm then terminates, yielding the clustering of data set $D$ according to the customer value contribution index.